Lakehouse Architecture: Merging Data Lake and Warehouse
A lakehouse unifies the data lake and the data warehouse in a single layer. This article covers open table formats, the medallion architecture, and unified data access.
From Warehouse and Lake to Lakehouse
Medallion Architecture
- Bronze — raw data, append-only
- Silver — cleansed, validated
- Gold — business aggregations
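from pyspark.sql import functions as F

# Assumes `spark` is an active SparkSession and `bronze` is a streaming
# DataFrame that has already been read from Kafka.

# Bronze: ingestion from Kafka
(bronze.writeStream.format("delta")
    # A streaming sink needs a checkpoint location; this path is an assumption.
    .option("checkpointLocation", "/lakehouse/bronze/orders/_checkpoint")
    .start("/lakehouse/bronze/orders"))
# Silver: cleansing (keep one row per order_id)
silver = (spark.read.format("delta")
    .load("/lakehouse/bronze/orders")
    .dropDuplicates(["order_id"]))
# Overwrite so the batch job can be rerun against an existing table.
silver.write.format("delta").mode("overwrite").save("/lakehouse/silver/orders")
# Gold: aggregation (revenue per order date)
gold = (spark.read.format("delta")
    .load("/lakehouse/silver/orders")
    .groupBy("order_date").agg(F.sum("total_czk").alias("revenue")))
gold.write.format("delta").mode("overwrite").save("/lakehouse/gold/revenue")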
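The same three steps can be traced without a cluster. The following is a minimal plain-Python sketch, not the Spark pipeline itself: the sample records, the `to_silver` and `to_gold` helpers, and the validation rule (a non-negative `total_czk`) are all assumptions made for illustration.

```python
from collections import defaultdict

# Bronze: raw, append-only records (hypothetical sample data).
bronze = [
    {"order_id": 1, "order_date": "2024-01-01", "total_czk": 100.0},
    {"order_id": 1, "order_date": "2024-01-01", "total_czk": 100.0},  # duplicate delivery
    {"order_id": 2, "order_date": "2024-01-01", "total_czk": 250.0},
    {"order_id": 3, "order_date": "2024-01-02", "total_czk": -5.0},   # fails validation
    {"order_id": 4, "order_date": "2024-01-02", "total_czk": 80.0},
]


def to_silver(rows):
    """Drop repeated order_ids and rows that fail validation."""
    seen = set()
    clean = []
    for row in rows:
        if row["order_id"] in seen:
            continue  # keep only the first occurrence of each order
        if row["total_czk"] < 0:
            continue  # assumed rule: totals must be non-negative
        seen.add(row["order_id"])
        clean.append(row)
    return clean


def to_gold(rows):
    """Aggregate revenue per order_date."""
    revenue = defaultdict(float)
    for row in rows:
        revenue[row["order_date"]] += row["total_czk"]
    return dict(revenue)


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

Bronze is never modified here: the duplicate and the invalid row stay in the raw layer and are filtered out only on the way to Silver, which is what keeps Bronze usable for reprocessing.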
Advantages
- Single storage — no duplication
- Open formats — no vendor lock-in
- Cost efficiency — inexpensive object storage
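How a single copy of the data files can behave like a table is easier to see with a toy model. The sketch below assumes a table format that, like Delta Lake's transaction log, records which data files each commit adds or removes. The file names and the `snapshot` helper are hypothetical, and real formats store far more metadata (schema, statistics, partition values).

```python
# Each commit is a list of (action, file) pairs, as a table format might log them.
commits = [
    [("add", "part-000.parquet"), ("add", "part-001.parquet")],     # initial write
    [("remove", "part-000.parquet"), ("add", "part-002.parquet")],  # rewrite of one file
]


def snapshot(log, version):
    """Replay commits 0..version and return the set of live data files."""
    live = set()
    for actions in log[: version + 1]:
        for action, path in actions:
            if action == "add":
                live.add(path)
            elif action == "remove":
                live.discard(path)
    return live


print(sorted(snapshot(commits, 0)))
print(sorted(snapshot(commits, 1)))
```

Any engine that can read the log and the underlying files sees the same table at a given version, so the data does not have to be copied into a separate warehouse store.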
Summary
A lakehouse built on the medallion pattern is the preferred approach for combining lake and warehouse workloads. The Bronze, Silver, and Gold layers improve data quality progressively at each step.