Hadoop Ecosystem: HDFS, YARN and Modern Alternatives
Hadoop launched the big data era. Spark has largely replaced MapReduce and cloud object storage is displacing HDFS, but the underlying principles endure.
Hadoop: From Revolution to Evolution
HDFS
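- Block storage: files are split into large blocks, 128 MB by default
- Replication: each block is stored as 3 copies by default, on different nodes
- Data locality: computation is scheduled on or near the nodes that hold the data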
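The block size and replication figures above translate directly into storage arithmetic. The following is a minimal sketch, not HDFS code: `hdfs_footprint` is a hypothetical helper, and it assumes the 128 MB block size and 3 copies quoted in the list. It also relies on the fact that the last block of a file occupies only as much space as the data it holds.

```python
import math

BLOCK_SIZE_MB = 128  # block size quoted in the list above
REPLICATION = 3      # number of copies quoted in the list above


def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB,
                   replication=REPLICATION):
    """Estimate how one file is laid out (hypothetical helper, not an HDFS API)."""
    if file_size_mb < 0:
        raise ValueError("file size must be non-negative")
    # The file is cut into fixed-size blocks; the last one may be partial.
    blocks = math.ceil(file_size_mb / block_size_mb)
    last_block_mb = file_size_mb - (blocks - 1) * block_size_mb if blocks else 0
    return {
        "blocks": blocks,
        "block_replicas": blocks * replication,
        "last_block_mb": last_block_mb,
        # A partial last block is not padded, so raw usage is size x replication.
        "raw_storage_mb": file_size_mb * replication,
    }


if __name__ == "__main__":
    print(hdfs_footprint(300))
```

Under these assumptions a 300 MB file becomes 3 blocks (128 + 128 + 44 MB), stored as 9 block replicas that occupy 900 MB of raw disk across the cluster.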
From Hadoop to the Cloud
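- HDFS -> S3/GCS: elastic object storage, decoupled from compute
- MapReduce -> Spark: in-memory processing, up to 100x faster on some workloads
- YARN -> Kubernetes: container-based resource scheduling
- Hive -> Trino: interactive SQL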
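Example: an external Hive table over Parquet files and a yearly revenue query.

-- External table: Hive tracks only the metadata; the Parquet files stay in place.
CREATE EXTERNAL TABLE orders (
  order_id   STRING,
  order_date DATE,  -- assumed DATE column; the query below groups by it
  total_czk  DECIMAL(12,2)
)
STORED AS PARQUET
LOCATION 'hdfs:///data/orders/';

-- Revenue per calendar year.
SELECT YEAR(order_date) AS year,
       SUM(total_czk)   AS revenue
FROM orders
GROUP BY YEAR(order_date);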
Summary
Hadoop laid the foundations of big data. Modern architectures replace its components one by one: object storage for HDFS, Spark for MapReduce, Kubernetes for YARN, and Trino for Hive.