
Apache Airflow - Orchestrating Data Pipelines in Practice

July 10, 2025 · 1 min read · intermediate

Apache Airflow is the most widely used orchestrator for data pipelines. It lets you define workflows as Python code, schedule their execution, and monitor their progress.

What is Apache Airflow

Airflow models each workflow as a DAG (Directed Acyclic Graph) — a graph of tasks connected by dependencies, where each task is implemented by an operator.
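The acyclic-graph idea itself can be illustrated with Python's standard-library graphlib — a minimal sketch of dependency resolution, not Airflow's actual scheduler:

```python
from graphlib import TopologicalSorter

# Dependencies expressed as {task: set of upstream tasks} --
# the same shape as extract >> transform >> load in Airflow.
deps = {
    "transform": {"extract"},
    "load": {"transform"},
}

# static_order() yields tasks so that every upstream task comes first.
order = list(TopologicalSorter(deps).static_order())
print(order)  # extract first, load last
```

The scheduler does the same resolution at runtime: a task becomes eligible only once all of its upstream tasks have succeeded.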

Core concepts

  • DAG — a workflow defined as Python code
  • Operator — an individual task (Bash, Python, SQL)
  • Scheduler — triggers DAG runs based on cron expressions or intervals
  • Executor — runs the tasks: Local, Celery, or Kubernetes
from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime

# Placeholder callables -- replace with real extract/transform/load logic.
def extract_fn():
    ...

def transform_fn():
    ...

def load_fn():
    ...

with DAG(
    dag_id='daily_sales',
    schedule_interval='0 6 * * *',  # every day at 06:00
    start_date=datetime(2026, 1, 1),
    catchup=False,  # do not backfill runs before the deploy date
) as dag:
    extract = PythonOperator(task_id='extract', python_callable=extract_fn)
    transform = PythonOperator(task_id='transform', python_callable=transform_fn)
    load = PythonOperator(task_id='load', python_callable=load_fn)
    extract >> transform >> load  # linear dependency chain
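Because PythonOperator simply calls a plain function, the callables can be unit-tested without a running Airflow instance. A sketch with a hypothetical transform (the function and sample data are illustrative, not from the pipeline above):

```python
# Hypothetical transform callable: drop records without a price.
def transform_fn(rows):
    return [r for r in rows if r.get("price") is not None]

# Plain assertion -- runs in pytest or any test runner, no Airflow needed.
sample = [{"sku": "A", "price": 10}, {"sku": "B", "price": None}]
assert transform_fn(sample) == [{"sku": "A", "price": 10}]
```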

TaskFlow API (Airflow 2.x)

from airflow.decorators import dag, task
from datetime import datetime

# fetch_data, clean and save stand in for real ETL logic.
@dag(schedule_interval='@daily', start_date=datetime(2026, 1, 1))
def sales_pipeline():
    @task()
    def extract():
        return fetch_data()

    @task()
    def transform(data):
        return clean(data)

    @task()
    def load(data):
        save(data)

    # Return values flow between tasks via XCom.
    load(transform(extract()))

sales_pipeline()

Best practices

  • Idempotence — re-running a task produces the same result
  • Atomicity — a task either succeeds completely or fails completely
  • XCom only for metadata — pass file paths or row counts, never large datasets
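A minimal sketch of an idempotent load, assuming a table partitioned by run date (the schema and names are illustrative): deleting the partition before inserting makes a retry safe.

```python
import sqlite3

def load_partition(conn, run_date, rows):
    # Delete-then-insert keyed by run_date: re-running the task
    # for the same date cannot duplicate data (idempotence).
    conn.execute("DELETE FROM sales WHERE run_date = ?", (run_date,))
    conn.executemany(
        "INSERT INTO sales (run_date, amount) VALUES (?, ?)",
        [(run_date, r) for r in rows],
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (run_date TEXT, amount REAL)")
load_partition(conn, "2026-01-01", [10.0, 20.0])
load_partition(conn, "2026-01-01", [10.0, 20.0])  # retry: still two rows
count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
print(count)
```

The same pattern works for warehouse partitions or object-store prefixes: key every write to the logical run date so Airflow retries and backfills overwrite rather than append.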

Summary

Airflow is the de facto standard for pipeline orchestration. The TaskFlow API simplifies the code; the keys in practice are idempotent tasks and proper credential management.

Tags: apache airflow, orchestration, dag, pipeline

CORE SYSTEMS Team

We build core systems and AI agents that keep operations running. 15 years of experience in enterprise IT.