Airflow Component

Workflow orchestration platform for data pipelines. Auto-generates connection configurations and ETL DAGs based on linked data sources.

Architecture

Scheduler - DAG scheduling and task distribution
Webserver - UI and API (port 8080)
Workers - Task execution (Celery/K8s)
DAGs - Auto-generated ETL pipelines

Quick Reference

Attribute             Example               Default   Effect
namespace (required)  airflow               -         Kubernetes namespace
executor              KubernetesExecutor    Celery    Task execution mode
git_repo              github.com/org/dags   -         Git repository synced for DAGs
git_branch            main                  main      Branch the DAGs are synced from
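
The defaults in the table above can be read as a simple merge: required attributes must be present, and optional ones fall back to their default. The sketch below illustrates that reading only. `resolve_attributes` and the constant names are hypothetical and are not part of the component's actual API.

```python
# Hypothetical helper illustrating the Quick Reference table; not the real API.
REQUIRED = ("namespace",)
DEFAULTS = {"executor": "Celery", "git_branch": "main"}
OPTIONAL_NO_DEFAULT = ("git_repo",)


def resolve_attributes(attrs: dict) -> dict:
    """Return attrs with table defaults applied; reject missing or unknown keys."""
    missing = [name for name in REQUIRED if not attrs.get(name)]
    if missing:
        raise ValueError("missing required attribute(s): " + ", ".join(missing))

    known = set(REQUIRED) | set(DEFAULTS) | set(OPTIONAL_NO_DEFAULT)
    unknown = sorted(set(attrs) - known)
    if unknown:
        raise ValueError("unknown attribute(s): " + ", ".join(unknown))

    resolved = dict(DEFAULTS)   # start from the defaults...
    resolved.update(attrs)      # ...and let explicit values win
    return resolved
```

For example, `resolve_attributes({"namespace": "airflow"})` would yield the Celery executor and the `main` branch, with no `git_repo` set.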

Link Variables (17 Types)

Variable      Link Type                     Purpose
__prometheus  prometheus-airflow            Metrics scraping
__ingress     apisix-airflow                Web UI access
__rel_db      airflow-reldb                 PostgreSQL metadata DB
__cnpg        airflow-cnpg (db)             CNPG connection + DAG
__mongodb     airflow-mongodb (mongo_db)    MongoDB connection + DAG
__rabbitmq    airflow-rabbitmq              Celery broker backend
__exchange    airflow-rabbitmq (exchange)   RabbitMQ exchange + DAG
__queue       airflow-rabbitmq (queue)      RabbitMQ queue + DAG
__kafka       airflow-kafka                 Kafka connection
__topic       airflow-kafka (topic)         Kafka topic + DAG
__bucket      airflow-minio (bucket)        S3 bucket + DAG
__spark       airflow-spark                 Spark connection
__apisix      airflow-apisix                Gateway connection
__sonarqube   airflow-sonarqube             Code quality DAGs

Auto-Generated ETL DAGs

DAG File                   Triggered By              Operations
etl_cnpg.py                __cnpg link               PostgreSQL ETL pipeline
etl_mongo.py               __mongodb link            MongoDB ETL pipeline
etl_bucket.py              __bucket link             S3/MinIO data pipeline
etl_rabbitmq.py            __exchange/__queue link   Message queue pipeline
test_all_connections.py    Always                    Connection validation DAG
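
The trigger rules above amount to a lookup from link variable to DAG file, plus one file that is always emitted. A minimal sketch of that selection follows. `dags_for_links` and the two constants are hypothetical names, and the mapping covers only the rows listed in this table.

```python
# Hypothetical sketch of the DAG selection rules; names are illustrative only.
LINK_TO_DAG = {
    "__cnpg": "etl_cnpg.py",
    "__mongodb": "etl_mongo.py",
    "__bucket": "etl_bucket.py",
    "__exchange": "etl_rabbitmq.py",
    "__queue": "etl_rabbitmq.py",
}
ALWAYS_GENERATED = ["test_all_connections.py"]


def dags_for_links(link_variables):
    """Return the DAG files implied by the given link variables, in order."""
    files = []
    for variable in link_variables:
        dag_file = LINK_TO_DAG.get(variable)
        # Skip links without a DAG in the table and avoid duplicates
        # (both __exchange and __queue map to etl_rabbitmq.py).
        if dag_file and dag_file not in files:
            files.append(dag_file)
    return files + ALWAYS_GENERATED
```

With links `__cnpg`, `__exchange`, and `__queue` present, this returns `etl_cnpg.py`, `etl_rabbitmq.py`, and `test_all_connections.py`.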

Generated Files

File                            Condition       Contains
helm/helm-values.yaml           Always          Airflow Helm configuration
config/cloud.env                Always          AIRFLOW_CONN_* variables
secret/credentials.env          Always          Database/service credentials
dags/*.py                       Per data link   ETL DAG definitions
scripts/create-connections.py   Always          Connection setup script
scripts/install-providers.sh    Always          Provider package installation
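
To inspect which connections a generated config/cloud.env defines, the AIRFLOW_CONN_* entries can be pulled out with a few lines of Python. This sketch assumes the file uses plain KEY=VALUE lines with optional quoting, which is not confirmed here. `parse_airflow_conns` is a hypothetical helper. The connection ID is taken as the lowercased suffix, matching how Airflow resolves AIRFLOW_CONN_{CONN_ID} variables.

```python
# Hypothetical helper; assumes cloud.env holds simple KEY=VALUE lines.
PREFIX = "AIRFLOW_CONN_"


def parse_airflow_conns(env_text: str) -> dict:
    """Map connection IDs to their raw values from env-file text."""
    conns = {}
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        key = key.strip()
        if not key.startswith(PREFIX):
            continue
        value = value.strip()
        # Drop one layer of matching single or double quotes.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        conns[key[len(PREFIX):].lower()] = value
    return conns
```

Reading the file with `parse_airflow_conns(open("config/cloud.env").read())` would list each connection ID next to its URI or JSON value.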

Connection ENV Format

PostgreSQL: AIRFLOW_CONN_POSTGRES_DEFAULT="postgresql://USER:PASS@HOST:5432/DB"
MongoDB: AIRFLOW_CONN_MONGO_DEFAULT="mongodb://USER:PASS@HOST:27017/DB?authSource=admin"
S3/MinIO: AIRFLOW_CONN_AWS_DEFAULT="aws://ACCESS:SECRET@?region_name=minio"
Kafka: AIRFLOW_CONN_KAFKA_DEFAULT='{"bootstrap.servers":"broker:9092"}'
RabbitMQ: AIRFLOW_CONN_RABBITMQ_DEFAULT="amqp://USER:PASS@HOST:5672/VHOST"
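
In the URI-style formats above, credentials that contain characters such as `@`, `:` or `/` need to be percent-encoded, otherwise the URI parses incorrectly. The sketch below shows one way to build such a value with the standard library. `conn_uri` is a hypothetical helper, not the generator's actual code, and the host and credentials are placeholders.

```python
from urllib.parse import quote


def conn_uri(scheme, user, password, host, port, path=""):
    """Build a SCHEME://USER:PASS@HOST:PORT/PATH URI with encoded components."""
    # safe="" forces encoding of "/", "@" and ":" inside each component.
    userinfo = f"{quote(user, safe='')}:{quote(password, safe='')}"
    return f"{scheme}://{userinfo}@{host}:{port}/{quote(path, safe='')}"


# Placeholder values for illustration only.
uri = conn_uri("postgresql", "etl", "p@ss/word", "db.example", 5432, "warehouse")
print(f'AIRFLOW_CONN_POSTGRES_DEFAULT="{uri}"')
```

The password `p@ss/word` is emitted as `p%40ss%2Fword`, so the `@` that separates credentials from the host stays unambiguous.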

Ports

Port   Purpose                   Protocol
8080   Web UI                    HTTP
8793   Worker logs               HTTP
5555   Flower (Celery monitor)   HTTP
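
A quick way to confirm that the ports listed above are reachable is a plain TCP connect check. This is a minimal sketch: `port_is_open` and `probe` are hypothetical helpers, and the host you pass in depends on how the services are exposed in your cluster.

```python
import socket

# Ports from the table above.
PORTS = {8080: "Web UI", 8793: "Worker logs", 5555: "Flower (Celery monitor)"}


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def probe(host, ports=PORTS):
    """Return {port: reachable} for every port in the table."""
    return {port: port_is_open(host, port) for port in ports}
```

Calling `probe("localhost")` after a port-forward would report which of the three ports accept connections. A successful connect only shows that something is listening, not that the service behind it is healthy.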

Technical Info

Chart Version: 22.4.5
Ports: 8080 (webserver), 8793 (logs), 5555 (flower)
Executors: CeleryExecutor, KubernetesExecutor, LocalExecutor