Airflow Component
Workflow orchestration platform for data pipelines. Auto-generates connection configurations and ETL DAGs based on linked data sources.
Architecture
Scheduler - DAG scheduling and task distribution
Webserver - UI and API (port 8080)
Workers - Task execution (Celery/K8s)
DAGs - Auto-generated ETL pipelines
Quick Reference
| Attribute | Example | Default | Effect |
|---|---|---|---|
| namespace (required) | airflow | - | Kubernetes namespace |
| executor | KubernetesExecutor | Celery | Task execution mode |
| git_repo | github.com/org/dags | - | Git sync for DAGs |
| git_branch | main | main | DAGs branch |
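
The attributes above can be modelled as a small config object that applies the documented defaults. This is only a sketch: the `AirflowComponentConfig` class and its validation are hypothetical and not part of the component. Only the attribute names, the defaults, and the executor names (taken from Technical Info on this page) come from the documentation.

```python
from dataclasses import dataclass
from typing import Optional

# Executor names exactly as listed under Technical Info on this page.
SUPPORTED_EXECUTORS = {"Celery", "KubernetesExecutor", "LocalExecutor"}


@dataclass
class AirflowComponentConfig:
    """Hypothetical container for the Quick Reference attributes."""

    namespace: str                   # required, no default
    executor: str = "Celery"         # documented default
    git_repo: Optional[str] = None   # no documented default
    git_branch: str = "main"         # documented default

    def __post_init__(self) -> None:
        # Reject an empty namespace, since the attribute is marked required.
        if not self.namespace:
            raise ValueError("namespace is required")
        if self.executor not in SUPPORTED_EXECUTORS:
            raise ValueError(f"unsupported executor: {self.executor!r}")


config = AirflowComponentConfig(
    namespace="airflow",
    executor="KubernetesExecutor",
    git_repo="github.com/org/dags",
)
print(config)
```

Leaving out `executor` and `git_branch` yields the defaults from the table, while an empty `namespace` raises a `ValueError` in this sketch.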
Link Variables (17 Types)
| Variable | Link Type | Purpose |
|---|---|---|
| __prometheus | prometheus-airflow | Metrics scraping |
| __ingress | apisix-airflow | Web UI access |
| __rel_db | airflow-reldb | PostgreSQL metadata DB |
| __cnpg | airflow-cnpg (db) | CNPG connection + DAG |
| __mongodb | airflow-mongodb (mongo_db) | MongoDB connection + DAG |
| __rabbitmq | airflow-rabbitmq | Celery broker backend |
| __exchange | airflow-rabbitmq (exchange) | RabbitMQ exchange + DAG |
| __queue | airflow-rabbitmq (queue) | RabbitMQ queue + DAG |
| __kafka | airflow-kafka | Kafka connection |
| __topic | airflow-kafka (topic) | Kafka topic + DAG |
| __bucket | airflow-minio (bucket) | S3 bucket + DAG |
| __spark | airflow-spark | Spark connection |
| __apisix | airflow-apisix | Gateway connection |
| __sonarqube | airflow-sonarqube | Code quality DAGs |
Auto-Generated ETL DAGs
| DAG File | Triggered By | Operations |
|---|---|---|
| etl_cnpg.py | __cnpg link | PostgreSQL ETL pipeline |
| etl_mongo.py | __mongodb link | MongoDB ETL pipeline |
| etl_bucket.py | __bucket link | S3/MinIO data pipeline |
| etl_rabbitmq.py | __exchange/__queue link | Message queue pipeline |
| test_all_connections.py | Always | Connection validation DAG |
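
The trigger rules in this table can be read as a simple lookup from link variable to DAG file. The sketch below illustrates that reading only. The function `dags_for_links` is a hypothetical helper, not the component's generator, and it covers only the DAG files named in the table above.

```python
# Mapping taken from the table above: link variable -> generated DAG file.
LINK_TO_DAG = {
    "__cnpg": "etl_cnpg.py",
    "__mongodb": "etl_mongo.py",
    "__bucket": "etl_bucket.py",
    "__exchange": "etl_rabbitmq.py",
    "__queue": "etl_rabbitmq.py",
}

# Generated regardless of which links are present.
ALWAYS_GENERATED = ["test_all_connections.py"]


def dags_for_links(link_variables):
    """Return the sorted DAG file names expected for the given link variables."""
    files = {LINK_TO_DAG[v] for v in link_variables if v in LINK_TO_DAG}
    files.update(ALWAYS_GENERATED)
    return sorted(files)


# __exchange and __queue share one DAG file; __kafka has no entry in the table.
print(dags_for_links(["__cnpg", "__exchange", "__queue", "__kafka"]))
# -> ['etl_cnpg.py', 'etl_rabbitmq.py', 'test_all_connections.py']
```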
Generated Files
| File | Condition | Contains |
|---|---|---|
| helm/helm-values.yaml | Always | Airflow Helm configuration |
| config/cloud.env | Always | AIRFLOW_CONN_* variables |
| secret/credentials.env | Always | Database/service credentials |
| dags/*.py | Per data link | ETL DAG definitions |
| scripts/create-connections.py | Always | Connection setup script |
| scripts/install-providers.sh | Always | Provider package installation |
Connection ENV Format
PostgreSQL: AIRFLOW_CONN_POSTGRES_DEFAULT="postgresql://USER:PASS@HOST:5432/DB"
MongoDB: AIRFLOW_CONN_MONGO_DEFAULT="mongodb://USER:PASS@HOST:27017/DB?authSource=admin"
S3/MinIO: AIRFLOW_CONN_AWS_DEFAULT="aws://ACCESS:SECRET@?region_name=minio"
Kafka: AIRFLOW_CONN_KAFKA_DEFAULT='{"bootstrap.servers":"broker:9092"}'
RabbitMQ: AIRFLOW_CONN_RABBITMQ_DEFAULT="amqp://USER:PASS@HOST:5672/VHOST"
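
Airflow expects every component of a connection URI to be URL-encoded, so credentials containing characters such as `@` or `/` need escaping before they are written to an `AIRFLOW_CONN_*` variable. The following is a minimal sketch of building URIs in the formats above. The `conn_uri` helper, the hostnames, and the credentials are made up for illustration and are not part of the component.

```python
from urllib.parse import quote


def conn_uri(scheme, user, password, host, port, path="", query=""):
    """Build a connection URI, percent-encoding credentials and path."""
    uri = f"{scheme}://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"
    if path:
        uri += "/" + quote(path, safe="")
    if query:
        uri += "?" + query
    return uri


# Hypothetical hosts and credentials, for illustration only.
env = {
    "AIRFLOW_CONN_POSTGRES_DEFAULT": conn_uri(
        "postgresql", "etl", "p@ss/word", "db.example", 5432, "warehouse"
    ),
    "AIRFLOW_CONN_MONGO_DEFAULT": conn_uri(
        "mongodb", "etl", "secret", "mongo.example", 27017, "events",
        query="authSource=admin",
    ),
    "AIRFLOW_CONN_RABBITMQ_DEFAULT": conn_uri(
        "amqp", "etl", "secret", "mq.example", 5672, "/"
    ),
}

for name, value in env.items():
    print(f'{name}="{value}"')
```

The password `p@ss/word` is emitted as `p%40ss%2Fword`, and the RabbitMQ default vhost `/` becomes `%2F`, so the resulting URIs stay unambiguous when parsed.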
Ports
| Port | Purpose | Protocol |
|---|---|---|
| 8080 | Web UI | HTTP |
| 8793 | Worker logs | HTTP |
| 5555 | Flower (Celery monitor) | HTTP |
Technical Info
Chart Version: 22.4.5
Ports: 8080 (webserver), 8793 (logs), 5555 (flower)
Executors: Celery, KubernetesExecutor, LocalExecutor