Big Data · 14 min read

Comprehensive Guide to Installing and Using Apache Airflow with Docker on Windows

This article provides a detailed tutorial on Apache Airflow fundamentals, Docker-based installation on Windows, Dockerfile creation, container deployment via Docker run and Docker Compose, Airflow configuration, and practical usage of DAGs, tasks, connections, and UI features for data pipeline orchestration.


Apache Airflow, originally developed at Airbnb in 2014 and graduated to an Apache Top-Level Project in 2019, is a Python-based workflow orchestration platform that uses directed acyclic graphs (DAGs) to define and schedule data pipelines. It offers task dependency management, monitoring, and extensibility through many integrations (e.g., AWS S3, Docker, Hadoop, Hive, Kubernetes, MySQL, Postgres, Zeppelin).
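At its core, a DAG is just a set of tasks plus the dependency edges between them; the scheduler only starts a task once all of its upstream tasks have succeeded. That ordering logic can be sketched in plain Python (no Airflow required) for a hypothetical extract-transform-load chain:

```python
# Minimal illustration of DAG ordering using the standard library.
# The task names here are hypothetical, not taken from the article.
from graphlib import TopologicalSorter

# Each task maps to the set of upstream tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "notify": {"load"},
}

# static_order() yields a valid execution order respecting all edges.
order = list(TopologicalSorter(dependencies).static_order())
print(order)  # ['extract', 'transform', 'load', 'notify']
```

Airflow's scheduler does essentially this, plus retries, scheduling intervals, and parallel execution of independent tasks.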

The article explains key Airflow concepts: Data Pipeline, DAGs, Tasks (operators like BashOperator and PythonOperator), Connections, Pools, XComs, Trigger Rules, Backfill, the Airflow 2.0 API, and the AIRFLOW_HOME directory for DAG and plugin storage.

For Windows users, the guide recommends installing Docker Desktop (with the WSL2 backend) and outlines the steps to create a custom Dockerfile that extends the official apache/airflow:2.3.0 image: install additional Linux tools, copy a requirements.txt file, install the Python dependencies, add DAG scripts, and create a writable directory.

# Use the official Airflow image
FROM apache/airflow:2.3.0
# Switch to root to install system packages
USER root
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
           vim \
    && apt-get autoremove -yqq --purge \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*
# Switch back to airflow user for pip installs
USER airflow
COPY requirements.txt /tmp/requirements.txt
RUN pip install -r /tmp/requirements.txt
# Copy DAG and data files with proper ownership
COPY --chown=airflow:root BY02_AirflowTutorial.py /opt/airflow/dags
COPY src/data.sqlite /opt/airflow/data.sqlite
# Create a writable directory
RUN umask 0002; \
    mkdir -p ~/writeable_directory
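The article does not list the contents of requirements.txt; an illustrative file for the pip install step above could name extra provider packages and libraries used by the DAGs, for example:

```text
# Example only: additional Python dependencies baked into the image
apache-airflow-providers-postgres
pandas
```

Pinning exact versions here is advisable in practice so that image builds stay reproducible.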

Two deployment methods are described:

Docker run: build the image with docker build -t airflow:latest . and run it using docker run -it --name test -p 8080:8080 --env "_AIRFLOW_DB_UPGRADE=true" --env "_AIRFLOW_WWW_USER_CREATE=true" --env "_AIRFLOW_WWW_USER_PASSWORD=admin" airflow:latest airflow standalone.

Docker Compose: define services in a docker-compose.yml (including Airflow, PostgreSQL, Redis, and Celery workers), place a .env file containing AIRFLOW_UID=50000 next to it, then execute docker-compose up to launch all containers.
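The service set matches the official Airflow docker-compose file. A heavily trimmed sketch of that layout (the real file from the Airflow docs adds an init service, a triggerer, health checks, and a Celery result backend) might look like:

```yaml
version: "3.7"

x-airflow-common: &airflow-common
  image: apache/airflow:2.3.0
  environment:
    AIRFLOW__CORE__EXECUTOR: CeleryExecutor
    AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
  volumes:
    - ./dags:/opt/airflow/dags
  user: "${AIRFLOW_UID:-50000}:0"   # matches AIRFLOW_UID=50000 from .env
  depends_on:
    - postgres
    - redis

services:
  postgres:
    image: postgres:13
    environment:
      POSTGRES_USER: airflow
      POSTGRES_PASSWORD: airflow
      POSTGRES_DB: airflow
  redis:
    image: redis:latest
  airflow-webserver:
    <<: *airflow-common
    command: webserver
    ports:
      - "8080:8080"
  airflow-scheduler:
    <<: *airflow-common
    command: scheduler
  airflow-worker:
    <<: *airflow-common
    command: celery worker
```

The &airflow-common anchor keeps the image, environment, and volume definitions in one place so every Airflow service stays consistent.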

After deployment, the article shows how to initialize the metadata database (airflow db init), create an admin user, and start the webserver and scheduler either via airflow standalone or the explicit commands:

airflow db init
airflow users create \
    --username admin \
    --firstname Peter \
    --lastname Parker \
    --role Admin \
    --email [email protected]
airflow webserver --port 8080
airflow scheduler

Configuration details are covered, including editing airflow.cfg to set the SQLAlchemy connection string, choosing an executor, disabling the example DAGs (AIRFLOW__CORE__LOAD_EXAMPLES=False), and customizing the UI.
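Any option in airflow.cfg can equivalently be set through an environment variable named AIRFLOW__<SECTION>__<KEY>, which is especially convenient in Docker deployments where editing files inside the container is awkward. For example (the connection string here is illustrative):

```shell
# Equivalent to editing the [core] and [database] sections of airflow.cfg
export AIRFLOW__CORE__LOAD_EXAMPLES=False
export AIRFLOW__CORE__EXECUTOR=LocalExecutor
export AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@localhost/airflow
```

Environment variables take precedence over the corresponding airflow.cfg entries, so they are a clean way to vary settings between environments without rebuilding the image.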

The UI usage is demonstrated: enabling a DAG with the toggle switch on the left, starting DAG runs via the UI, CLI, or HTTP API, inspecting task logs, clearing failed tasks so they re-run, and visualizing the pipeline in the Graph and Tree views.
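The CLI and REST API trigger paths can be sketched as follows; the commands assume a running deployment with basic-auth enabled on the API, and the DAG id "example_tutorial" is hypothetical:

```shell
# Trigger a run from the CLI (inside the container or with Airflow installed)
airflow dags trigger example_tutorial

# Or trigger through the Airflow 2 stable REST API
curl -X POST "http://localhost:8080/api/v1/dags/example_tutorial/dagRuns" \
    -H "Content-Type: application/json" \
    --user "admin:admin" \
    -d '{"conf": {}}'
```

The optional "conf" payload is made available to the DAG run, which is a common way to parameterize manually triggered runs.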

Overall, the guide serves as a practical reference for data engineers who need to set up, configure, and operate Apache Airflow in a containerized Windows environment.

Docker · Python · workflow orchestration · data pipelines · Docker Compose · Apache Airflow
Written by Big Data Technology Architecture

Exploring Open Source Big Data and AI Technologies