
How to Write Your First Apache Airflow DAG (Hello World)

This tutorial walks through creating a simple “Hello World” Apache Airflow DAG by setting up the Python file, importing modules, defining the DAG object, adding a PythonOperator task, writing the callable function, and running the DAG with Airflow’s webserver and scheduler.

DevOps Cloud Academy

In this article we’ll see how to write a basic “Hello World” DAG in Apache Airflow. We will go through everything we have to create in Apache Airflow to successfully write and execute our first DAG.

Create a Python file

First, we will create a Python file inside the airflow/dags directory. Since we are creating a basic Hello World script, we will keep the file name simple and name it HelloWorld_dag.py. Keep in mind that if this is your first time writing a DAG in Airflow, you may have to create the dags folder yourself.
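If the folder does not exist yet, it can be created from the shell (this sketch assumes the default AIRFLOW_HOME of ~/airflow; adjust the path if you configured a different home directory):

```shell
# Create the dags folder under the default Airflow home directory
mkdir -p ~/airflow/dags
```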

Importing important modules

To create a properly functional pipeline in Airflow, we need to import the DAG module and the operator modules we plan to use. We also import the datetime module, which we will need to set the start date.

from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime

Create a DAG Object

In this step, we will create a DAG object that will nest the tasks in the pipeline. We pass a “dag_id”, which is the DAG’s unique identifier. As a best practice, it is advised to keep the “dag_id” the same as the name of the Python file, so we will use “HelloWorld_dag”.

Now we will define a “start_date” parameter; this is the point from which the scheduler will start filling in dates. We also have to specify the interval at which the scheduler will execute the DAG, either as a cron expression or as one of Airflow’s cron presets such as “@yearly”, “@hourly”, and “@daily”. For this example we will use “@hourly”. By default, the scheduler will backfill a run for every interval between the “start_date” and the current time; this behavior is called “catchup”. We can turn it off by setting the catchup parameter to False.

with DAG(
    dag_id="HelloWorld_dag",
    start_date=datetime(2021,1,1),
    schedule_interval="@hourly",
    catchup=False) as dag:

Create a Task

Now we will define a PythonOperator. A PythonOperator is used to invoke a Python function from within your DAG. We will create a function that prints “Hello World” when it is invoked. Just as the DAG has a “dag_id”, each task has a “task_id”. The operator also has a “python_callable” parameter, which takes the function to be called. (In the final file, the function must be defined before the operator that references it.)

task1 = PythonOperator(
        task_id="hello_world",
        python_callable=helloWorld)

Creating a callable function

Now we will write the function that the PythonOperator will call.

def helloWorld():
        print("Hello world!")

Setting Dependencies in DAG

We don’t need to indicate the flow because we only have one task here; we can just write the task name. If we had multiple tasks we could set their dependencies using the “>>” or “<<” operators.

Our complete DAG file should look like this:

from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime

def helloWorld():
        print("Hello world!")

with DAG(
    dag_id="HelloWorld_dag",
    start_date=datetime(2021,1,1),
    schedule_interval="@hourly",
    catchup=False) as dag:

    task1 = PythonOperator(
        task_id="hello_world",
        python_callable=helloWorld)

    task1

To run our DAG file

To execute our DAG file, we need to start Apache Airflow and the Airflow scheduler. We can do that using the following commands:

airflow webserver -p 8081
airflow scheduler

# access: http://localhost:8081/

Once we log in to the web UI at http://localhost:8081, we will be able to see our DAG in the Airflow Web UI.
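Before relying on the scheduler, we can also exercise a single task directly from the command line with Airflow’s `tasks test` subcommand, which runs the task once for a given logical date without recording any state in the database (this assumes Airflow is installed and the DAG file is in the dags folder):

```shell
# Run the hello_world task of HelloWorld_dag once, for logical date 2021-01-01
airflow tasks test HelloWorld_dag hello_world 2021-01-01
```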

Conclusion

In this blog, we saw how to write our first DAG and execute it. We saw how to instantiate a DAG object, create a task, and define a callable function.

Tags: Data Engineering, Python, DAG, Workflow, Apache Airflow
Written by DevOps Cloud Academy

Exploring industry DevOps practices and technical expertise.