Airflow: using the DockerOperator with the LocalExecutor in Docker Compose
A short guide to configure a minimal setup.
This guide shows you how to run the DockerOperator with the LocalExecutor in Apache Airflow deployed with Docker Compose. The guide is split into four consecutive steps:
- Preparing the docker-compose.yaml
- Adding a new service in the docker-compose.yaml
- Creating a DAG that connects to the Docker API using this proxy
- Testing the DockerOperator
To deploy Airflow with Docker Compose, first fetch the docker-compose.yaml:
curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.1.2/docker-compose.yaml'
For more information about running Airflow in Docker, refer to the official documentation.
1. Preparing the docker-compose.yaml
First you need to adapt the standard docker-compose.yaml file to work with the LocalExecutor and DockerOperator:
- line 50: replace CeleryExecutor with LocalExecutor
- line 58: add the apache-airflow-providers-docker==2.1.0 package to _PIP_ADDITIONAL_REQUIREMENTS
- comment out or delete lines 53, 65-66, 85-94, 118-128, 140-150 (the Celery- and Redis-specific settings and services, which the LocalExecutor does not need)
Note that the apache-airflow-providers-docker version must be 2.1.0 or higher. After these edits, the relevant part of the file should look like the sketch below.
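This is only a sketch of the edited x-airflow-common block; the anchor names come from the official 2.1.2 compose file, and exact line numbers and defaults vary between compose file versions:
x-airflow-common:
  &airflow-common
  environment:
    &airflow-common-env
    AIRFLOW__CORE__EXECUTOR: LocalExecutor  # was: CeleryExecutor
    # Celery result backend and broker settings removed
    _PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:-apache-airflow-providers-docker==2.1.0}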
You can now initialize the environment with:
docker-compose up airflow-init
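On Linux, the official quick-start also suggests creating the mounted folders and a .env file with your host user id before running the init (a sketch following the Airflow 2.1 docs; the exact variables may differ in other versions):
mkdir -p ./dags ./logs ./plugins
echo -e "AIRFLOW_UID=$(id -u)" > .env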
2. Adding a new service in the docker-compose.yaml
You have to add one additional service under the services: section of the docker-compose.yaml to use the DockerOperator:
# Required by the DockerOperator: proxies the Docker socket for secure, permission-scoped access.
docker-socket-proxy:
  image: tecnativa/docker-socket-proxy:0.1.1
  environment:
    CONTAINERS: 1
    IMAGES: 1
    AUTH: 1
    POST: 1
  privileged: true
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock:ro
  restart: always
This service exposes the Docker socket to the Airflow containers in a more secure way: the CONTAINERS, IMAGES, AUTH, and POST variables grant access only to the corresponding sections of the Docker API. You can find more information in the tecnativa/docker-socket-proxy documentation.
3. Creating a DAG that connects to the Docker API using this proxy
Start from the example DAG that uses the DockerOperator; fetch it with:
curl -LfO 'https://raw.githubusercontent.com/apache/airflow/main/airflow/providers/docker/example_dags/example_docker.py'
Place this file inside the dags/ folder.
You have to modify the task t3 that uses the DockerOperator so that it works with the docker-socket-proxy:
- line 45: set the api_version to at least 1.30
- line 46: replace localhost with docker-socket-proxy
- line 47: change the command to "echo TEST DOCKER SUCCESSFUL"
from datetime import timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.providers.docker.operators.docker import DockerOperator
from airflow.utils.dates import days_ago

dag = DAG(
    'docker_sample',
    default_args={
        'owner': 'airflow',
        'depends_on_past': False,
        'email': ['airflow@example.com'],
        'email_on_failure': False,
        'email_on_retry': False,
        'retries': 1,
        'retry_delay': timedelta(minutes=5),
    },
    schedule_interval=timedelta(minutes=10),
    start_date=days_ago(2),
)

t1 = BashOperator(task_id='print_date', bash_command='date', dag=dag)

t2 = BashOperator(task_id='sleep', bash_command='sleep 5', retries=3, dag=dag)

t3 = DockerOperator(
    api_version='1.30',
    docker_url='tcp://docker-socket-proxy:2375',  # Set your docker URL
    command='echo TEST DOCKER SUCCESSFUL',
    image='centos:latest',
    network_mode='bridge',
    task_id='docker_op_tester',
    dag=dag,
)

t4 = BashOperator(task_id='print_hello', bash_command='echo "hello world!!!"', dag=dag)

t1 >> t2
t1 >> t3
t3 >> t4
4. Testing the DockerOperator
Now trigger the docker_sample DAG through the Airflow web UI. In the logs of the docker_op_tester task run you will see:
{docker.py:307} INFO - TEST DOCKER SUCCESSFUL
This indicates that the DockerOperator was successfully executed.
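If you prefer the command line, you can also trigger the DAG from inside one of the Airflow containers (assuming the default service names from the official compose file):
docker-compose exec airflow-webserver airflow dags trigger docker_sample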
You can find the relevant files in my GitHub repo.