In an earlier post on Airflow, I shared my experience moving from the MesosExecutor to the DaskExecutor. This post focuses on deploying the Airflow KubernetesExecutor to dynamically launch worker pods in Kubernetes.
Prior to the release of Airflow 1.10.13, documentation for standing up the Airflow KubernetesExecutor was fairly difficult to track down in GitHub. The current architecture diagrams in the Airflow docs make the integration architecture much easier to follow.
Airflow is a Python-based workflow management tool suited for scheduled jobs and data pipeline orchestration. Airflow's development lifecycle is rapid, and as a result the core technologies it integrates with have changed over time. At the heart of Airflow is a concept called the executor; the choice of executor determines how Airflow distributes and scales its workloads. Airflow offers a mix of local and distributed executors, but today's focus is on one of its relatively new distributed executors: the KubernetesExecutor.
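As a rough sketch of what selecting an executor looks like in practice, Airflow reads the executor from the `[core]` section of `airflow.cfg` (or the equivalent `AIRFLOW__CORE__EXECUTOR` environment variable). The snippet below is a minimal, hedged example for an Airflow 1.10.x deployment; the repository, tag, and namespace values are hypothetical placeholders, not values from this post:

```ini
[core]
# The executor class Airflow uses to run tasks; other options include
# SequentialExecutor, LocalExecutor, CeleryExecutor, and DaskExecutor.
executor = KubernetesExecutor

[kubernetes]
# Image used for dynamically launched worker pods (placeholder values).
worker_container_repository = my-registry/airflow
worker_container_tag = 1.10.13
# Kubernetes namespace in which worker pods are created (placeholder).
namespace = airflow
```

With this in place, the scheduler launches a fresh worker pod per task instance rather than dispatching work to a long-running worker fleet.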
Dask is a software project that aims to natively scale Python. Dask's architecture has two…