Integrating Airflow with Slack for Daily Reporting


Airflow + Slack

Slack is an increasingly popular chat app used in the workplace. Apache Airflow is an open-source platform for orchestrating workflows. One of the biggest advantages of using Airflow is the versatility of its hooks and operators. Hooks are interfaces to external platforms and databases, and they also serve as the basic building blocks of operators.

Setting up a Slack Workspace

Note: if you are already familiar with setting up apps on Slack and with the Slack webhook, skip to “Airflow + Docker”.
Create a Slack app for your workspace, enable Incoming Webhooks, and copy the generated webhook URL. You can check that the webhook works with a quick curl request:
curl -X POST -H 'Content-type: application/json' --data '{"text":"Hi, this is an automated message!"}' https://hooks.slack.com/services/XXXX

Airflow + Docker

I’m going to show you how to set up Airflow with Docker to properly containerize your application. I am using part of the setup from puckel/docker-airflow.

Airflow

The entire Airflow platform can be broken into four parts: the scheduler, the executor, the metadata database and the webserver.
  • the scheduler triggers workflows on their schedule and hands tasks to the executor
  • the executor runs the instructions for each job
  • the database stores Airflow state (did this job succeed or fail? how long did it take to run?)
  • the webserver is the user interface that makes it easier to interact with Airflow; under the hood, the webserver is a Flask app
This is what the webserver looks like (taken from Apache Airflow’s documentation)

DAG (Directed Acyclic Graph)

DAGs are a very important concept in Airflow. Each DAG is a collection of tasks organized in a way that reflects their dependencies and relationships. These graphs cannot have directed cycles; in other words, no mutually dependent jobs.
Arrows represent dependency relationships; run_after_loop only runs if runme_0, runme_1 and runme_2 all finish successfully
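This is roughly how such a graph is declared in code (a minimal sketch inspired by the example above, not the exact example_bash_operator DAG that ships with Airflow):

from datetime import datetime
from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG("example_bash_operator", start_date=datetime(2020, 1, 1), schedule_interval="@daily")

runme_0 = BashOperator(task_id="runme_0", bash_command="echo 0", dag=dag)
runme_1 = BashOperator(task_id="runme_1", bash_command="echo 1", dag=dag)
runme_2 = BashOperator(task_id="runme_2", bash_command="echo 2", dag=dag)
run_after_loop = BashOperator(task_id="run_after_loop", bash_command="echo done", dag=dag)

# run_after_loop depends on all three runme_* tasks
[runme_0, runme_1, runme_2] >> run_after_loop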

Setting up Airflow

Create a repository for your Airflow server. I will name mine slack-airflow. Once again, my repository is hosted here. These are the components in the directory:
  • Dockerfile
  • Airflow sub-directory to manage configurations
  • docker-compose.yml
  • shell script for starting Airflow
  • a DAG file (more on this later)
The Dockerfile installs Airflow 1.10.10 with the extras we need; note the slack extra, which pulls in the Slack hook and operator:
apache-airflow[crypto,celery,postgres,jdbc,ssh,statsd,slack]==1.10.10
It also declares two environment variables, which we will fill in with the Open Weather API key and the Slack webhook URL:
ENV weather_api_key=
ENV slack_webhook_url=
The metadata database is Postgres, defined as a service in docker-compose.yml; Airflow reaches it through this SQLAlchemy connection string:
postgresql+psycopg2://airflow:airflow@postgres:5432/airflow
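For reference, a stripped-down docker-compose.yml looks something like this (a sketch based on puckel/docker-airflow; the service names, volumes and environment variables in my repository may differ):

version: "3"
services:
  postgres:
    image: postgres:9.6
    environment:
      - POSTGRES_USER=airflow
      - POSTGRES_PASSWORD=airflow
      - POSTGRES_DB=airflow
  webserver:
    image: airflow            # the image built from the Dockerfile above
    depends_on:
      - postgres
    ports:
      - "8080:8080"           # the Airflow web UI
    volumes:
      - ./dags:/usr/local/airflow/dags
    command: webserver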

Open Weather API

Sign up for a free Open Weather account and generate an API key, then paste it into the Dockerfile environment variable:
ENV weather_api_key=
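You can sanity-check the key before wiring it into Airflow (this assumes the key is exported as weather_api_key in your shell):
curl "https://api.openweathermap.org/data/2.5/weather?q=Toronto&appid=$weather_api_key"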

Weather DAG

Here is the link to my weather DAG. I placed it in a folder called dags.

Step 1: load the dependencies
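The DAG file starts by importing roughly the following (a sketch; the exact imports in the repository may differ slightly):

import os
from datetime import datetime, timedelta

import requests
from airflow import DAG
from airflow.contrib.operators.slack_webhook_operator import SlackWebhookOperator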

Step 2: specify default arguments

I added myself as the owner of this DAG; the owner is displayed in the webserver's DAG list and is a standard argument to set for every DAG.
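The default arguments look something like this (the owner is a placeholder for your own name; the retry values are illustrative):

default_args = {
    "owner": "your_name",                 # placeholder: put your own name here
    "depends_on_past": False,             # each run is independent of previous runs
    "retries": 1,                         # retry a failed task once
    "retry_delay": timedelta(minutes=5),  # wait five minutes before retrying
}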

Step 3: simple class to get the daily forecast

We send a GET request to Open Weather to get the weather details for Toronto. The payload is parsed to extract the weather description, which is saved as the forecast attribute.
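A minimal sketch of such a class, building on the imports from step 1 (the class and method names in my repository may differ):

class WeatherForecast:
    """Fetches today's forecast for Toronto from Open Weather."""

    def __init__(self):
        self.forecast = None

    def get_forecast(self):
        # GET request to Open Weather's current-weather endpoint for Toronto;
        # the API key comes from the environment variable set in the Dockerfile
        response = requests.get(
            "https://api.openweathermap.org/data/2.5/weather",
            params={"q": "Toronto", "appid": os.environ["weather_api_key"]},
        )
        payload = response.json()
        # keep only the human-readable description, e.g. "light rain"
        self.forecast = payload["weather"][0]["description"]
        return self.forecast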

Step 4: DAG Definition

We are using the SlackWebhookOperator in our DAG. Feel free to name the http_conn_id anything you want, but a connection with that name needs to be created on the Airflow server (this is covered in the section “Setting up your Slack connection on Airflow”). The webhook token is fetched from the environment variables.
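Putting the pieces together, the DAG definition looks roughly like this (the dag_id, task_id, conn id and schedule are my own placeholders, not necessarily what the repository uses):

dag = DAG(
    dag_id="daily_weather_slack",
    default_args=default_args,
    start_date=datetime(2020, 1, 1),
    schedule_interval="@daily",   # run once a day
)

weather = WeatherForecast()
weather.get_forecast()

send_forecast = SlackWebhookOperator(
    task_id="send_forecast_to_slack",
    http_conn_id="slack_connection",                # must match the connection created on the Airflow server
    webhook_token=os.environ["slack_webhook_url"],  # the webhook URL from the environment variables
    message=f"Good morning! Today's forecast for Toronto: {weather.forecast}",
    dag=dag,
)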

Starting the Docker Container

Run these two commands in your root directory.
docker build . -t airflow
docker-compose -f docker-compose.yml up -d

Setting up your Slack connection on Airflow

We are almost done.

Step 1: Go to Admin > Connections

Go to localhost:8080 to access the webserver and click on Admin > Connections.

Step 2: Create a new Connection

Hit Create and fill in the fields. For the Slack connection, you only need two fields:
  • Conn Id = the http_conn_id used in your DAG
  • Host = your webhook URL

Test the DAG

If you don’t want to wait for the scheduled interval to observe the results, manually trigger a DAG run by hitting the Trigger DAG button.
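You can also trigger the run from the command line inside the container (this assumes the docker-compose service is called webserver and the dag_id from the sketch above):
docker-compose exec webserver airflow trigger_dag daily_weather_slack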

Thank you for reading!

If you enjoyed this article, check out my other articles on Data Science, Math and Programming. Follow me on Medium for the latest updates. 😃
