Architecture Center
Deploy and Run Apache Airflow on Alibaba Cloud

Deploy and Run Apache Airflow on Alibaba Cloud

Overview
Apache Airflow is a platform created by the community to author, schedule and monitor workflows programmatically.
Airflow is deployed with a database. You can use the default SQLite or single node PostgreSQL built in the Docker edition, or deploy Airflow with Alibaba Cloud databases to enhance high availability. Airflow supports ApsaraDB RDS for PostgreSQL, ApsaraDB RDS for MySQL, and PolarDB.
This solution deploys Airflow with the high availability edition of ApsaraDB RDS for PostgreSQL to replace the default single node PostgreSQL, and runs a data migration task in Airflow.
Reference Architecture
Steps
Deploy Resources
Use this main.tf file in Terraform to provision resources from this solution.
This solution uses ApsaraDB RDS for PostgreSQL as the backend database of Airflow and another RDS PostgreSQL as the demo database showing the data migration via Airflow task. It includes an ECS and 2 RDS PostgreSQL instances in the Terraform script.
After the Terraform script execution is completed, the ECS and RDS PostgreSQL instances information will be listed.
rds_pg_url_airflow_database: The connection URL of the backend database for Airflow
rds_pg_url_airflow_demo_database: The connection URL of the demo database using Airflow
Note: The default database port for RDS PostgreSQL is 1921.
Set Up Airflow on ECS with RDS PostgreSQL
1. Log on to ECS via SSH with ECS EIP with the default password N1cetest:
ssh root@ECS_EIP
2. Download and run the installation script to setup Airflow on ECS:
cd ~
wget https://raw.githubusercontent.com/alibabacloud-howto/opensource_with_apsaradb/main/apache-airflow/setup.sh
sh setup.sh
cd ~/airflow
mkdir ./dags ./logs ./plugins
echo -e "AIRFLOW_UID=$(id -u)\nAIRFLOW_GID=0" > .env
3. Edit the downloaded docker-compose.yaml file to set RDS PostgreSQL as the backend database. Use the connection string of rds_pg_url_airflow_database in Deploy Resources:
cd ~/airflow
vim docker-compose.yaml
4. Initialize the Airflow docker:
docker-compose up airflow-init
5. Start Airflow:
docker-compose up
6. Visit the following URL with the EIP of ECS to access the Airflow web console:
http://ECS_EIP:8080
Note: The default username and password are both airflow.
Prepare the Source and Target Databases for Airflow Data Migration
1. Log on to ECS via SSH:
ssh root@ECS_EIP
2. Download and set up the PostgreSQL client:
cd ~
wget http://mirror.centos.org/centos/8/AppStream/x86_64/os/Packages/compat-openssl10-1.0.2o-3.el8.x86_64.rpm
rpm -i compat-openssl10-1.0.2o-3.el8.x86_64.rpm
wget http://docs-aliyun.cn-hangzhou.oss.aliyun-inc.com/assets/attach/181125/cn_zh/1598426198114/adbpg_client_package.el7.x86_64.tar.gz
tar -xzvf adbpg_client_package.el7.x86_64.tar.gz
3. Get the database DDL and DML SQL files:
northwind_ddl.sql is for both source and target databases.
northwind_data_source.sql is for the source database.
northwind_data_target.sql is for the target database.


cd ~/airflow wget https://raw.githubusercontent.com/alibabacloud-howto/opensource_with_apsaradb/main/apache-airflow/northwind_ddl.sql
wget https://raw.githubusercontent.com/alibabacloud-howto/opensource_with_apsaradb/main/apache-airflow/northwind_data_source.sql
wget https://raw.githubusercontent.com/alibabacloud-howto/opensource_with_apsaradb/main/apache-airflow/northwind_data_target.sql
4. Connect to the source database northwind_source, create the tables northwind_ddl.sql and load the sample data northwind_data_source.sql. Replace rds_pg_url_airflow_demo_database with the RDS PostgreSQL connection string to set up the demo database account with username demo and password N1cetest.
Run the following command for the source database:
cd ~/adbpg_client_package/bin ./psql -h -p1921 -Udemo northwind_source

\i ~/airflow/northwind_ddl.sql
\i ~/airflow/northwind_data_source.sql

select tablename from pg_tables where schemaname='public';
select count(*) from products;
select count(*) from orders;
Run the following command for the target database:
./psql -h -p1921 -Udemo northwind_target

\i ~/airflow/northwind_ddl.sql
\i ~/airflow/northwind_data_target.sql

select tablename from pg_tables where schemaname='public';
select count(*) from products;
select count(*) from orders;
5. You can see the tables, products, and orders in the target database are empty. We will use the migration task running in Airflow to migrate data from the source database to the target database.
Run the Data Migration Task in Airflow
1. Go to the Airflow web console to add database connections to the source and target databases respectively.
2. Download and deploy the python script for the migration task in Airflow in the dags directory:
cd ~/airflow/dags
wget https://raw.githubusercontent.com/alibabacloud-howto/opensource_with_apsaradb/main/apache-airflow/northwind_migration.py
3. The DAG task in this demo finds the new product_id and order_id in database northwind_source and then updates the same product and order tables in database northwind_target with the rows greater than that maximum id. The job is scheduled to run every minute starting on today’s date (When you run this demo, please update accordingly). If the task runs successfully, the DAG task will be shown on the web console.
4. Go to the target database and check the migrated data.

Reach Alibaba Cloud experts for support

Contact Us