Apache Airflow is a platform created by the community to author, schedule and monitor workflows programmatically.
Airflow is deployed with a database. You can use the default SQLite or the single-node PostgreSQL built into the Docker edition, or deploy Airflow with Alibaba Cloud databases for high availability. Airflow supports ApsaraDB RDS for PostgreSQL, ApsaraDB RDS for MySQL, and PolarDB.
This solution deploys Airflow with the high-availability edition of ApsaraDB RDS for PostgreSQL in place of the default single-node PostgreSQL, and runs a data migration task in Airflow.
Use this main.tf file in Terraform to provision resources from this solution.
This solution uses one ApsaraDB RDS for PostgreSQL instance as the backend database of Airflow and another RDS PostgreSQL instance as the demo database that demonstrates data migration via an Airflow task. The Terraform script provisions one ECS instance and two RDS PostgreSQL instances.
After the Terraform script execution is completed, the connection information of the ECS and RDS PostgreSQL instances is listed:
rds_pg_url_airflow_database: the connection URL of the backend database for Airflow
rds_pg_url_airflow_demo_database: the connection URL of the demo database used by the Airflow migration task
Note: The default database port for RDS PostgreSQL is 1921.
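The connection URLs follow the standard PostgreSQL URL form, so their parts can be pulled out with Python's standard library. A small sketch (the endpoint and password below are made-up placeholders; in practice use the values printed by Terraform):

```python
from urllib.parse import urlsplit

# Hypothetical RDS endpoint for illustration only; substitute the
# rds_pg_url_airflow_database value from the Terraform output.
url = "postgresql://demo:PASSWORD@pgm-xxxx.pg.rds.aliyuncs.com:1921/airflow"

parts = urlsplit(url)
print(parts.hostname)           # RDS endpoint host
print(parts.port)               # 1921, the default RDS PostgreSQL port
print(parts.path.lstrip("/"))   # database name
```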
Set Up Airflow on ECS with RDS PostgreSQL
1. Log on to the ECS instance via SSH, using the ECS EIP and the default password.
2. Download and run the installation script to set up Airflow on ECS:
cd ~
wget https://raw.githubusercontent.com/alibabacloud-howto/opensource_with_apsaradb/main/apache-airflow/setup.sh
sh setup.sh
cd ~/airflow
mkdir ./dags ./logs ./plugins
echo -e "AIRFLOW_UID=$(id -u)\nAIRFLOW_GID=0" > .env
3. Edit the downloaded docker-compose.yaml file to set RDS PostgreSQL as the backend database, using the connection string of rds_pg_url_airflow_database from Deploy Resources:
cd ~/airflow
vim docker-compose.yaml
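The edit amounts to pointing the metadata-database connection string at the RDS instance instead of the bundled postgres service. A hedged sketch of the relevant excerpt (the exact key depends on the Airflow version, and the host and password are placeholders to be filled from the rds_pg_url_airflow_database output):

```yaml
# docker-compose.yaml (excerpt) -- placeholders, not real credentials.
# Older Airflow releases read AIRFLOW__CORE__SQL_ALCHEMY_CONN;
# Airflow >= 2.3 reads AIRFLOW__DATABASE__SQL_ALCHEMY_CONN.
environment:
  AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://<user>:<password>@<rds_pg_host>:1921/airflow
```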
4. Initialize the Airflow Docker environment:
docker-compose up airflow-init
5. Start Airflow:
docker-compose up -d
6. Visit the following URL, replacing ECS_EIP with the EIP of the ECS instance, to access the Airflow web console:
http://ECS_EIP:8080
Note: The default username and password are both airflow.
Prepare the Source and Target Databases for Airflow Data Migration
1. Log on to ECS via SSH:
2. Download and set up the PostgreSQL client:
cd ~
wget http://mirror.centos.org/centos/8/AppStream/x86_64/os/Packages/compat-openssl10-1.0.2o-3.el8.x86_64.rpm
rpm -i compat-openssl10-1.0.2o-3.el8.x86_64.rpm
wget http://docs-aliyun.cn-hangzhou.oss.aliyun-inc.com/assets/attach/181125/cn_zh/1598426198114/adbpg_client_package.el7.x86_64.tar.gz
tar -xzvf adbpg_client_package.el7.x86_64.tar.gz
3. Get the database DDL and DML SQL files:
northwind_ddl.sql is for both the source and target databases.
northwind_data_source.sql is for the source database.
northwind_data_target.sql is for the target database.
cd ~/airflow
wget https://raw.githubusercontent.com/alibabacloud-howto/opensource_with_apsaradb/main/apache-airflow/northwind_ddl.sql
wget https://raw.githubusercontent.com/alibabacloud-howto/opensource_with_apsaradb/main/apache-airflow/northwind_data_source.sql
wget https://raw.githubusercontent.com/alibabacloud-howto/opensource_with_apsaradb/main/apache-airflow/northwind_data_target.sql
4. Connect to the source database northwind_source, create the tables with northwind_ddl.sql, and load the sample data with northwind_data_source.sql. Use the rds_pg_url_airflow_demo_database connection string from Deploy Resources as the host, and connect with the demo database account (username demo).
Run the following command for the source database:
cd ~/adbpg_client_package/bin
./psql -h <rds_pg_url_airflow_demo_database> -p1921 -Udemo northwind_source
\i ~/airflow/northwind_ddl.sql
\i ~/airflow/northwind_data_source.sql
select tablename from pg_tables where schemaname='public';
select count(*) from products;
select count(*) from orders;
Run the following command for the target database:
./psql -h <rds_pg_url_airflow_demo_database> -p1921 -Udemo northwind_target
\i ~/airflow/northwind_ddl.sql
\i ~/airflow/northwind_data_target.sql
select tablename from pg_tables where schemaname='public';
select count(*) from products;
select count(*) from orders;
5. You can see that the products and orders tables in the target database are empty. We will use the migration task running in Airflow to migrate data from the source database to the target database.
Run the Data Migration Task in Airflow
1. Go to the Airflow web console to add database connections to the source and target databases respectively.
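Under the hood, each Airflow connection is identified by a conn_id and boils down to a URI. A standard-library sketch of the URI shape the two Postgres connections take (host and password below are illustrative placeholders, not values from this solution):

```python
from urllib.parse import quote

def pg_conn_uri(user, password, host, port, db):
    # Airflow-style Postgres connection URI; the password is
    # URL-encoded so special characters survive.
    return f"postgres://{user}:{quote(password, safe='')}@{host}:{port}/{db}"

# Hypothetical endpoint for illustration only.
print(pg_conn_uri("demo", "examplePw", "pgm-demo.pg.rds.aliyuncs.com",
                  1921, "northwind_source"))
```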
2. Download and deploy the Python script for the migration task into the Airflow dags directory:
cd ~/airflow/dags
wget https://raw.githubusercontent.com/alibabacloud-howto/opensource_with_apsaradb/main/apache-airflow/northwind_migration.py
3. The DAG task in this demo finds the maximum id in the products and orders tables of the target database, reads the new rows from the source database northwind_source, and then updates the same tables in the target database northwind_target with the rows whose ids are greater than that maximum id. The job is scheduled to run every minute, starting on today's date (when you run this demo, please update the start date accordingly). If the task runs successfully, the DAG task is shown on the web console.
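The incremental logic behind the DAG can be sketched with standard-library SQLite standing in for the two PostgreSQL databases (table and column names are simplified; the real task in northwind_migration.py operates on the RDS instances through the Airflow connections defined above):

```python
import sqlite3

def migrate_new_rows(source, target, table):
    """Copy rows whose id exceeds the target's current maximum id."""
    max_id = target.execute(
        f"SELECT COALESCE(MAX(id), 0) FROM {table}").fetchone()[0]
    rows = source.execute(
        f"SELECT id, name FROM {table} WHERE id > ?", (max_id,)).fetchall()
    target.executemany(
        f"INSERT INTO {table} (id, name) VALUES (?, ?)", rows)
    target.commit()
    return len(rows)

# In-memory stand-ins for northwind_source and northwind_target.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
for db in (source, target):
    db.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
source.executemany("INSERT INTO products VALUES (?, ?)",
                   [(1, "Chai"), (2, "Chang"), (3, "Aniseed Syrup")])
source.commit()

print(migrate_new_rows(source, target, "products"))  # 3 rows copied
print(migrate_new_rows(source, target, "products"))  # 0 -- nothing new yet
```

On each scheduled run the task is idempotent: rows already present in the target are skipped because their ids are not greater than the stored maximum.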
4. Go to the target database and check the migrated data.