Deploy and Run Apache Airflow on Alibaba Cloud-Alibaba Cloud Developer Community

2021-09-03 80

introduction: # Deploy and Run Apache Airflow on Alibaba Cloud Tutorial of running open source project Apache Airflow on Alibaba Cloud with ApsaraDB (Alibaba Cloud Database). We also show a simple data migration ta
+ Follow to continue viewing

Tutorial of running open source project Apache Airflow on Alibaba Cloud with ApsaraDB (Alibaba Cloud Database). We also show a simple data migration task deployed and run in Airflow to migrate data between 2 databases.

You can access the tutorial artifact including deployment script (Terraform), related source code, sample data and instruction guidance from the github project:https://github.com/alibabacloud-howto/opensource_with_apsaradb/tree/main/apache-airflow

More tutorial around Alibaba Cloud Database, please refer to:https://github.com/alibabacloud-howto/database

Overview

Apache Airflow (https://airflow.apache.org/) is a platform created by the community to programmatically author, schedule and monitor workflows.

Airflow requires a database. If you're just experimenting and learning Airflow, you can stick with the default SQLite option or single node PostgreSQL built in Docker edition. To enhance with the database high availability behind the Apache Airflow we will show the steps of deployment working with Alibaba Cloud Database.Airflow supports PostgreSQL and MySQL https://airflow.apache.org/docs/apache-airflow/stable/howto/set-up-database.html. You can either use one of the following databases on Alibaba Cloud:

In this tutorial, we will show the case of using RDS PostgreSQL high availability edition to replace the single node PostgreSQL built in Docker edition for more stable production purpose.

Deployment architecture:

Index

  • Step 1. Use Terraform to provision ECS and databases on Alibaba Cloud
  • Step 2. Deploy and setup Airflow on ECS with RDS PostgreSQL
  • Step 3. Prepare the source and target database for Airflow data migration task demo
  • Step 4. Deploy and run data migration task in Airflow

Step 1. Use Terraform to provision ECS and database on Alibaba Cloud

Run the terraform script to initialize the resources (in this tutorial, we use RDS PostgreSQL as backend database of Airflow and another RDS PostgreSQL as the demo database showing the data migration via Airflow task, so ECS and 2 RDS PostgreSQL instances are included in the Terraform script). Please specify the necessary information and region to deploy.

After the Terraform script execution finished, the ECS and RDS PostgreSQL instances information are listed as below.

  • rds_pg_url_airflow_database: The connection URL of the backend database for Airflow
  • rds_pg_url_airflow_demo_database: The connection URL of the demo database using Airflow

The database port for RDS PostgreSQL is 1921by default.

Step 2. Deploy and setup Airflow on ECS with RDS PostgreSQL

Please log on to ECS ECS EIP.

ssh root@<ECS_EIP>

Download and execute the script setup.sh via the following commands to setup Airflow on ECS.

cd ~
wget https://raw.githubusercontent.com/alibabacloud-howto/opensource_with_apsaradb/main/apache-airflow/setup.sh
sh setup.sh
cd ~/airflow
mkdir ./dags ./logs ./plugins
echo -e "AIRFLOW_UID=$(id -u)\nAIRFLOW_GID=0" > .env

Edit the downloaded docker-compose.yamlfile to set the backend database as the RDS PostgreSQL.

cd ~/airflow
vim docker-compose.yaml

Use the connection string rds_pg_url_airflow_databasein Step 1. Comment the part related postgres.

Then execute the following command to initialize Airflow docker.

docker-compose up airflow-init

Then execute the following command to start Airflow.

docker-compose up

Now, Airflow has started successfully. Please visit the following URL (replace <ECS_EIP>with the EIP of the ECS) to access the Airflow web console.

http://<ECS_EIP>:8080

The default account has the login airflowand the password airflow.

Next, let's move on to work on the 1st data migration task on Airflow.

Step 3. Prepare the source and target database for Airflow data migration task demo

Please log on to ECS ECS EIPwithin another terminal window.

ssh root@<ECS_EIP>

Download and setup PostgreSQL client to communicate with the demo database.

cd ~
wget http://mirror.centos.org/centos/8/AppStream/x86_64/os/Packages/compat-openssl10-1.0.2o-3.el8.x86_64.rpm
rpm -i compat-openssl10-1.0.2o-3.el8.x86_64.rpm
wget http://docs-aliyun.cn-hangzhou.oss.aliyun-inc.com/assets/attach/181125/cn_zh/1598426198114/adbpg_client_package.el7.x86_64.tar.gz
tar -xzvf adbpg_client_package.el7.x86_64.tar.gz

Fetch the database DDL and DML SQL files.

  • northwind_ddl. SQL for both source and target database
  • northwind_data_source. SQL for the source database
  • northwind_data_target. SQL for the target database
cd ~/airflow
wget https://raw.githubusercontent.com/alibabacloud-howto/opensource_with_apsaradb/main/apache-airflow/northwind_ddl.sql
wget https://raw.githubusercontent.com/alibabacloud-howto/opensource_with_apsaradb/main/apache-airflow/northwind_data_source.sql
wget https://raw.githubusercontent.com/alibabacloud-howto/opensource_with_apsaradb/main/apache-airflow/northwind_data_target.sql

There are 2 databases (source: northwind_source, target: northwind_target) working as the source and target respectively in this data migration demo.

Connect to the demo source databasenorthwind_source, create the tables ( northwind_ddl.sql) and load the sample data ( northwind_data_source.sql).Replace <rds_pg_url_airflow_demo_database>with the demo RDS PostgreSQL connection string.We've set up the demo database account as username demoand password N1cetest.

Execute the following commands for the source database:

cd ~/adbpg_client_package/bin
./psql -h<rds_pg_url_airflow_demo_database> -p1921 -Udemo northwind_source

\i ~/airflow/northwind_ddl.sql
\i ~/airflow/northwind_data_source.sql

select tablename from pg_tables where schemaname='public';
select count(*) from products;
select count(*) from orders;

Execute the following commands for the target database:

./psql -h<rds_pg_url_airflow_demo_database> -p1921 -Udemo northwind_target

\i ~/airflow/northwind_ddl.sql
\i ~/airflow/northwind_data_target.sql

select tablename from pg_tables where schemaname='public';
select count(*) from products;
select count(*) from orders;

We can see that tables productsand ordersin the target database are empty. Later we will use the migration task running in Airflow to migrate data from the source database to the target database.

Step 4. Deploy and run data migration task in Airflow

First, go to the Airflow web console ( Admin-&gt; Connections) to add database connections to the source and target databases respectively.

Download and deploy (put into dagsdirectory) the migration task python script https://github.com/alibabacloud-howto/opensource_with_apsaradb/blob/main/apache-airflow/northwind_migration.py into Airflow.

cd ~/airflow/dags
wget https://raw.githubusercontent.com/alibabacloud-howto/opensource_with_apsaradb/main/apache-airflow/northwind_migration.py

The DAG task in this demo finds the new product_idand order_id's in database northwind_sourceand then updates the same product and order tables in database northwind_targetwith the rows greater than that maximum id. The job is scheduled to run every minute starting on today's date (when you run this demo, please update accordingly).The demo airflow DAG python script is originated from https://dzone.com/articles/part-2-airflow-dags-for-migrating-postgresql-data, We've done some modification.

If the task loaded successfully, the DAG task is shown on the web console.

Since the migration task is running all the times, we can go to the target database and check the data migrated.

Elastic Computing relational database MySQL Apache data Center PostgreSQL RDS Shell distributed Database PolarDB
developer Community&gt; database &gt; article
Please read this disclaimer carefully before you start to use the service. By using the service, you acknowledge that you have agreed to and accepted the content of this disclaimer in full. You may choose not to use the service if you do not agree to this disclaimer. This document is automatically generated based on public content on the Internet captured by Machine Learning Platform for AI. The copyright of the information in this document, such as web pages, images, and data, belongs to their respective author and publisher. Such automatically generated content does not reflect the views or opinions of Alibaba Cloud. It is your responsibility to determine the legality, accuracy, authenticity, practicality, and completeness of the content. We recommend that you consult a professional if you have any doubt in this regard. Alibaba Cloud accepts no responsibility for any consequences on account of your use of the content without verification. If you have feedback or you find that this document uses some content in which you have rights and interests, please contact us through this link: https://www.alibabacloud.com/campaign/contact-us-feedback. We will handle the matter according to relevant regulations.
Selected, One-Stop Store for Enterprise Applications
Support various scenarios to meet companies' needs at different stages of development

Start Building Today with a Free Trial to 50+ Products

Learn and experience the power of Alibaba Cloud.

Sign Up Now