Deploy and Run Azkaban on Alibaba Cloud

Overview
Azkaban is a batch workflow job scheduler created at LinkedIn to run Hadoop and database jobs. Azkaban resolves the ordering through job dependencies and provides an easy-to-use web interface to track and maintain your workflows.
Azkaban supports ApsaraDB RDS for MySQL and PolarDB as its backend database. This solution deploys Azkaban in multiple-executor mode with the high-availability edition of ApsaraDB RDS for MySQL, uses a single ECS instance to host both the Azkaban web server and one Azkaban executor server, and runs a demo workflow task.
Reference Architecture
Steps
Deploy Resources
Use this main.tf file in Terraform to provision the ECS, EIP, RDS MySQL, and RDS PostgreSQL instances used in this solution.
After the script finishes, it lists the following information about the provisioned instances:

eip_ecs: The public EIP of the ECS instance that hosts the Azkaban installation
rds_mysql_url: The connection URL of the backend RDS MySQL database for Azkaban
rds_pg_url_azkaban_demo_database: The connection URL of the demo RDS PostgreSQL database used by the Azkaban workflow
rds_pg_port_azkaban_demo_database: The connection port of the demo RDS PostgreSQL database used by the Azkaban workflow (1921 by default for RDS PostgreSQL)
Set up Azkaban on ECS with RDS MySQL
1. Log on to the ECS instance via SSH with the default password N1cetest. Replace ECS_EIP with the eip_ecs value from the Terraform output:
ssh root@ECS_EIP
2. Run the following commands to install GCC, JDK 8, Git, the MySQL client, Python 3, the Python module psycopg2, and the PostgreSQL client on the ECS instance:
yum install -y gcc-c++*
yum install -y java-1.8.0-openjdk-devel.x86_64
yum install -y git
yum install -y mysql.x86_64

yum install -y python39
yum install -y postgresql-devel
pip3 install psycopg2

cd ~
wget http://mirror.centos.org/centos/8/AppStream/x86_64/os/Packages/compat-openssl10-1.0.2o-3.el8.x86_64.rpm
rpm -i compat-openssl10-1.0.2o-3.el8.x86_64.rpm
wget http://docs-aliyun.cn-hangzhou.oss.aliyun-inc.com/assets/attach/181125/cn_zh/1598426198114/adbpg_client_package.el7.x86_64.tar.gz
tar -xzvf adbpg_client_package.el7.x86_64.tar.gz
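Before moving on to the build, it can help to confirm that the tools installed above are actually on the PATH. The following is a minimal sketch: check_cmds is a hypothetical helper name, not part of Azkaban, and the list of command names is an assumption based on the packages installed in this step.

```shell
# check_cmds NAME...: print each command that is not found on the PATH.
# Returns 0 if every command is found, 1 otherwise. (Hypothetical helper.)
check_cmds() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "not found: $cmd"
      missing=1
    fi
  done
  return "$missing"
}

# Command names assumed from the packages installed above.
check_cmds gcc java git mysql python3 pip3 || echo "Install the missing tools before building."
```

If any name is reported as not found, rerun the corresponding yum command before continuing.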
3. Run the following command to download and build the Azkaban project:
cd ~
git clone https://github.com/azkaban/azkaban.git
cd ~/azkaban
./gradlew clean
./gradlew build installDist -x test
4. Run the following command to build the azkaban-db module:
cd ~/azkaban/azkaban-db; ../gradlew build installDist -x test
5. Run the following command to create all the tables needed for Azkaban on RDS MySQL. Please replace rds_mysql_url with the provisioned RDS MySQL connection string:
cd ~/azkaban/azkaban-db/build/distributions
unzip azkaban-db-*.zip
mysql -h rds_mysql_url -P3306 -uazkaban -pN1cetest azkaban < ~/azkaban/azkaban-db/build/distributions/azkaban-db-*/create-all-sql-*.sql
6. Connect to RDS MySQL again and run show tables to view the tables created for Azkaban. The hostname below is an example; replace it with your own rds_mysql_url:
mysql -hrm-3nssusij8bbe3a9c3.mysql.rds.aliyuncs.com -P3306 -uazkaban -pN1cetest azkaban
7. Run the following command to build the azkaban-exec-server module, which is the Azkaban executor server:
cd ~/azkaban/azkaban-exec-server; ../gradlew build installDist -x test
8. Edit the azkaban.properties file to modify the properties of executor server accordingly:
vim ~/azkaban/azkaban-exec-server/build/install/azkaban-exec-server/conf/azkaban.properties

Please refer to https://docs.oracle.com/javase/8/docs/api/java/time/ZoneId.html for the property default.timezone.id. For example, if you are located in China, set the timezone to Asia/Shanghai.
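If you prefer to script the edit instead of using vim, the sketch below shows one way to set a key in a properties file. It is not taken from the Azkaban documentation: set_prop is a hypothetical helper, the demo runs on a scratch file with illustrative contents, and mysql.host and mysql.port are property names assumed from Azkaban's stock azkaban.properties, so confirm them against your own file.

```shell
# set_prop FILE KEY VALUE: set KEY=VALUE in a Java-style properties file,
# replacing an existing KEY line or appending one. (Hypothetical helper;
# VALUE must not contain '|', '&', or '\', which are special to sed here.)
set_prop() {
  file=$1 key=$2 value=$3
  if grep -q "^${key}=" "$file"; then
    sed "s|^${key}=.*|${key}=${value}|" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# Demo on a scratch file with illustrative contents, not the real config.
conf=$(mktemp)
printf 'default.timezone.id=America/Los_Angeles\nmysql.host=localhost\n' > "$conf"
set_prop "$conf" default.timezone.id Asia/Shanghai
set_prop "$conf" mysql.host rds_mysql_url
set_prop "$conf" mysql.port 3306
cat "$conf"
```

To apply it for real, pass the path of the executor server's azkaban.properties as the first argument and substitute your actual RDS MySQL connection string for rds_mysql_url.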
9. Run the following command to build the azkaban-web-server module, which is the Azkaban web server:
cd ~/azkaban/azkaban-web-server; ../gradlew build installDist -x test
10. Edit the azkaban.properties file to modify the properties of web server accordingly:
vim ~/azkaban/azkaban-web-server/build/install/azkaban-web-server/conf/azkaban.properties

Note: Replace azkaban.executorselector.filters=StaticRemainingFlowSize,MinimumFreeMemory,CpuStatus with azkaban.executorselector.filters=StaticRemainingFlowSize,CpuStatus to remove the MinimumFreeMemory filter. With this filter enabled, the web server checks whether the executor host has more than 6 GB of free memory and, if it does not, will not dispatch tasks to that executor. This solution uses an entry-level ECS instance with less than 6 GB of memory, so the filter must be removed for tasks to run.
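The replacement described in the note can also be made non-interactively. The following sketch runs against a scratch file that contains only the original filters line; it is an illustration, not an official procedure.

```shell
# Demo on a scratch file holding the original filters line.
conf=$(mktemp)
echo 'azkaban.executorselector.filters=StaticRemainingFlowSize,MinimumFreeMemory,CpuStatus' > "$conf"

# Rewrite the filters line so that MinimumFreeMemory is no longer listed.
sed 's/^azkaban\.executorselector\.filters=.*/azkaban.executorselector.filters=StaticRemainingFlowSize,CpuStatus/' \
  "$conf" > "$conf.tmp" && mv "$conf.tmp" "$conf"

# Show the resulting line to confirm the change.
grep '^azkaban\.executorselector\.filters=' "$conf"
```

To change the real configuration, set conf to the web server's azkaban.properties path instead of creating a scratch file, and keep a backup copy first.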
11. Configure the Azkaban web server user account. Use the username azkaban and password azkaban to log on to the Azkaban web console:
vim ~/azkaban/azkaban-web-server/build/install/azkaban-web-server/conf/azkaban-users.xml
12. Run the following command to start the Azkaban executor server:
cd ~/azkaban/azkaban-exec-server/build/install/azkaban-exec-server
./bin/start-exec.sh
curl -G "localhost:$(<./executor.port)/executor?action=activate" && echo
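The curl call above reads the port from ./executor.port, so it fails if it runs before that file exists. The sketch below polls for the file first, assuming the executor writes it shortly after startup. wait_for_file is a hypothetical helper, and in the demo a background job with a made-up port number stands in for the executor.

```shell
# wait_for_file TARGET [TRIES]: poll once per second until TARGET exists and
# is non-empty, giving up after TRIES attempts (default 30). (Hypothetical helper.)
wait_for_file() {
  target=$1 tries=${2:-30}
  while [ "$tries" -gt 0 ]; do
    [ -s "$target" ] && return 0
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}

# Demo: a background job stands in for the executor and writes a made-up port.
demo_dir=$(mktemp -d)
port_file=$demo_dir/executor.port
( sleep 1; echo 12321 > "$port_file" ) &
if wait_for_file "$port_file" 10; then
  echo "port file ready: $(cat "$port_file")"
else
  echo "port file did not appear"
fi
```

In the real directory, calling wait_for_file ./executor.port before the curl command gives the executor time to finish starting.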
13. Run the following command to start the Azkaban web server:
cd ~/azkaban/azkaban-web-server/build/install/azkaban-web-server
./bin/start-web.sh
14. A multi-executor Azkaban instance is ready. Visit http://ECS_EIP:8081 and log in to Azkaban web console with username azkaban and password azkaban.
Download and Prepare Demo Workflow Project Package
1. Get the demo project package.
git clone https://github.com/alibabacloud-howto/opensource_with_apsaradb.git
cd opensource_with_apsaradb/azkaban/project-demo
ls -l


_1_prepare_source_db.py: A Python script to prepare tables and data in the source demo database northwind_source on RDS PostgreSQL
_2_prepare_target_db.py: A Python script to prepare tables and data in the target demo database northwind_target on RDS PostgreSQL
_3_data_migration.py: A Python script to migrate product and order data in two tables from the source database northwind_source to the target database northwind_target
job1_prepare_source_db.job: Azkaban job to call _1_prepare_source_db.py
job2_prepare_target_db.job: Azkaban job to call _2_prepare_target_db.py
job3_data_migration.job: Azkaban job to call _3_data_migration.py, which requires job1_prepare_source_db.job and job2_prepare_target_db.job to be executed beforehand
northwind_data_source.sql: DML to insert data to the source demo database northwind_source
northwind_data_target.sql: DML to insert data to the target demo database northwind_target
northwind_ddl.sql: DDL to create tables on both the source demo database northwind_source and the target demo database northwind_target
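The ordering between the three jobs is declared inside the .job files themselves. As an illustration only, the sketch below writes a job file in Azkaban's key=value job format to a scratch directory and lists its dependencies. The repository's actual files may differ, and the python3 command line is an assumption.

```shell
# Write an illustrative job file to a scratch directory (not the repository's actual file).
demo_dir=$(mktemp -d)
cat > "$demo_dir/job3_data_migration.job" <<'EOF'
type=command
command=python3 _3_data_migration.py
dependencies=job1_prepare_source_db,job2_prepare_target_db
EOF

# List the jobs that must finish before this one starts, one per line.
sed -n 's/^dependencies=//p' "$demo_dir/job3_data_migration.job" | tr ',' '\n'
```

Each name in the dependencies line refers to another .job file in the same project, without the .job extension.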
2. Edit the Azkaban project files so that they connect to your RDS PostgreSQL demo database. Then run the following command to package all the project files into a zip archive:
zip -q -r project_demo_northwind.zip *
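A dependency that names a job missing from the archive is easier to catch before the upload than after it. The sketch below is one possible pre-packaging check, assuming the job files declare their prerequisites on a dependencies= line. check_job_deps is a hypothetical helper, not an Azkaban tool.

```shell
# check_job_deps DIR: report every dependency that has no matching .job file
# in DIR. Returns 0 if all dependencies resolve, 1 otherwise. (Hypothetical helper.)
check_job_deps() {
  dir=$1
  rc=0
  for job in "$dir"/*.job; do
    [ -e "$job" ] || continue   # the glob matched no .job files
    deps=$(sed -n 's/^dependencies=//p' "$job" | tr ',' ' ')
    for dep in $deps; do
      if [ ! -f "$dir/$dep.job" ]; then
        echo "missing: $dep.job (required by $(basename "$job"))"
        rc=1
      fi
    done
  done
  return "$rc"
}

# Check the current directory, which is where the zip command is run.
if check_job_deps .; then
  echo "all job dependencies resolve"
fi
```

Run it from the project-demo directory just before the zip command; any line starting with "missing:" points to a job file that needs to be added or a dependency name that needs correcting.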
Deploy and Run the Demo Azkaban Workflow Task
1. Visit http://ECS_EIP:8081 and log in to the Azkaban web console with username azkaban and password azkaban.
2. Create an Azkaban project.
3. Upload the project zip file packaged beforehand.
4. Click the job entry to go to the workflow page, then click Schedule / Execute Flow and then Execute to run the task. After the task completes, the job graph on the workflow page turns green.
