Deploy and Run Azkaban on Alibaba Cloud

This is a tutorial on how to run the open-source project Azkaban on Alibaba Cloud with ApsaraDB (Alibaba Cloud Database). We also deploy and run a simple data preparation and migration task in Azkaban to demonstrate a workflow that moves data between two databases.

You can access the tutorial artifacts, including the deployment script (Terraform), related source code, sample data, and instructions, from the GitHub project.

Please refer to this link for more tutorials related to Alibaba Cloud Database.

Overview

Azkaban is a batch workflow job scheduler created at LinkedIn to run Hadoop and database jobs. Azkaban resolves the ordering through job dependencies and provides an easy-to-use web user interface to maintain and track your workflows.
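Conceptually, resolving the order from job dependencies is a topological sort of the job graph. The Python sketch below illustrates that idea with the standard graphlib module (Python 3.9+); it is not Azkaban's implementation, and the job names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(jobs: dict[str, list[str]]) -> list[str]:
    """Return an order in which every job appears after all of its dependencies.

    `jobs` maps a job name to the names of the jobs it depends on.
    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    return list(TopologicalSorter(jobs).static_order())


if __name__ == "__main__":
    # Hypothetical jobs shaped like a prepare -> migrate -> verify workflow
    jobs = {
        "prepare_source": [],
        "migrate_products": ["prepare_source"],
        "migrate_orders": ["prepare_source"],
        "verify_target": ["migrate_products", "migrate_orders"],
    }
    print(execution_order(jobs))

    # A cyclic definition cannot be scheduled
    try:
        execution_order({"a": ["b"], "b": ["a"]})
    except CycleError:
        print("cycle detected")
```

Jobs with no path between them (here the two migrate jobs) may come out in either order; a scheduler is free to run them in parallel.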

Since version 3.0, Azkaban provides two modes: the stand-alone solo-server mode and the distributed multiple-executor mode (reference).

  • In solo-server mode, the database is an embedded H2 instance, and the web server and executor server run in the same process. This mode is useful for trying things out and for small-scale use cases.
  • The multiple-executor mode is intended for most serious production environments. Its database should be backed by MySQL instances with a master-slave setup. Ideally, the web server and executor servers run on different hosts so that upgrades and maintenance don't affect users. This multiple-host setup makes Azkaban robust and scalable.

To improve the high availability of the database behind Azkaban, we show how to deploy Azkaban in multiple-executor mode with Alibaba Cloud ApsaraDB RDS for MySQL. (For simplicity, this tutorial uses a single ECS instance to host one Azkaban web server and one Azkaban executor server.)

Azkaban supports MySQL as the backend database. You can use one of the following databases on Alibaba Cloud:

In this tutorial, we use the RDS MySQL High-availability Edition for a more stable production setup.

Deployment Architecture:


Step 1. Use Terraform to provision ECS and databases on Alibaba Cloud

Run the Terraform script to provision the resources. In this tutorial, RDS MySQL serves as the backend database of Azkaban, and RDS PostgreSQL serves as the demo database for the data preparation and migration performed by the Azkaban task, so the Terraform script creates an ECS instance, an RDS MySQL instance, and an RDS PostgreSQL instance. Specify the required information and the region to deploy to.


When the Terraform script finishes, it outputs the following information about the ECS, RDS MySQL, and RDS PostgreSQL instances:


  • eip_ecs: The public EIP of the ECS instance that hosts the Azkaban installation
  • rds_mysql_url: The connection URL of the RDS MySQL database that backs Azkaban
  • rds_pg_url_azkaban_demo_database: The connection URL of the demo RDS PostgreSQL database used by the Azkaban workflow
  • rds_pg_port_azkaban_demo_database: The connection port of the demo RDS PostgreSQL database; RDS PostgreSQL uses port 1921 by default
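If you prefer to collect these values programmatically, the following Python sketch reads them from terraform output -json, which prints each output as an object with a value field. It assumes it runs against the directory that holds this tutorial's Terraform state; the helper names are hypothetical.

```python
import json
import subprocess

# Output names listed above for this tutorial's Terraform script
REQUIRED_OUTPUTS = (
    "eip_ecs",
    "rds_mysql_url",
    "rds_pg_url_azkaban_demo_database",
    "rds_pg_port_azkaban_demo_database",
)


def parse_outputs(json_text: str) -> dict:
    """Flatten `terraform output -json` into {name: value} and check required names."""
    raw = json.loads(json_text)
    # Each entry looks like {"sensitive": ..., "type": ..., "value": ...}
    values = {name: entry["value"] for name, entry in raw.items()}
    missing = [name for name in REQUIRED_OUTPUTS if name not in values]
    if missing:
        raise KeyError(f"missing Terraform outputs: {', '.join(missing)}")
    return values


def read_outputs(workdir: str = ".") -> dict:
    """Run `terraform output -json` in the Terraform directory and parse the result."""
    result = subprocess.run(
        ["terraform", "output", "-json"],
        cwd=workdir,
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_outputs(result.stdout)
```

For example, read_outputs()["eip_ecs"] returns the address used with ssh in Step 2.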

Step 2. Deploy and set up Azkaban on ECS with RDS MySQL

Log on to the ECS instance using its EIP:

ssh root@<ECS_EIP>


Execute the following commands on the ECS instance to install gcc, JDK 8, Git, the MySQL client, Python 3.9, the Python module psycopg2, and the PostgreSQL client (psql, from adbpg_client_package):

yum install -y gcc-c++*
yum install -y java-1.8.0-openjdk-devel.x86_64
yum install -y git
yum install -y mysql.x86_64

yum install -y python39
yum install -y postgresql-devel
pip3 install psycopg2

cd ~
wget http://mirror.centos.org/centos/8/AppStream/x86_64/os/Packages/compat-openssl10-1.0.2o-3.el8.x86_64.rpm
rpm -i compat-openssl10-1.0.2o-3.el8.x86_64.rpm
wget http://docs-aliyun.cn-hangzhou.oss.aliyun-inc.com/assets/attach/181125/cn_zh/1598426198114/adbpg_client_package.el7.x86_64.tar.gz
tar -xzvf adbpg_client_package.el7.x86_64.tar.gz

Execute the following commands to download and build the Azkaban project:

cd ~
git clone https://github.com/azkaban/azkaban.git
cd ~/azkaban
./gradlew clean
./gradlew build installDist -x test

Execute the following command to build the azkaban-db module:

cd ~/azkaban/azkaban-db; ../gradlew build installDist -x test

Execute the following commands to create all the tables Azkaban needs on RDS MySQL. Replace <rds_mysql_url> with the connection string of the provisioned RDS MySQL instance:

cd ~/azkaban/azkaban-db/build/distributions
unzip azkaban-db-*.zip
mysql -h<rds_mysql_url> -P3306 -uazkaban -pN1cetest azkaban < ~/azkaban/azkaban-db/build/distributions/azkaban-db-*/create-all-sql-*.sql
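The shell redirect above works only when the glob matches exactly one file: with several unzipped versions bash reports an ambiguous redirect, and with none the literal pattern is not found. The following Python sketch, with hypothetical helper names, makes that check explicit before feeding the script to the same mysql command line.

```python
import glob
import os
import subprocess


def select_schema_script(matches: list[str], pattern: str) -> str:
    """Return the only match, or explain why the schema script is ambiguous or missing."""
    if len(matches) != 1:
        raise FileNotFoundError(
            f"expected exactly one file matching {pattern}, found {len(matches)}"
        )
    return matches[0]


def find_schema_script(dist_dir: str) -> str:
    """Locate create-all-sql-*.sql inside the unzipped azkaban-db-* directory."""
    pattern = os.path.join(dist_dir, "azkaban-db-*", "create-all-sql-*.sql")
    return select_schema_script(sorted(glob.glob(pattern)), pattern)


def load_schema(dist_dir: str, host: str, user: str, password: str, database: str = "azkaban") -> None:
    """Pipe the schema script into the mysql client, like `mysql ... < script.sql`."""
    script = find_schema_script(dist_dir)
    with open(script, "rb") as sql:
        subprocess.run(
            ["mysql", f"-h{host}", "-P3306", f"-u{user}", f"-p{password}", database],
            stdin=sql,
            check=True,
        )
```

With the values from this step, the call would be load_schema(os.path.expanduser("~/azkaban/azkaban-db/build/distributions"), "<rds_mysql_url>", "azkaban", "N1cetest").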

Connect to RDS MySQL again and run show tables; to list the tables created for Azkaban. Replace <rds_mysql_url> with the RDS MySQL connection string:

mysql -h<rds_mysql_url> -P3306 -uazkaban -pN1cetest azkaban


Execute the following command to build the azkaban-exec-server module, which is the Azkaban Executor Server:

cd ~/azkaban/azkaban-exec-server; ../gradlew build installDist -x test

Edit the executor server's azkaban.properties file and modify its properties as needed:

vim ~/azkaban/azkaban-exec-server/build/install/azkaban-exec-server/conf/azkaban.properties

For the default.timezone.id property, refer to this link. Since we are located in China, we use Asia/Shanghai. Set it according to your location.
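Instead of editing in vim, the same changes can be scripted. This Python sketch sets key=value entries in a .properties file and appends any that are missing. Besides default.timezone.id, it assumes the executor must be pointed at the RDS MySQL instance from Step 1 through the mysql.host, mysql.user, and mysql.password properties, with the credentials used earlier in this step; check these keys against your copy of azkaban.properties.

```python
import re

# Example values (assumption): timezone from this tutorial, plus MySQL settings
# matching the RDS MySQL instance and credentials used earlier in this step.
EXAMPLE_UPDATES = {
    "default.timezone.id": "Asia/Shanghai",
    "mysql.host": "<rds_mysql_url>",
    "mysql.user": "azkaban",
    "mysql.password": "N1cetest",
}

# Matches "key=value" or "key = value"; skips "#" and "!" comment lines
_KEY_LINE = re.compile(r"\s*([^#!=\s][^=]*?)\s*=")


def set_properties(text: str, updates: dict[str, str]) -> str:
    """Replace matching key=value lines and append keys that were not present.

    Simplified: ignores ':' separators and line continuations, which the
    azkaban.properties edits in this tutorial do not need.
    """
    lines = text.splitlines()
    remaining = dict(updates)
    for i, line in enumerate(lines):
        match = _KEY_LINE.match(line)
        if match and match.group(1) in remaining:
            key = match.group(1)
            lines[i] = f"{key}={remaining.pop(key)}"
    lines.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(lines) + "\n"


def update_file(path: str, updates: dict[str, str]) -> None:
    """Rewrite a properties file in place with the given updates."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(set_properties(text, updates))
```

Keeping the other lines untouched (including comments) makes the diff against the shipped file easy to review.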


Execute the following command to build the azkaban-web-server module, which is the Azkaban Web Server:

cd ~/azkaban/azkaban-web-server; ../gradlew build installDist -x test

Edit the web server's azkaban.properties file and modify its properties as needed:

vim ~/azkaban/azkaban-web-server/build/install/azkaban-web-server/conf/azkaban.properties

Note:

Replace azkaban.executorselector.filters=StaticRemainingFlowSize,MinimumFreeMemory,CpuStatus with azkaban.executorselector.filters=StaticRemainingFlowSize,CpuStatus, which removes the MinimumFreeMemory filter.

With the MinimumFreeMemory filter, the web server checks whether the executor host has more than 6 GB of free memory. If it has less than 6 GB, the web server does not dispatch the task to that executor. Because this tutorial uses an entry-level ECS instance with less than 6 GB of memory, we remove this filter so that tasks can run.
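To see why removing the filter matters, here is a simplified Python model of filter-based executor selection. Only the 6 GB threshold comes from this tutorial; the ExecutorStats fields, the filter functions, and the CPU threshold are hypothetical stand-ins, not Azkaban's code.

```python
from dataclasses import dataclass
from typing import Callable

# Threshold described in this tutorial: 6 GB of free memory, expressed in KB
MIN_FREE_MEMORY_KB = 6 * 1024 * 1024


@dataclass
class ExecutorStats:
    host: str
    free_memory_kb: int
    remaining_flow_capacity: int
    cpu_usage_percent: float


# Hypothetical stand-ins for the filters named in azkaban.executorselector.filters
FILTERS: dict[str, Callable[[ExecutorStats], bool]] = {
    "StaticRemainingFlowSize": lambda e: e.remaining_flow_capacity > 0,
    "MinimumFreeMemory": lambda e: e.free_memory_kb > MIN_FREE_MEMORY_KB,
    "CpuStatus": lambda e: e.cpu_usage_percent < 95.0,  # hypothetical threshold
}


def eligible_executors(setting: str, executors: list[ExecutorStats]) -> list[str]:
    """Return hosts that pass every filter listed in the comma-separated setting.

    Unknown filter names raise KeyError.
    """
    names = [name.strip() for name in setting.split(",") if name.strip()]
    checks = [FILTERS[name] for name in names]
    return [e.host for e in executors if all(check(e) for check in checks)]
```

An executor with, say, 2 GB free fails MinimumFreeMemory, so with the original setting no executor is eligible and the flow is never dispatched; with StaticRemainingFlowSize,CpuStatus the same executor is selected.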


The Azkaban web server user accounts are configured in the following file. Later, we use the username azkaban and password azkaban to log on to the Azkaban web console.

vim ~/azkaban/azkaban-web-server/build/install/azkaban-web-server/conf/azkaban-users.xml
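To double-check which accounts exist, the following Python sketch lists the users and their roles in azkaban-users.xml. It assumes the file uses user elements with username and roles attributes, as in the SAMPLE string below; that sample is an assumption about the default file, not a copy of the one on your ECS instance.

```python
import xml.etree.ElementTree as ET

# Assumed shape of azkaban-users.xml (illustrative sample)
SAMPLE = """<azkaban-users>
  <user groups="azkaban" password="azkaban" roles="admin" username="azkaban"/>
  <user password="metrics" roles="metrics" username="metrics"/>
  <role name="admin" permissions="ADMIN"/>
  <role name="metrics" permissions="METRICS"/>
</azkaban-users>"""


def list_users(xml_text: str) -> dict[str, list[str]]:
    """Map each user's username to the list of roles in its roles attribute."""
    root = ET.fromstring(xml_text)
    users = {}
    for user in root.iter("user"):
        roles = user.get("roles", "")
        users[user.get("username")] = [r.strip() for r in roles.split(",") if r.strip()]
    return users
```

On the ECS instance you would pass the file contents instead, for example list_users(open(path, encoding="utf-8").read()) with path set to the azkaban-users.xml location above.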

Now, execute the following commands to start the Azkaban executor server and activate it:

cd ~/azkaban/azkaban-exec-server/build/install/azkaban-exec-server
./bin/start-exec.sh
curl -G "localhost:$(<./executor.port)/executor?action=activate" && echo
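The curl command reads the port that start-exec.sh writes to executor.port and calls the executor's activate action with a GET request. A Python equivalent, as a sketch with hypothetical function names, looks like this:

```python
import urllib.request
from pathlib import Path


def activate_url(port: int, host: str = "localhost") -> str:
    """Build the same URL that the curl command calls."""
    return f"http://{host}:{port}/executor?action=activate"


def activate_executor(install_dir: str) -> str:
    """Read executor.port from the install directory and call the activate endpoint."""
    port = int(Path(install_dir, "executor.port").read_text().strip())
    # No request body, so urllib sends a GET, matching curl -G
    with urllib.request.urlopen(activate_url(port), timeout=10) as resp:
        return resp.read().decode("utf-8")
```

Run it from the executor host with the install directory used above, for example activate_executor("/root/azkaban/azkaban-exec-server/build/install/azkaban-exec-server").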

Execute the following commands to start the Azkaban web server:

cd ~/azkaban/azkaban-web-server/build/install/azkaban-web-server
./bin/start-web.sh

Now, an Azkaban instance in multiple-executor mode is ready for use at http://<ECS_EIP>:8081/.

Step 3. Download and prepare the demo Azkaban workflow project package

Azkaban deploys and runs a workflow from a zip package of job files. I've prepared a demo project with scripts, SQL files, and job files on this project's GitHub page.
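Azkaban job files are plain key=value property files: a command-type job sets type=command and command, and lists its upstream jobs in dependencies. The Python sketch below renders such files; the three-job flow in DEMO_JOBS is hypothetical and does not reproduce the demo project's actual job names or scripts.

```python
from pathlib import Path
from typing import Optional

# Hypothetical three-job flow: name -> (command, upstream job names)
DEMO_JOBS = {
    "create_target": ("python3 create_target.py", []),
    "migrate_data": ("python3 migrate_data.py", ["create_target"]),
    "verify": ("python3 verify.py", ["migrate_data"]),
}


def job_file_text(command: str, dependencies: Optional[list[str]] = None) -> str:
    """Render one command-type .job file."""
    lines = ["type=command", f"command={command}"]
    if dependencies:
        # Names of jobs (file names without .job) that must finish first
        lines.append("dependencies=" + ",".join(dependencies))
    return "\n".join(lines) + "\n"


def write_jobs(project_dir: str, jobs: dict[str, tuple[str, list[str]]]) -> None:
    """Write <name>.job into project_dir for every entry in jobs."""
    for name, (command, deps) in jobs.items():
        Path(project_dir, f"{name}.job").write_text(job_file_text(command, deps))
```

In this layout the last job, which nothing depends on, is the end of the flow.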

On your local computer, clone the project from GitHub. Make sure Git is installed locally.

git clone https://github.com/alibabacloud-howto/opensource_with_apsaradb.git
cd opensource_with_apsaradb/azkaban/project-demo
ls -l

We can see the demo Azkaban project files:


Edit the Azkaban project files so that they connect to the demo RDS PostgreSQL database.


In the project-demo directory, execute the following command to package all the project files into a zip package:

zip -q -r project_demo_northwind.zip *
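The zip command stores each file with a path relative to the project directory. If zip is not available locally, this Python sketch builds an equivalent archive with the standard zipfile module; unlike the shell glob *, it also includes hidden files. The function name is hypothetical.

```python
import os
import zipfile


def package_project(project_dir: str, zip_path: str) -> list[str]:
    """Zip every file under project_dir with paths relative to it; return the names added."""
    zip_abs = os.path.abspath(zip_path)
    added = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(project_dir):
            dirs.sort()  # deterministic traversal order
            for name in sorted(files):
                full = os.path.join(root, name)
                if os.path.abspath(full) == zip_abs:
                    continue  # do not add the archive to itself
                arcname = os.path.relpath(full, project_dir)
                zf.write(full, arcname)
                added.append(arcname)
    return added
```

For example, package_project(".", "project_demo_northwind.zip") run inside project-demo produces the archive uploaded in Step 4.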


Step 4. Deploy and run the demo Azkaban workflow jobs

Open http://<ECS_EIP>:8081/ in a web browser and log in to the Azkaban web console with username azkaban and password azkaban.


Create an Azkaban project:


Upload the project zip file packaged in Step 3:


Then, we can see the job flow:


Click the job entry to see the whole job graph of the workflow:


Then, click Schedule / Execute Flow, and then click Execute.
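The same execution can also be triggered through Azkaban's AJAX API: a POST with action=login returns a session.id, and a GET on /executor with ajax=executeFlow returns an execid. The Python sketch below assumes those two calls; the helper names are hypothetical, and the project and flow names you pass in must match what you created in the web console.

```python
import json
import urllib.parse
import urllib.request


def login_request(base_url: str, username: str, password: str) -> urllib.request.Request:
    """POST action=login to the web server root; the JSON reply carries "session.id"."""
    body = urllib.parse.urlencode(
        {"action": "login", "username": username, "password": password}
    ).encode("ascii")
    return urllib.request.Request(base_url.rstrip("/") + "/", data=body, method="POST")


def execute_flow_url(base_url: str, session_id: str, project: str, flow: str) -> str:
    """Build the executeFlow AJAX URL; the JSON reply carries "execid"."""
    query = urllib.parse.urlencode(
        {"ajax": "executeFlow", "session.id": session_id, "project": project, "flow": flow}
    )
    return f"{base_url.rstrip('/')}/executor?{query}"


def run_flow(base_url: str, username: str, password: str, project: str, flow: str) -> int:
    """Log in, start the flow, and return its execution ID."""
    with urllib.request.urlopen(login_request(base_url, username, password), timeout=30) as resp:
        session_id = json.load(resp)["session.id"]
    with urllib.request.urlopen(execute_flow_url(base_url, session_id, project, flow), timeout=30) as resp:
        return int(json.load(resp)["execid"])
```

For example, run_flow("http://<ECS_EIP>:8081", "azkaban", "azkaban", "<project>", "<flow>") starts one execution, with the placeholders replaced by your own project and flow names.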


When the workflow finishes successfully, the jobs in the job graph turn green.


Click the Job List tab to see the execution status of the three jobs in this demo workflow.
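The status shown on the Job List tab can also be read from the ajax=fetchexecflow call. This Python sketch assumes the response carries a top-level status and a nodes list with an id and a status for each job; verify that shape against your Azkaban version. The helper names are hypothetical.

```python
import urllib.parse


def fetch_exec_flow_url(base_url: str, session_id: str, exec_id: int) -> str:
    """Build the fetchexecflow AJAX URL for one execution."""
    query = urllib.parse.urlencode(
        {"ajax": "fetchexecflow", "session.id": session_id, "execid": exec_id}
    )
    return f"{base_url.rstrip('/')}/executor?{query}"


def job_statuses(exec_flow: dict) -> dict[str, str]:
    """Map each job ID to its status in a parsed fetchexecflow response."""
    return {node["id"]: node["status"] for node in exec_flow.get("nodes", [])}


def all_succeeded(exec_flow: dict) -> bool:
    """True when the flow and every job in it report SUCCEEDED."""
    statuses = job_statuses(exec_flow)
    return (
        exec_flow.get("status") == "SUCCEEDED"
        and bool(statuses)
        and all(status == "SUCCEEDED" for status in statuses.values())
    )
```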


Now, let's connect to the demo RDS PostgreSQL source and target databases to verify the data.

Execute the following commands to connect to the source database northwind_source and check the data in the products and orders tables. Replace <rds_pg_url_azkaban_demo_database> with the RDS PostgreSQL connection string:

cd ~/adbpg_client_package/bin
./psql -h<rds_pg_url_azkaban_demo_database> -p1921 -Udemo northwind_source
select tablename from pg_tables where schemaname='public';
select count(*) from products;
select count(*) from orders;


Execute the following commands to connect to the target database northwind_target and check the data in the products and orders tables. Replace <rds_pg_url_azkaban_demo_database> with the RDS PostgreSQL connection string:

cd ~/adbpg_client_package/bin
./psql -h<rds_pg_url_azkaban_demo_database> -p1921 -Udemo northwind_target
select tablename from pg_tables where schemaname='public';
select count(*) from products;
select count(*) from orders;
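To compare the two databases in one step, this Python sketch counts rows in products and orders through any DB-API 2.0 connection, such as one returned by psycopg2.connect. It assumes the demo workflow copies these tables row for row; if your jobs filter or transform rows, differing counts may be expected. The helper names are hypothetical.

```python
from typing import Sequence

# Tables checked in this tutorial; names come from this fixed list, never from user input
TABLES = ("products", "orders")


def row_counts(conn, tables: Sequence[str] = TABLES) -> dict[str, int]:
    """Run SELECT count(*) on each table through a DB-API 2.0 connection."""
    counts = {}
    cur = conn.cursor()
    try:
        for table in tables:
            cur.execute(f"SELECT count(*) FROM {table}")
            counts[table] = cur.fetchone()[0]
    finally:
        cur.close()
    return counts


def count_mismatches(source_conn, target_conn, tables: Sequence[str] = TABLES) -> dict[str, tuple[int, int]]:
    """Return {table: (source_count, target_count)} for tables whose counts differ."""
    src = row_counts(source_conn, tables)
    dst = row_counts(target_conn, tables)
    return {t: (src[t], dst[t]) for t in tables if src[t] != dst[t]}
```

For example, pass psycopg2.connect(host="<rds_pg_url_azkaban_demo_database>", port=1921, user="demo", password="<password>", dbname="northwind_source") as source_conn and the same call with dbname="northwind_target" as target_conn, where <password> is a placeholder for the demo user's password. An empty result means the counts match.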

ApsaraDB
