Architecture Center
Build and Run ETL Data Pipeline and BI with Luigi and Metabase on Alibaba Cloud

Build and Run ETL Data Pipeline and BI with Luigi and Metabase on Alibaba Cloud

Overview
Luigi is a Python package that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, failures, command line integration, and more.
Metabase is an open-source business intelligence tool that lets users ask questions about the data and displays answers in a bar graph or a detailed table. Metabase uses the application database (H2) by default and is compatible with ApsaraDB RDS for MySQL, ApsaraDB RDS for PostgreSQL, and PolarDB.
This solution deploys Metabase and Luigi on Alibaba Cloud and uses ApsaraDB RDS for PostgreSQL as the backend database of Metabase to enhance high availability. It sets up a demo database, runs a demo ETL data pipeline on Luigi, and generates the BI report on Metabase.
Reference Architecture
Steps
Deploy Resources
Use this main.tf file in Terraform to provision ECS, EIP, and RDS PostgreSQL instances from this solution.
The ECS, EIP, and RDS PostgreSQL instances information will be listed after the script execution is completed.

eip_ecs: The public EIP of the ECS for Metabase installation host
rds_pg_url_metabase_database: The connection URL of the backend RDS PostgreSQL database for Metabase
rds_pg_port_metabase_database: The connection port of the backend RDS PostgreSQL database for Metabase, by default, it is 1921 for RDS PostgreSQL
rds_pg_url_demo_database: The connection URL of the demo RDS PostgreSQL database using Luigi and Metabase
rds_pg_url_demo_database: The connection Port of the demo RDS PostgreSQL database using Luigi and Metabase, by default, it is 1921 for RDS PostgreSQL
Set Up Luigi and Metabase on ECS with RDS PostgreSQL
1. Log on to ECS via SSH with ECS EIP with the default password N1cetest:
ssh root@ECS_EIP
2. Run the following command to install GCC, Python, related python modules, Luigi, JDK 8, Git and PostgreSQL client:
yum install -y gcc-c++*
yum install -y python39
yum install -y postgresql-devel
pip3 install psycopg2
pip3 install pandas
pip3 install mlxtend
pip3 install pycountry
pip3 install luigi

yum install -y java-1.8.0-openjdk-devel.x86_64
yum install -y git

cd ~
wget http://mirror.centos.org/centos/8/AppStream/x86_64/os/Packages/compat-openssl10-1.0.2o-3.el8.x86_64.rpm
rpm -i compat-openssl10-1.0.2o-3.el8.x86_64.rpm
wget http://docs-aliyun.cn-hangzhou.oss.aliyun-inc.com/assets/attach/181125/cn_zh/1598426198114/adbpg_client_package.el7.x86_64.tar.gz
tar -xzvf adbpg_client_package.el7.x86_64.tar.gz
3. Run the following command to copy the project files from Github and navigate to the project directory.
git clone https://github.com/alibabacloud-howto/opensource_with_apsaradb.git
cd opensource_with_apsaradb/luigi_metabase/
4. Run the following command to download the Metabase JAR file, which will be used for Metabase execution:
cd ~/opensource_with_apsaradb/luigi_metabase/metabase
wget https://downloads.metabase.com/v0.40.3.1/metabase.jar
5. Run the following command to migration Metabase backend database from H2 to RDS PostgreSQL that was provisioned before. Please update rds_pg_url_metabase_database with the corresponding connection string:
cd ~/opensource_with_apsaradb/luigi_metabase/metabase
export MB_DB_TYPE=postgres
export MB_DB_DBNAME=metabase
export MB_DB_PORT=1921
export MB_DB_USER=metabase
export MB_DB_PASS=N1cetest
export MB_DB_HOST=rds_pg_url_metabase_database
java -jar metabase.jar load-from-h2 ~/opensource_with_apsaradb/luigi_metabase/metabase/metabase.db
6. Run the following command to start Metabase using the RDS PostgreSQL as the backend database. Please update rds_pg_url_metabase_database with the corresponding connection string:
export MB_DB_TYPE=postgres
export MB_DB_DBNAME=metabase
export MB_DB_PORT=1921
export MB_DB_USER=metabase
export MB_DB_PASS=N1cetest
export MB_DB_HOST=rds_pg_url_metabase_database
java -jar metabase.jar
7. Visit http://ECS_EIP:3000 and log in with the Admin account in demo Metabase:
Admin User: admin@somebusiness.com
Password: N1cetest
Business Owner User: owner@somebusiness.com
Password: N1cetest
Set Up the Demo RDS PostgreSQL Database
1. Log on to ECS with ECS EIP in another CLI window (DO NOT close the CLI window you logged in before).
ssh root@ECS_EIP
2. Run the following command to create the schema sales_dw and tables in the demo RDS PostgreSQL database with the default password N1cetest. Please replace rds_pg_url_demo_database with the corresponding connection string of the demo RDS PostgreSQL instance.
cd ~/adbpg_client_package/bin
./psql -h -p1921 -Udemo sales_dw
3. In the PG client, execute the DDL SQL file and verify that 6 empty tables are created:
\i ~/opensource_with_apsaradb/luigi_metabase/sales_dw_ddl.sql
select tablename from pg_tables where schemaname='public';
There are 3 source tables and 3 target tables in the demo ETL data pipeline:
product_info: a source table in the demo ETL data pipeline
invoice: a source table in the demo ETL data pipeline
customer_info: a source table in the demo ETL data pipeline
invoice_time: a target table in the demo ETL data pipeline
invoice_outliers a target table in the demo ETL data pipeline
association_rules a target table in the demo ETL data pipeline
Run the Demo ETL Data Pipeline on Luigi
1. Log on to ECS with ECS EIP in another new CLI window (Do not close the CLI windows logged in before):
ssh root@ECS_EIP
2. In the CLI window, run the following command to start Luigi daemon:
luigid
3. Visit http://ECS_EIP:8082 to start using Luigi.
The full ETL data pipeline code are in the data pipeline file. It will load the raw data in the local ECS disk under https://github.com/alibabacloud-howto/opensource_with_apsaradb/tree/main/luigi_metabase/data, then process and transform the data to the local disk, and finally load the data into the RDS PostgreSQL database for Metabase BI reporting. The BI reports in Metabase has already been composed in this demo within the Metabase.
4. Switch to the CLI window created at Set Up the Demo RDS PostgreSQL Database. Edit the pipeline python code to change the demo database connection string URL to the value of rds_pg_url_demo_database.
cd ~/opensource_with_apsaradb/luigi_metabase
vim data_pipeline.py
5. Run the following command to kick off a pipeline execution for the data at 2018-03-30 in this CLI window, the data pipeline execution summary will show at the end:
cd ~/opensource_with_apsaradb/luigi_metabase
PYTHONPATH='.' luigi --module data_pipeline CompleteDataDumpLoad --date 2018-03-30
6. Refresh the Luigi web page http://ECS_EIP>:8082 to see the data pipeline execution information.
View the Data in the Demo RDS PostgreSQL and BI Report on Metabase
1. In the CLI window created at Set Up the Demo RDS PostgreSQL Database, run the following commands to verify the data processed in the data pipeline. Please replace rds_pg_url_demo_database with the corresponding connection string of the demo RDS PostgreSQL instance. Use the default password N1cetest when connecting to the schema sales_dw.
cd ~/adbpg_client_package/bin
./psql -h rds_pg_url_demo_database -p1921 -Udemo sales_dw
2. In the PG client, execute the SQL to view the data.
select tablename from pg_tables where schemaname='public';
select count(*) from association_rules;
select count(*) from product_info;
select count(*) from invoice;
select count(*) from customer_info;
select count(*) from invoice_time;
select count(*) from invoice_outliers;
3. Navigate to Metabase database Admin setting page to update the target database to the demo RDS PostgreSQL database rds_pg_url_demo_database.

Reach Alibaba Cloud experts for support

Contact Us