Apache Airflow is an open-source platform for orchestrating and scheduling workflows as directed acyclic graphs (DAGs). Use it to automate extract, transform, load (ETL) pipelines against your AnalyticDB for MySQL cluster by scheduling SQL jobs for both offline batch processing and real-time data workloads.
Prerequisites
Before you begin, ensure that you have:
An AnalyticDB for MySQL Data Lakehouse Edition cluster
Apache Airflow installed. See Installation in the Airflow documentation.
The IP address of the Airflow server added to the IP address whitelist of your AnalyticDB for MySQL cluster. See Configure an IP address whitelist.
How it works
The setup follows four steps:
Verify that the apache-airflow-providers-mysql package is installed.
Create a connection from Airflow to your AnalyticDB for MySQL cluster.
Confirm the DAG file directory on the Airflow server.
Write and trigger a DAG that runs your SQL jobs.
Step 1: Install the MySQL provider
In the Airflow web interface, choose Admin > Providers.
On the Providers page, check whether apache-airflow-providers-mysql is listed. If it is not listed, install it:
pip install apache-airflow-providers-mysql
Important
If the installation fails with OSError: mysql_config not found, first install the MySQL development libraries, then retry:
yum install mysql-devel
pip install apache-airflow-providers-mysql
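As an optional sanity check, you can also confirm the installed provider version from the Python environment that runs Airflow. This is only a quick verification sketch, not a required step:
import importlib.metadata

# Prints the provider version; raises PackageNotFoundError if it is not installed.
print(importlib.metadata.version("apache-airflow-providers-mysql"))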
Step 2: Connect Airflow to your cluster
In the Airflow web interface, choose Admin > Connections.
Click the add icon. On the Add Connections page, configure the following parameters:
Connection id: A name for this connection (for example, adb_mysql)
Connection Type: MySQL
Host: The endpoint of your AnalyticDB for MySQL cluster. Find it on the Cluster Information page of the AnalyticDB for MySQL console.
Login: Your database account name
Password: Your database account password
Port: 3306
All other parameters are optional. Configure them based on your requirements.
Click Save.
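If you want to verify the connection before writing a DAG, a minimal sketch like the following can be run in the Airflow Python environment. It assumes adb_mysql is the Connection id you just saved and uses the provider's hook to run a trivial query:
from airflow.providers.mysql.hooks.mysql import MySqlHook

# Look up the connection saved in Step 2 and run a test query against the cluster.
hook = MySqlHook(mysql_conn_id='adb_mysql')
print(hook.get_records('SELECT 1;'))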
Step 3: Confirm the DAG directory
Go to the Airflow installation directory:
cd /root/airflow
Check the value of dags_folder in airflow.cfg:
cat airflow.cfg
If the folder specified by dags_folder does not exist, create it. For example, if dags_folder points to /root/airflow/dags:
mkdir /root/airflow/dags
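Alternatively, a short sketch like the one below, run in the same environment as the Airflow server, prints the effective dags_folder without opening airflow.cfg:
from airflow.configuration import conf

# The value Airflow actually uses, after any environment-variable overrides.
print(conf.get('core', 'dags_folder'))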
Step 4: Write and trigger a DAG
Create a DAG file in the dags_folder directory. The following example (mysql_dags.py) defines two tasks that run against AnalyticDB for MySQL using the connection you created in Step 2:

from airflow import DAG
from airflow.providers.mysql.operators.mysql import MySqlOperator
from airflow.utils.dates import days_ago

default_args = {
    'owner': 'airflow',
}

dag = DAG(
    'example_mysql',
    default_args=default_args,
    start_date=days_ago(2),
    tags=['example'],
)

# Task 1: list available databases
mysql_test = MySqlOperator(
    task_id='mysql_test',
    mysql_conn_id='adb_mysql',  # matches the Connection id from Step 2
    sql='SHOW DATABASES;',
    dag=dag,
)

# Task 2: query a table
mysql_test_task = MySqlOperator(
    task_id='mysql_test_task',
    mysql_conn_id='adb_mysql',
    sql='SELECT * FROM test;',  # replace with your SQL statement
    dag=dag,
)

mysql_test >> mysql_test_task

if __name__ == "__main__":
    dag.cli()

Key parameters:
mysql_conn_id: the Connection id you configured in Step 2
sql: the SQL statement to run
For all available DAG parameters, see DAGs in the Airflow documentation.
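If you want the DAG to run on a schedule instead of only on manual triggers, one possible variation of the DAG definition above adds a schedule and retry settings. The daily schedule and retry values below are placeholders, not requirements:
from datetime import timedelta

from airflow import DAG
from airflow.utils.dates import days_ago

dag = DAG(
    'example_mysql',
    default_args={
        'owner': 'airflow',
        'retries': 1,                         # retry each failed task once
        'retry_delay': timedelta(minutes=5),  # wait 5 minutes between attempts
    },
    start_date=days_ago(2),
    schedule_interval='@daily',  # on Airflow 2.4 and later you can use schedule='@daily' instead
    tags=['example'],
)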
In the Airflow web interface, click the DAGs tab and find example_mysql.
Click the trigger icon in the upper-right corner of the DAG page to run the DAG.
After the run completes, click the green circle to the right of the DAG name to view execution details.
Important
The execution time displayed in the Airflow UI is in UTC. If your local time is UTC+8, the displayed time is 8 hours earlier than your local time.
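If you prefer to define the schedule in your local time zone, one option is a time-zone-aware start_date created with pendulum, which ships with Airflow. This is only a sketch, and the Asia/Shanghai zone below is an example value:
import pendulum

from airflow import DAG

dag = DAG(
    'example_mysql',
    # Airflow still stores times in UTC internally, but the schedule is
    # interpreted in the time zone attached to start_date.
    start_date=pendulum.datetime(2024, 1, 1, tz='Asia/Shanghai'),
    tags=['example'],
)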


What's next
DAGs — configure scheduling intervals, retries, and dependencies for your DAG
Installation of Airflow — review advanced installation options if you need to update your environment
Configure an IP address whitelist — manage which hosts can connect to your AnalyticDB for MySQL cluster