
AnalyticDB: Use Airflow to schedule XIHE SQL jobs

Last Updated: Mar 28, 2026

Apache Airflow is an open source scheduler that orchestrates and schedules workflows as directed acyclic graphs (DAGs). You can use it to automate extract, transform, load (ETL) pipelines against your AnalyticDB for MySQL cluster, scheduling SQL jobs for both offline batch processing and real-time data workloads.

Prerequisites

Before you begin, ensure that you have:

  • An AnalyticDB for MySQL Data Lakehouse Edition cluster

  • Apache Airflow installed. See Installation in the Airflow documentation.

  • The IP address of the Airflow server added to the IP address whitelist of your AnalyticDB for MySQL cluster. See Configure an IP address whitelist.

How it works

The setup follows four steps:

  1. Verify that the apache-airflow-providers-mysql package is installed.

  2. Create a connection from Airflow to your AnalyticDB for MySQL cluster.

  3. Confirm the DAG file directory on the Airflow server.

  4. Write and trigger a DAG that runs your SQL jobs.

Step 1: Install the MySQL provider

  1. In the Airflow web interface, choose Admin > Providers.

  2. On the Providers page, check whether apache-airflow-providers-mysql is listed.

  3. If it is not listed, install it:

    pip install apache-airflow-providers-mysql

    Important

    If the installation fails with OSError: mysql_config not found, install the MySQL development libraries first, then retry the installation:

    yum install mysql-devel
    pip install apache-airflow-providers-mysql
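
You can also confirm the installation from the command line. A minimal sketch, assuming the airflow CLI is on your PATH:

    # List installed providers and filter for the MySQL provider package
    airflow providers list | grep mysql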

Step 2: Connect Airflow to your cluster

  1. In the Airflow web interface, choose Admin > Connections.

  2. Click the add icon. On the Add Connections page, configure the following parameters:

    1. Connection id: A name for this connection (for example, adb_mysql)

    2. Connection Type: MySQL

    3. Host: The endpoint of your AnalyticDB for MySQL cluster. Find it on the Cluster Information page of the AnalyticDB for MySQL console.

    4. Login: Your database account name

    5. Password: Your database account password

    6. Port: 3306

    All other parameters are optional. Configure them based on your requirements.

  3. Click Save.
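
Alternatively, you can create the same connection with the airflow CLI instead of the web interface. A minimal sketch with placeholder values (replace the host, login, and password with your cluster endpoint and database account):

    airflow connections add adb_mysql \
        --conn-type mysql \
        --conn-host am-xxxxxxxx.ads.aliyuncs.com \
        --conn-login your_account \
        --conn-password your_password \
        --conn-port 3306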

Step 3: Confirm the DAG directory

  1. Go to the Airflow installation directory:

    cd /root/airflow
  2. Check the value of dags_folder in airflow.cfg:

    cat airflow.cfg
  3. If the folder specified by dags_folder does not exist, create it. For example, if dags_folder points to /root/airflow/dags:

    mkdir /root/airflow/dags
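
You can also ask Airflow directly for the effective setting instead of reading the file. A minimal sketch using the Airflow 2.x CLI:

    # Print the dags_folder value that Airflow actually uses
    airflow config get-value core dags_folder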

Step 4: Write and trigger a DAG

  1. Create a DAG file in the dags_folder directory. The following example (mysql_dags.py) defines two tasks that run against AnalyticDB for MySQL using the connection you created in Step 2:

    from airflow import DAG
    from airflow.providers.mysql.operators.mysql import MySqlOperator
    from airflow.utils.dates import days_ago
    
    default_args = {
        'owner': 'airflow',
    }
    
    dag = DAG(
        'example_mysql',
        default_args=default_args,
        start_date=days_ago(2),
        tags=['example'],
    )
    
    # Task 1: list available databases
    mysql_test = MySqlOperator(
        task_id='mysql_test',
        mysql_conn_id='adb_mysql',   # matches the Connection id from Step 2
        sql='SHOW DATABASES;',
        dag=dag,
    )
    
    # Task 2: query a table
    mysql_test_task = MySqlOperator(
        task_id='mysql_test_task',
        mysql_conn_id='adb_mysql',
        sql='SELECT * FROM test;',   # replace with your SQL statement
        dag=dag,
    )
    
    mysql_test >> mysql_test_task
    
    if __name__ == "__main__":
        dag.cli()

    Key parameters:

    • mysql_conn_id: the Connection id you configured in Step 2

    • sql: the SQL statement to run

    For all available DAG parameters, see DAGs in the Airflow documentation.

  2. In the Airflow web interface, click the DAGs tab and find example_mysql.

  3. Click the trigger icon in the upper-right corner of the DAG page to run the DAG. You can also trigger the DAG from the command line, as shown in the sketch after these steps.

  4. After the run completes, click the green circle to the right of the DAG name to view execution details.

    Important

    The execution time displayed in the Airflow UI is in UTC. If your local time is UTC+8, the displayed time is 8 hours earlier than your local time.
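
    If you prefer to see local time, Airflow can display and schedule in another time zone. A minimal sketch for UTC+8, using environment variables that correspond to the default_ui_timezone and default_timezone options in airflow.cfg:

        # Display times in the web UI in UTC+8 instead of UTC
        export AIRFLOW__WEBSERVER__DEFAULT_UI_TIMEZONE=Asia/Shanghai
        # Also interpret naive DAG schedule dates in UTC+8 (optional)
        export AIRFLOW__CORE__DEFAULT_TIMEZONE=Asia/Shanghai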

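As an alternative to the web interface, you can verify and trigger the DAG from the command line. A minimal sketch using standard airflow CLI commands (the date is an arbitrary example):

    # Confirm that Airflow has picked up the DAG from dags_folder
    airflow dags list

    # Run a single task locally, without the scheduler, to verify the connection and SQL
    airflow tasks test example_mysql mysql_test 2024-01-01

    # Trigger a full run of the DAG
    airflow dags trigger example_mysql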
