AnalyticDB: Import DataFrame data using SQLAlchemy

Last Updated: Nov 24, 2025

This topic describes how to use SQLAlchemy to import pandas DataFrame data into AnalyticDB for MySQL.

Prerequisites

Precautions

When you use SQLAlchemy to create a table in AnalyticDB for MySQL, you must set the checkfirst parameter to False, for example, metadata.create_all(engine, checkfirst=False). Otherwise, an error occurs.
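
The following minimal sketch shows this call in isolation. The connection string is a placeholder and example_table is a hypothetical table used only for illustration; the full import script in the Procedure section uses the same create_all() call.

    from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String

    # Placeholder connection string. Replace the credentials and endpoint with your own values.
    engine = create_engine("mysql+pymysql://user:password@host:3306/demo_db")

    metadata = MetaData()
    Table(
        "example_table",  # Hypothetical table used only for this sketch
        metadata,
        Column("id", Integer, primary_key=True),
        Column("name", String(255)),
    )

    # checkfirst=False is required for AnalyticDB for MySQL. The default value (checkfirst=True) causes an error.
    metadata.create_all(engine, checkfirst=False)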

Preparations

  1. Go to the SQL development page.

    • Enterprise Edition, Basic Edition, or Data Lakehouse Edition:

      1. Log on to the AnalyticDB for MySQL console. In the upper-left corner of the console, select a region. In the left-side navigation pane, click Clusters. Find the cluster that you want to manage and click the cluster ID.

      2. In the left-side navigation pane, choose Job Development > SQL Development.

      3. In the SQLConsole window, select the XIHE engine and an Interactive resource group.

    • Data Warehouse Edition: Connect to the cluster.

  2. Create a database.

    CREATE DATABASE demo_db;

Procedure

This example demonstrates how to import data into an internal table in AnalyticDB for MySQL. The procedure includes dynamically generating random data, constructing static data, creating a database connection, defining the table schema, and writing the data to the table.

  1. Run the following commands to install the Python dependencies required by the sample code.

    pip install pandas
    pip install pymysql
    pip install sqlalchemy
    pip install pytz
  2. Replace the configuration parameters in the following code and run the code to import DataFrame data into an internal table in AnalyticDB for MySQL.

    import pandas as pd
    from sqlalchemy import create_engine, MetaData, Table, Column, String, Integer, Float, DateTime, inspect
    import pytz
    import datetime
    import random
    
    
    # Define a function to dynamically generate random data
    def generate_random_book_data(num_records=4):
        """
        Dynamically generate random book data.
        Parameters:
            num_records (int): The number of records to generate. Defaults to 4.
        Returns:
            pd.DataFrame: A pandas DataFrame containing random book data.
        """
        book_titles = [
            "The Great Gatsby",
            "To Kill a Mockingbird",
            "1984",
            "Pride and Prejudice",
            "Moby Dick",
            "War and Peace",
            "The Catcher in the Rye",
            "Brave New World",
            "The Hobbit",
            "Crime and Punishment"
        ]
        timezones = ["America/New_York", "America/Chicago", "Europe/London", "Europe/Dublin", "Asia/Tokyo"]
        min_publication_year = 1700
        max_publication_year = 2023
        min_pages = 50
        max_pages = 1000
        current_year = datetime.datetime.now().year
    
        records = []
        for i in range(num_records):
            book_title = random.choice(book_titles)
            publication_year = random.randint(min_publication_year, max_publication_year)
            pages = random.randint(min_pages, max_pages)
    
            timezone = random.choice(timezones)
            publication_date_local = datetime.datetime(
                publication_year,
                random.randint(1, 12),
                random.randint(1, 28),
                random.randint(0, 23),
                random.randint(0, 59),
                random.randint(0, 59)
            )
            publication_date_utc = pytz.timezone(timezone).localize(publication_date_local).astimezone(pytz.utc)
    
            ebook_min_year = publication_year + 1
            ebook_max_year = min(current_year + 10, publication_year + 50)
            if ebook_min_year > ebook_max_year:
                ebook_release_year = ebook_min_year
            else:
                ebook_release_year = random.randint(ebook_min_year, ebook_max_year)
    
            ebook_release = datetime.datetime(
                ebook_release_year,
                random.randint(1, 12),
                random.randint(1, 28),
                random.randint(0, 23),
                random.randint(0, 59),
                random.randint(0, 59)
            )
            record = {
                "book_title": book_title,
                "publication_year": publication_year,
                "pages": pages,
                "publication_date": publication_date_utc,
                "ebook_release": ebook_release,
            }
            records.append(record)
    
        dataframe = pd.DataFrame(records)
        dataframe.index = [f"B{random.randint(1000, 9999)}" for _ in range(num_records)]
        dataframe.index.name = "book_id"
        return dataframe.reset_index()
    
    if __name__ == '__main__':
    
        # Configure the connection information for AnalyticDB for MySQL
        # The database account of AnalyticDB for MySQL
        username = "user"
        # The password of the AnalyticDB for MySQL database account
        password = "password****"
        # The connection address of AnalyticDB for MySQL
        host = "amv-t4ny9j5****.ads.aliyuncs.com"
        # The port number of AnalyticDB for MySQL. The default value is 3306.
        port = 3306
        # The name of the AnalyticDB for MySQL database
        database = "demo_db"
        # The name of the AnalyticDB for MySQL table. If the table does not exist, it is automatically created and the book_id column is defined as the primary key.
        table_name = "test_dataframe_insert_table"
        connection_string = f"mysql+pymysql://{username}:{password}@{host}:{port}/{database}"
    
        # Create an SQLAlchemy connection
        try:
            engine = create_engine(connection_string)
            print("The connection with ADB MySQL was created successfully!")
        except Exception as e:
            print(f"Failed to create connection with ADB MySQL!: {e}")
            exit()
    
        # Construct sample data
        records_fixed = [
            {
                "book_title": "The Great Gatsby",
                "publication_year": 1925,
                "pages": 218,
                "publication_date": pytz.timezone("America/New_York")
                .localize(datetime.datetime(1925, 4, 10, 9, 0, 0))
                .astimezone(pytz.utc),
                "ebook_release": datetime.datetime(2000, 6, 15, 10, 0, 0),
            },
            {
                "book_title": "To Kill a Mockingbird",
                "publication_year": 1960,
                "pages": 281,
                "publication_date": pytz.timezone("America/Chicago")
                .localize(datetime.datetime(1960, 7, 11, 15, 30, 0))
                .astimezone(pytz.utc),
                "ebook_release": datetime.datetime(2002, 8, 20, 12, 0, 0),
            },
            {
                "book_title": "1984",
                "publication_year": 1949,
                "pages": 328,
                "publication_date": pytz.timezone("Europe/London")
                .localize(datetime.datetime(1949, 6, 8, 11, 0, 0))
                .astimezone(pytz.utc),
                "ebook_release": datetime.datetime(1998, 5, 1, 9, 0, 0),
            },
            {
                "book_title": "Pride and Prejudice",
                "publication_year": 1813,
                "pages": 432,
                "publication_date": pytz.timezone("Europe/Dublin")
                .localize(datetime.datetime(1813, 1, 28, 14, 0, 0))
                .astimezone(pytz.utc),
                "ebook_release": datetime.datetime(2003, 12, 10, 14, 30, 0),
            },
        ]
        # Convert the data to a Pandas DataFrame
        dataframe_fixed = pd.DataFrame(
            records_fixed,
            columns=[
                "book_title",
                "publication_year",
                "pages",
                "publication_date",
                "ebook_release",
            ],
            index=pd.Index(["B1001", "B1002", "B1003", "B1004"], name="book_id"),
        )
    
        # Add the index column as a column to the DataFrame
        dataframe_fixed.reset_index(inplace=True)
        # Define the table schema
        metadata = MetaData()
        books_table = Table(
            table_name,
            metadata,
            Column("book_id", String(10), primary_key=True),
            Column("book_title", String(255)),
            Column("publication_year", Integer),
            Column("pages", Integer),
            Column("publication_date", DateTime),
            Column("ebook_release", DateTime)
        )
    
        try:
            inspector = inspect(engine)
            existing_tables = inspector.get_table_names()
            if table_name in existing_tables:
                print(f"Table {table_name} already exists, no need to recreate!")
            else:
                metadata.create_all(engine,checkfirst=False)
                print(f"Table {table_name} Successfully created!")
        except Exception as e:
            print(f"Error of checking or creating table:{e}")
    
        # Write static data
        try:
            dataframe_fixed.to_sql(name=table_name, con=engine, if_exists="append", index=False)
            print(f"Static data is successfully written to the table {table_name}")
        except Exception as e:
            print(f"Error of writing static data: {e}")
    
        # Generate dynamic data and write it to the table
        dataframe_dynamic = generate_random_book_data(100000)
        try:
            dataframe_dynamic.to_sql(name=table_name, con=engine, if_exists="append", index=False)
            print(f"Dynamic data is successfully written to the table {table_name}")
        except Exception as e:
            print(f"Error writing dynamic data:{e}")
    
        finally:
            # Close the connection
            engine.dispose()
    

    Parameter description:

    • username: The database account of AnalyticDB for MySQL.

    • password: The password of the AnalyticDB for MySQL database account.

    • host: The connection address of AnalyticDB for MySQL.

    • port: The port number of AnalyticDB for MySQL. The default value is 3306.

    • database: The name of the AnalyticDB for MySQL database.

    • table_name: The name of the AnalyticDB for MySQL table. If the table does not exist, it is automatically created and the book_id column is defined as the primary key.

    Note

    For more information about the complete process of importing DataFrame data, see the official SQLAlchemy documentation.
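
    The sample script writes each DataFrame in a single to_sql() call. For larger DataFrames, you can batch the insert by using the standard pandas chunksize and method parameters of to_sql(). The following minimal sketch reuses the dataframe_dynamic, table_name, and engine variables from the script above; the chunk size of 10,000 rows is an illustrative value, not a tuned recommendation.

    # Write the dynamic DataFrame in batches of 10,000 rows and
    # use multi-row INSERT statements within each batch.
    dataframe_dynamic.to_sql(
        name=table_name,
        con=engine,
        if_exists="append",
        index=False,
        chunksize=10000,
        method="multi",
    )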

  3. Query data in AnalyticDB for MySQL to verify that the data was imported.

    SELECT * FROM `demo_db`.`test_dataframe_insert_table` LIMIT 20;

    A result similar to the following is returned. Because most of the data is randomly generated, your values will differ:

    +---------+------------------------+------------------+-------+---------------------+---------------------+
    | book_id | book_title             | publication_year | pages | publication_date    | ebook_release       |
    +---------+------------------------+------------------+-------+---------------------+---------------------+
    | B1752   | 1984                   |             1861 |   493 | 1861-11-08 00:40:32 | 1906-08-15 08:12:56 |
    | B5420   | To Kill a Mockingbird  |             1856 |   175 | 1856-02-26 00:50:57 | 1894-02-12 22:42:06 |
    | B4033   | Crime and Punishment   |             1886 |   675 | 1886-11-20 08:23:29 | 1912-02-14 04:32:22 |
    | B3597   | Moby Dick              |             1719 |   779 | 1719-10-05 02:15:13 | 1734-02-21 06:53:00 |
    | B4708   | To Kill a Mockingbird  |             1811 |   951 | 1811-05-08 06:31:02 | 1830-11-03 22:23:42 |
    | B9493   | The Great Gatsby       |             1976 |   843 | 1976-10-23 10:57:41 | 1986-06-11 22:18:59 |
    | B8066   | Crime and Punishment   |             1750 |   689 | 1750-07-01 12:45:52 | 1766-10-06 00:35:15 |
    | B6823   | 1984                   |             1921 |   462 | 1921-04-12 18:48:16 | 1953-09-14 03:30:51 |
    | B3875   | War and Peace          |             1927 |   893 | 1927-03-12 19:08:28 | 1950-12-14 06:05:49 |
    | B3528   | The Hobbit             |             1713 |   774 | 1713-11-05 13:59:30 | 1734-08-12 13:58:10 |
    | B7133   | Moby Dick              |             1835 |   347 | 1835-07-02 15:03:11 | 1838-03-27 22:29:10 |
    | B1140   | War and Peace          |             1837 |   496 | 1837-11-27 00:27:22 | 1881-08-10 13:37:35 |
    | B3244   | The Catcher in the Rye |             1842 |   739 | 1842-08-02 13:17:45 | 1876-10-07 07:35:05 |
    | B4300   | Brave New World        |             1764 |   654 | 1764-06-26 08:56:17 | 1775-03-15 20:47:09 |
    | B4958   | The Catcher in the Rye |             1867 |   321 | 1867-12-21 16:55:14 | 1907-09-15 14:54:07 |
    | B6145   | Brave New World        |             1844 |   872 | 1844-07-17 20:51:49 | 1866-08-06 14:17:23 |
    | B5163   | The Catcher in the Rye |             1798 |   952 | 1798-05-16 02:48:41 | 1813-09-12 15:07:39 |
    | B1310   | 1984                   |             1975 |   588 | 1975-02-27 07:03:15 | 1985-03-09 23:27:13 |
    | B1857   | Crime and Punishment   |             1990 |    59 | 1990-02-28 07:02:26 | 2015-10-19 08:21:48 |
    | B3155   | To Kill a Mockingbird  |             1928 |   236 | 1928-05-17 13:04:56 | 1958-11-01 16:19:38 |
    +---------+------------------------+------------------+-------+---------------------+---------------------+
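
    You can also run the same verification from Python with pandas. The following minimal sketch assumes the same placeholder connection parameters that are used in step 2; pd.read_sql() is a standard pandas function that reads query results into a DataFrame.

    import pandas as pd
    from sqlalchemy import create_engine

    # Placeholder connection string. Replace the credentials and endpoint with your own values.
    engine = create_engine("mysql+pymysql://user:password****@amv-t4ny9j5****.ads.aliyuncs.com:3306/demo_db")

    # Read back a sample of the imported rows into a DataFrame and print it.
    sample = pd.read_sql("SELECT * FROM test_dataframe_insert_table LIMIT 20", con=engine)
    print(sample)

    engine.dispose()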