This topic describes how to use SQLAlchemy to import pandas DataFrame data from Python to AnalyticDB for MySQL.
Prerequisites
Python 3.7 or later is installed.
A database account is created for the AnalyticDB for MySQL cluster.
If you use an Alibaba Cloud account to access the cluster, you only need to create a privileged account.
If you use a RAM user to access the cluster, you must create a privileged account and a standard account, grant permissions on the specified databases and tables to the standard account, and associate the standard account with the RAM user.
Precautions
When you use SQLAlchemy to create a table in AnalyticDB for MySQL, you must set the checkfirst parameter to False, for example, metadata.create_all(engine, checkfirst=False). Otherwise, an error occurs.
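A minimal sketch of where the parameter is set (the endpoint, account, and table definition below are placeholders, not values from your cluster):

from sqlalchemy import create_engine, MetaData, Table, Column, String

# Placeholder endpoint and account; replace them with your own values.
engine = create_engine("mysql+pymysql://user:password****@amv-t4ny9j5****.ads.aliyuncs.com:3306/demo_db")
metadata = MetaData()
Table("demo_table", metadata, Column("id", String(10), primary_key=True))
# checkfirst=False skips the pre-creation existence check.
# The default checkfirst=True causes an error on AnalyticDB for MySQL.
metadata.create_all(engine, checkfirst=False)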
Preparations
Go to the SQL development page.
Enterprise Edition, Basic Edition, or Data Lakehouse Edition:
Log on to the AnalyticDB for MySQL console. In the upper-left corner of the console, select a region. In the left-side navigation pane, click Clusters. Find the cluster that you want to manage and click the cluster ID.
In the left-side navigation pane, click .
In the SQLConsole window, select the XIHE engine and an Interactive resource group.
Data Warehouse Edition: Connect to the cluster.
Create a database.
CREATE DATABASE demo_db;
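If you prefer to run this statement from Python instead of the SQL console, a minimal sketch with SQLAlchemy (the endpoint and account are placeholders) could look like this:

from sqlalchemy import create_engine, text

# Placeholder endpoint and account; connect without selecting a database, then create demo_db.
engine = create_engine("mysql+pymysql://user:password****@amv-t4ny9j5****.ads.aliyuncs.com:3306")
with engine.connect() as conn:
    conn.execute(text("CREATE DATABASE demo_db"))
engine.dispose()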
Procedure
This example demonstrates how to import data into an internal table in AnalyticDB for MySQL. The procedure includes automatically generating dynamic random data, constructing static data, creating a database connection, creating a table schema, and writing data.
Run the following commands to install the Python dependencies.
pip install pandas
pip install pymysql
pip install sqlalchemy
pip install pytz
Replace the configuration parameters in the following code and run the code to import DataFrame data into an internal table in AnalyticDB for MySQL.
import pandas as pd
from sqlalchemy import create_engine, MetaData, Table, Column, String, Integer, Float, DateTime, inspect
import pytz
import datetime
import random

# Define a function to dynamically generate random data
def generate_random_book_data(num_records=4):
    """
    Dynamically generate random book data.

    Parameters:
        num_records (int): The number of records to generate. Default value: 4.

    Returns:
        pd.DataFrame: A pandas DataFrame that contains random book data.
    """
    book_titles = [
        "The Great Gatsby", "To Kill a Mockingbird", "1984", "Pride and Prejudice",
        "Moby Dick", "War and Peace", "The Catcher in the Rye", "Brave New World",
        "The Hobbit", "Crime and Punishment"
    ]
    timezones = ["America/New_York", "America/Chicago", "Europe/London", "Europe/Dublin", "Asia/Tokyo"]
    min_publication_year = 1700
    max_publication_year = 2023
    min_pages = 50
    max_pages = 1000
    current_year = datetime.datetime.now().year
    records = []
    for i in range(num_records):
        book_title = random.choice(book_titles)
        publication_year = random.randint(min_publication_year, max_publication_year)
        pages = random.randint(min_pages, max_pages)
        timezone = random.choice(timezones)
        publication_date_local = datetime.datetime(
            publication_year, random.randint(1, 12), random.randint(1, 28),
            random.randint(0, 23), random.randint(0, 59), random.randint(0, 59)
        )
        publication_date_utc = pytz.timezone(timezone).localize(publication_date_local).astimezone(pytz.utc)
        ebook_min_year = publication_year + 1
        ebook_max_year = min(current_year + 10, publication_year + 50)
        if ebook_min_year > ebook_max_year:
            ebook_release_year = ebook_min_year
        else:
            ebook_release_year = random.randint(ebook_min_year, ebook_max_year)
        ebook_release = datetime.datetime(
            ebook_release_year, random.randint(1, 12), random.randint(1, 28),
            random.randint(0, 23), random.randint(0, 59), random.randint(0, 59)
        )
        record = {
            "book_title": book_title,
            "publication_year": publication_year,
            "pages": pages,
            "publication_date": publication_date_utc,
            "ebook_release": ebook_release,
        }
        records.append(record)
    dataframe = pd.DataFrame(records)
    dataframe.index = [f"B{random.randint(1000, 9999)}" for _ in range(num_records)]
    dataframe.index.name = "book_id"
    return dataframe.reset_index()

if __name__ == '__main__':
    # Configure the connection information for AnalyticDB for MySQL
    # The database account of AnalyticDB for MySQL
    username = "user"
    # The password of the AnalyticDB for MySQL database account
    password = "password****"
    # The connection address of AnalyticDB for MySQL
    host = "amv-t4ny9j5****.ads.aliyuncs.com"
    # The port number of AnalyticDB for MySQL. The default value is 3306.
    port = 3306
    # The name of the AnalyticDB for MySQL database
    database = "demo_db"
    # The name of the AnalyticDB for MySQL table. If the table does not exist, it is automatically created and the book_id column is defined as the primary key.
    table_name = "test_dataframe_insert_table"
    connection_string = f"mysql+pymysql://{username}:{password}@{host}:{port}/{database}"
    # Create an SQLAlchemy connection
    try:
        engine = create_engine(connection_string)
        print("The connection with ADB MySQL was created successfully!")
    except Exception as e:
        print(f"Failed to create the connection with ADB MySQL: {e}")
        exit()
    # Construct static sample data
    records_fixed = [
        {
            "book_title": "The Great Gatsby",
            "publication_year": 1925,
            "pages": 218,
            "publication_date": pytz.timezone("America/New_York")
            .localize(datetime.datetime(1925, 4, 10, 9, 0, 0))
            .astimezone(pytz.utc),
            "ebook_release": datetime.datetime(2000, 6, 15, 10, 0, 0),
        },
        {
            "book_title": "To Kill a Mockingbird",
            "publication_year": 1960,
            "pages": 281,
            "publication_date": pytz.timezone("America/Chicago")
            .localize(datetime.datetime(1960, 7, 11, 15, 30, 0))
            .astimezone(pytz.utc),
            "ebook_release": datetime.datetime(2002, 8, 20, 12, 0, 0),
        },
        {
            "book_title": "1984",
            "publication_year": 1949,
            "pages": 328,
            "publication_date": pytz.timezone("Europe/London")
            .localize(datetime.datetime(1949, 6, 8, 11, 0, 0))
            .astimezone(pytz.utc),
            "ebook_release": datetime.datetime(1998, 5, 1, 9, 0, 0),
        },
        {
            "book_title": "Pride and Prejudice",
            "publication_year": 1813,
            "pages": 432,
            "publication_date": pytz.timezone("Europe/Dublin")
            .localize(datetime.datetime(1813, 1, 28, 14, 0, 0))
            .astimezone(pytz.utc),
            "ebook_release": datetime.datetime(2003, 12, 10, 14, 30, 0),
        },
    ]
    # Convert the data to a pandas DataFrame
    dataframe_fixed = pd.DataFrame(
        records_fixed,
        columns=[
            "book_title",
            "publication_year",
            "pages",
            "publication_date",
            "ebook_release",
        ],
        index=pd.Index(["B1001", "B1002", "B1003", "B1004"], name="book_id"),
    )
    # Add the index as a regular column of the DataFrame
    dataframe_fixed.reset_index(inplace=True)
    # Define the table schema
    metadata = MetaData()
    books_table = Table(
        table_name, metadata,
        Column("book_id", String(10), primary_key=True),
        Column("book_title", String(255)),
        Column("publication_year", Integer),
        Column("pages", Integer),
        Column("publication_date", DateTime),
        Column("ebook_release", DateTime)
    )
    # Create the table if it does not exist
    try:
        inspector = inspect(engine)
        existing_tables = inspector.get_table_names()
        if table_name in existing_tables:
            print(f"Table {table_name} already exists. No need to recreate it.")
        else:
            metadata.create_all(engine, checkfirst=False)
            print(f"Table {table_name} created successfully!")
    except Exception as e:
        print(f"Error checking or creating the table: {e}")
    # Write the static data
    try:
        dataframe_fixed.to_sql(name=table_name, con=engine, if_exists="append", index=False)
        print(f"Static data is successfully written to the table {table_name}")
    except Exception as e:
        print(f"Error writing static data: {e}")
    # Generate dynamic data and write it to the table
    dataframe_dynamic = generate_random_book_data(100000)
    try:
        dataframe_dynamic.to_sql(name=table_name, con=engine, if_exists="append", index=False)
        print(f"Dynamic data is successfully written to the table {table_name}")
    except Exception as e:
        print(f"Error writing dynamic data: {e}")
    finally:
        # Close the connection
        engine.dispose()

Parameter description:
Parameter
Description
username
The database account of AnalyticDB for MySQL.
password
The password of the AnalyticDB for MySQL database account.
host
The connection address of AnalyticDB for MySQL.
port
The port number of AnalyticDB for MySQL. The default value is 3306.
database
The name of the AnalyticDB for MySQL database.
table_name
The name of the AnalyticDB for MySQL table. If the table does not exist, it is automatically created and the book_id column is defined as the primary key.
Note: For more information about the complete process of importing DataFrame data, see the official SQLAlchemy documentation.
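The dynamic write in the preceding code sends 100,000 rows in a single to_sql call. If you want to split a large write into smaller batches, pandas provides the chunksize parameter. The following is a hedged sketch that reuses dataframe_dynamic, table_name, and engine from the example above:

# Write the DataFrame in batches of 5,000 rows instead of one large insert.
dataframe_dynamic.to_sql(
    name=table_name,
    con=engine,
    if_exists="append",
    index=False,
    chunksize=5000,
)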
Query data in AnalyticDB for MySQL to verify that the data was imported.
SELECT * FROM `demo_db`.`test_dataframe_insert_table` LIMIT 20;
The following result is returned:
+---------+------------------------+------------------+-------+---------------------+---------------------+
| book_id | book_title             | publication_year | pages | publication_date    | ebook_release       |
+---------+------------------------+------------------+-------+---------------------+---------------------+
| B1752   | 1984                   | 1861             | 493   | 1861-11-08 00:40:32 | 1906-08-15 08:12:56 |
| B5420   | To Kill a Mockingbird  | 1856             | 175   | 1856-02-26 00:50:57 | 1894-02-12 22:42:06 |
| B4033   | Crime and Punishment   | 1886             | 675   | 1886-11-20 08:23:29 | 1912-02-14 04:32:22 |
| B3597   | Moby Dick              | 1719             | 779   | 1719-10-05 02:15:13 | 1734-02-21 06:53:00 |
| B4708   | To Kill a Mockingbird  | 1811             | 951   | 1811-05-08 06:31:02 | 1830-11-03 22:23:42 |
| B9493   | The Great Gatsby       | 1976             | 843   | 1976-10-23 10:57:41 | 1986-06-11 22:18:59 |
| B8066   | Crime and Punishment   | 1750             | 689   | 1750-07-01 12:45:52 | 1766-10-06 00:35:15 |
| B6823   | 1984                   | 1921             | 462   | 1921-04-12 18:48:16 | 1953-09-14 03:30:51 |
| B3875   | War and Peace          | 1927             | 893   | 1927-03-12 19:08:28 | 1950-12-14 06:05:49 |
| B3528   | The Hobbit             | 1713             | 774   | 1713-11-05 13:59:30 | 1734-08-12 13:58:10 |
| B7133   | Moby Dick              | 1835             | 347   | 1835-07-02 15:03:11 | 1838-03-27 22:29:10 |
| B1140   | War and Peace          | 1837             | 496   | 1837-11-27 00:27:22 | 1881-08-10 13:37:35 |
| B3244   | The Catcher in the Rye | 1842             | 739   | 1842-08-02 13:17:45 | 1876-10-07 07:35:05 |
| B4300   | Brave New World        | 1764             | 654   | 1764-06-26 08:56:17 | 1775-03-15 20:47:09 |
| B4958   | The Catcher in the Rye | 1867             | 321   | 1867-12-21 16:55:14 | 1907-09-15 14:54:07 |
| B6145   | Brave New World        | 1844             | 872   | 1844-07-17 20:51:49 | 1866-08-06 14:17:23 |
| B5163   | The Catcher in the Rye | 1798             | 952   | 1798-05-16 02:48:41 | 1813-09-12 15:07:39 |
| B1310   | 1984                   | 1975             | 588   | 1975-02-27 07:03:15 | 1985-03-09 23:27:13 |
| B1857   | Crime and Punishment   | 1990             | 59    | 1990-02-28 07:02:26 | 2015-10-19 08:21:48 |
| B3155   | To Kill a Mockingbird  | 1928             | 236   | 1928-05-17 13:04:56 | 1958-11-01 16:19:38 |
+---------+------------------------+------------------+-------+---------------------+---------------------+
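You can also run the same check from Python. The following is a minimal sketch that uses pandas.read_sql with the connection parameters from the example above (the endpoint and account are placeholders):

import pandas as pd
from sqlalchemy import create_engine

# Placeholder endpoint and account; replace them with your own values.
engine = create_engine("mysql+pymysql://user:password****@amv-t4ny9j5****.ads.aliyuncs.com:3306/demo_db")
df = pd.read_sql("SELECT * FROM test_dataframe_insert_table LIMIT 20", con=engine)
print(df)
engine.dispose()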