
AnalyticDB: Import OSS data using external tables

Last Updated: Nov 05, 2025

AnalyticDB for MySQL lets you import external data using external tables. This topic describes how to use external tables to import data from OSS into an AnalyticDB for MySQL cluster.

Prerequisites

  • The AnalyticDB for MySQL cluster and the OSS bucket are in the same region. For more information, see Activate OSS.

  • The data file is uploaded to an OSS directory.

  • Elastic Network Interface (ENI) access is enabled for your AnalyticDB for MySQL Data Lakehouse Edition cluster.

    Important
    • Log on to the AnalyticDB for MySQL console. On the Cluster Information page, in the Network Information section, turn on the Elastic Network Interface (ENI) network switch.

    • Enabling or disabling the ENI network interrupts the database connection for about 2 minutes. During this period, you cannot read or write data. Therefore, carefully evaluate the impact before you enable or disable the ENI network.

Data preparation

In this example, the data file person.csv is uploaded to the testBucketName/adb/dt=2023-06-15 directory in OSS. Line feeds are used as row delimiters and commas (,) are used as column delimiters. The following sample data is in the person.csv file:

1,james,10,2023-06-15
2,bond,20,2023-06-15
3,jack,30,2023-06-15
4,lucy,40,2023-06-15       

Procedure

Enterprise Edition, Basic Edition, and Data Lakehouse Edition

  1. Go to the SQL editor.

    1. Log on to the AnalyticDB for MySQL console. In the upper-left corner of the console, select a region. In the left-side navigation pane, click Clusters. Find the cluster that you want to manage and click the cluster ID.

    2. In the left-side navigation pane, choose Job Development > SQL Development.

  2. Import data.

    You can import data using the regular import method (default) or the elastic import method. Regular import reads source data on compute nodes and builds indexes on storage nodes, which consumes both computing and storage resources. Elastic import reads source data and builds indexes in a serverless Spark job, which consumes only resources of a Job-type resource group. Elastic import is supported only on Enterprise Edition, Basic Edition, and Data Lakehouse Edition clusters of V3.1.10.0 or later that have a Job-type resource group. For more information, see Data import methods.

    Regular import

    1. Create an external database.

      CREATE EXTERNAL DATABASE adb_external_db;
    2. Create an external table. Use the CREATE EXTERNAL TABLE statement to create an OSS external table in the adb_external_db database. In this example, the external table is named adb_external_db.person.

      Note

      An AnalyticDB for MySQL external table must have the same field names, number of fields, field order, and field types as the source data in the OSS file.

      Create a non-partitioned OSS external table

      CREATE EXTERNAL TABLE adb_external_db.person
      (
       id INT,
       name VARCHAR(1023),
       age INT,
       dt VARCHAR(1023)
      )
      ROW FORMAT DELIMITED FIELDS TERMINATED BY  ','
      STORED AS TEXTFILE
      LOCATION  'oss://testBucketName/adb/dt=2023-06-15/';

      Create a partitioned OSS external table

      Create a partitioned OSS external table and add partitions to the table to query its data.

      1. Create a partitioned OSS external table.

        CREATE EXTERNAL TABLE adb_external_db.person
        (
         id INT,
         name VARCHAR(1023) ,
         age INT
        )
        PARTITIONED BY (dt STRING)
        ROW FORMAT DELIMITED FIELDS TERMINATED BY  ','
        STORED AS TEXTFILE
        LOCATION  'oss://testBucketName/adb/';
      2. Add a partition. You can run the ALTER TABLE ADD PARTITION statement to manually add a partition, or run the MSCK REPAIR TABLE statement to automatically identify and add partitions.

        ALTER TABLE adb_external_db.person ADD PARTITION (dt='2023-06-15') LOCATION 'oss://testBucketName/adb/dt=2023-06-15/';
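        You can also discover and add all partitions under the table's LOCATION automatically. A minimal sketch that uses the MSCK REPAIR TABLE statement mentioned above:

        MSCK REPAIR TABLE adb_external_db.person;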
        Note

      For more information about the syntax for creating OSS external tables, see CREATE EXTERNAL TABLE.

    3. Query data.

      After the table is created, you can query the data in OSS by running a SELECT statement in AnalyticDB for MySQL.

      SELECT * FROM adb_external_db.person;

      The following result is returned:

      +------+-------+------+-----------+
      | id   | name  | age  | dt        |
      +------+-------+------+-----------+
      |    1 | james |   10 |2023-06-15 |
      |    2 | bond  |   20 |2023-06-15 |
      |    3 | jack  |   30 |2023-06-15 |
      |    4 | lucy  |   40 |2023-06-15 |
      +------+-------+------+-----------+
    4. Create a database in AnalyticDB for MySQL. If a database already exists, skip this step. The following statement provides an example:

      CREATE DATABASE adb_demo; 
    5. Create a table in AnalyticDB for MySQL to store data imported from OSS. The following statement provides an example:

      Note

      The internal table that you create must have the same field names, number of fields, field order, and field types as the external table.

      CREATE TABLE IF NOT EXISTS adb_demo.adb_import_test(
          id INT,
          name VARCHAR(1023),
          age INT,
          dt VARCHAR(1023)
      )
      DISTRIBUTED BY HASH(id);
    6. Import data into the table.

      • Method 1: Use the INSERT INTO statement to import data. If a primary key value is duplicated, the new data is ignored. This is equivalent to using INSERT IGNORE INTO. For more information, see INSERT INTO.

        INSERT INTO adb_demo.adb_import_test SELECT * FROM adb_external_db.person;
      • Method 2: Use the INSERT OVERWRITE INTO statement to synchronously import data. This overwrites the existing data in the table.

        INSERT OVERWRITE INTO adb_demo.adb_import_test SELECT * FROM adb_external_db.person;
      • Method 3: Use the INSERT OVERWRITE INTO statement to asynchronously import data. For more information, see Asynchronous write.

        SUBMIT JOB INSERT OVERWRITE adb_demo.adb_import_test SELECT * FROM adb_external_db.person;

    Elastic import

    1. Create a database. If a database already exists, skip this step. The following statement provides an example:

      CREATE DATABASE adb_demo; 
    2. Create an external table.

      Note
      • The field names, number of fields, field order, and field types of the AnalyticDB for MySQL external table must match the source data in the OSS file.

      • Elastic import supports creating external tables only using the CREATE TABLE statement.

      CREATE TABLE oss_import_test_external_table
      (
        id INT,
        name VARCHAR(1023),
        age INT,
        dt VARCHAR(1023)
      )
      ENGINE='OSS'
      TABLE_PROPERTIES='{
          "endpoint":"oss-cn-hangzhou-internal.aliyuncs.com",
          "url":"oss://testBucketName/adb/dt=2023-06-15/person.csv",
          "accessid":"accesskey_id",
          "accesskey":"accesskey_secret",
          "delimiter":","
      }';
      Important

      When you create an external table, the supported TABLE_PROPERTIES parameters vary based on the file format, such as CSV, Parquet, or ORC:

      • CSV format: Only the endpoint, url, accessid, accesskey, format, delimiter, null_value, and partition_column parameters are supported.

      • Parquet format: Only the endpoint, url, accessid, accesskey, format, and partition_column parameters are supported.

      • ORC format: Only the endpoint, url, accessid, accesskey, format, and partition_column parameters are supported.

      For more information about the parameters that you can set for external tables and their descriptions, see OSS non-partitioned external tables and OSS partitioned external tables.

    3. Query data.

      After the table is created, you can query data from OSS by running a SELECT statement in AnalyticDB for MySQL.

      SELECT * FROM oss_import_test_external_table;

      The following result is returned:

      +------+-------+------+-----------+
      | id   | name  | age  | dt        |
      +------+-------+------+-----------+
      |    1 | james |   10 |2023-06-15 |
      |    2 | bond  |   20 |2023-06-15 |
      |    3 | jack  |   30 |2023-06-15 |
      |    4 | lucy  |   40 |2023-06-15 |
      +------+-------+------+-----------+
      4 rows in set (0.35 sec)
    4. Create a table in AnalyticDB for MySQL to store data imported from OSS. The following statement provides an example:

      Note

      The internal table that you create must have the same field names, number of fields, field order, and field types as the external table.

      CREATE TABLE adb_import_test
      (
        id INT,
        name VARCHAR(1023),
        age INT,
        dt VARCHAR(1023)
      )
      DISTRIBUTED BY HASH(id);
    5. Import data.

      Important

      Elastic import supports importing data only using the INSERT OVERWRITE INTO statement.

      • Method 1: Run the INSERT OVERWRITE INTO statement to elastically import data. This overwrites the existing data in the table. The following statement provides an example:

        /*+elastic_load=true, elastic_load_configs=[adb.load.resource.group.name=resource_group]*/
        INSERT OVERWRITE INTO adb_demo.adb_import_test SELECT * FROM adb_demo.oss_import_test_external_table;
      • Method 2: Asynchronously run the INSERT OVERWRITE INTO statement to elastically import data. You can use the SUBMIT JOB statement to submit an asynchronous task that is scheduled in the background.

        /*+elastic_load=true, elastic_load_configs=[adb.load.resource.group.name=resource_group]*/
        SUBMIT JOB INSERT OVERWRITE INTO adb_demo.adb_import_test SELECT * FROM adb_demo.oss_import_test_external_table;
        Important

        You cannot set a priority queue when you asynchronously submit an elastic import task.

        The following result is returned:

        +---------------------------------------+
        | job_id                                |
        +---------------------------------------+
        | 202308151719510210170190**********    |
        +---------------------------------------+

      After you submit an asynchronous job using SUBMIT JOB, the returned result indicates only that the job was submitted successfully. You can use the job_id to terminate the asynchronous job or query its status to check whether the job was successfully executed. For more information, see Submit an asynchronous import job.
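      For example, the following statements sketch a status check and a cancellation. The status query uses the INFORMATION_SCHEMA.kepler_meta_async_jobs view that appears later in this topic; the CANCEL JOB statement is assumed to follow the syntax described in Submit an asynchronous import job. Replace the job ID with the value returned by your SUBMIT JOB statement.

        -- Query the asynchronous job by its job_id to check its state.
        SELECT * FROM INFORMATION_SCHEMA.kepler_meta_async_jobs WHERE job_name = "202308151719510210170190**********";

        -- Terminate the asynchronous job if it is no longer needed (assumed syntax).
        CANCEL JOB "202308151719510210170190**********";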

      Hint parameters:

      • elastic_load: specifies whether to use elastic import. Valid values: true and false. Default value: false.

      • elastic_load_configs: the configuration parameters of the elastic import feature. Enclose the parameters within brackets ([ ]) and separate multiple parameters with vertical bars (|). The following parameters are supported:

        • adb.load.resource.group.name (required): the name of the Job-type resource group that runs the elastic import job.

        • adb.load.job.max.acu (optional): the maximum amount of resources for an elastic import job. Unit: AnalyticDB compute units (ACUs). Minimum value: 5 ACUs. Default value: the number of shards plus 1. Execute the following statement to query the number of shards in the cluster:

          SELECT count(1) FROM information_schema.kepler_meta_shards;

        • spark.driver.resourceSpec (optional): the resource type of the Spark driver. Default value: small. For the valid values, see the Type column in the "Spark resource specifications" table of the Conf configuration parameters topic.

        • spark.executor.resourceSpec (optional): the resource type of the Spark executor. Default value: large. For the valid values, see the Type column in the "Spark resource specifications" table of the Conf configuration parameters topic.

        • spark.adb.executorDiskSize (optional): the disk capacity of the Spark executor. Valid values: (0,100]. Unit: GiB. Default value: 10 GiB. For more information, see the "Specify driver and executor resources" section of the Conf configuration parameters topic.
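      For example, the following hint sets both the resource group and the resource cap, separated by a vertical bar. The resource group name job_rg and the 16-ACU cap are hypothetical values for illustration:

        /*+elastic_load=true, elastic_load_configs=[adb.load.resource.group.name=job_rg|adb.load.job.max.acu=16]*/
        INSERT OVERWRITE INTO adb_demo.adb_import_test SELECT * FROM adb_demo.oss_import_test_external_table;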

    6. (Optional) Check whether the submitted import task is an elastic import task.

      SELECT job_name, (job_type = 3) AS is_elastic_load FROM INFORMATION_SCHEMA.kepler_meta_async_jobs WHERE job_name = "2023081818010602101701907303151******";

      The following result is returned:

      +---------------------------------------+------------------+
      | job_name                              | is_elastic_load  |
      +---------------------------------------+------------------+
      | 20230815171951021017019072*********** |       1          |
      +---------------------------------------+------------------+

      If the value of is_elastic_load is 1, the submitted import task is an elastic import task. If the value is 0, the submitted import task is a regular import task.

Data Warehouse Edition

  1. Connect to the cluster and create a database.

    CREATE DATABASE adb_demo;
  2. Create an external table. Use the CREATE TABLE syntax to create an OSS external table in CSV, Parquet, or ORC format. For more information about the syntax, see OSS external table syntax.

    This example uses a non-partitioned external table in CSV format.

    CREATE TABLE IF NOT EXISTS oss_import_test_external_table
    (
        id INT,
        name VARCHAR(1023),
        age INT,
        dt VARCHAR(1023) 
    )
    ENGINE='OSS'
    TABLE_PROPERTIES='{
        "endpoint":"oss-cn-hangzhou-internal.aliyuncs.com",
        "url":"oss://testBucketname/adb/dt=2023-06-15/person.csv",
        "accessid":"accesskey_id",
        "accesskey":"accesskey_secret",
        "delimiter":",",
        "skip_header_line_count":0,
        "charset":"utf-8"
    }'; 
  3. Query data from the oss_import_test_external_table external table.

    Note

    Querying external tables with large data volumes in CSV, Parquet, or ORC format can cause high performance overhead. To improve query efficiency, import the data from the OSS external table into AnalyticDB for MySQL, as described in the following steps.

    SELECT * FROM oss_import_test_external_table;
  4. Create a table in AnalyticDB for MySQL to store the data imported from the OSS external table.

    CREATE TABLE IF NOT EXISTS adb_oss_import_test
    (
       id INT,
       name VARCHAR(1023),
       age INT,
       dt VARCHAR(1023) 
    )
    DISTRIBUTED BY HASH(id);
  5. Run the INSERT statement to import data from the OSS external table into AnalyticDB for MySQL.

    Important

    The INSERT INTO and INSERT OVERWRITE SELECT statements import data synchronously by default. This process requires a long-running connection between the client and the AnalyticDB for MySQL server. If you import a large amount of data, such as several hundred gigabytes, the connection may be interrupted by network issues, which causes the data import to fail. Therefore, if you have a large data volume, we recommend that you use SUBMIT JOB INSERT OVERWRITE SELECT to asynchronously import the data.

    • Method 1: Run the INSERT INTO statement to import data. If a primary key is duplicated, the current write operation is ignored and the data is not updated. This is equivalent to INSERT IGNORE INTO. For more information, see INSERT INTO. The following statement provides an example:

      INSERT INTO adb_oss_import_test
      SELECT * FROM oss_import_test_external_table;
    • Method 2: Run the INSERT OVERWRITE statement to import data. This overwrites the existing data in the table. The following statement provides an example:

      INSERT OVERWRITE adb_oss_import_test
      SELECT * FROM oss_import_test_external_table;
    • Method 3: Asynchronously run the INSERT OVERWRITE statement to import data. You can use SUBMIT JOB to submit an asynchronous task that is scheduled in the background. You can add the /*+ direct_batch_load=true*/ hint to accelerate the write job. For more information, see Asynchronous writes. The following statement provides an example:

      SUBMIT JOB INSERT OVERWRITE adb_oss_import_test
      SELECT * FROM oss_import_test_external_table;

      The following result is returned:

      +---------------------------------------+
      | job_id                                |
      +---------------------------------------+
      | 2020112122202917203100908203303****** |
      +---------------------------------------+

      For more information about how to asynchronously submit tasks, see Submit an asynchronous import job.

OSS external table syntax

Enterprise Edition, Basic Edition, and Data Lakehouse Edition

For more information about the syntax and description of OSS external tables in Enterprise Edition, Basic Edition, and Data Lakehouse Edition, see OSS external tables.

Data Warehouse Edition

OSS non-partitioned external tables

CREATE TABLE [IF NOT EXISTS] table_name
(column_name column_type[, …])
ENGINE='OSS'
TABLE_PROPERTIES='{
    "endpoint":"endpoint",
    "url":"OSS_LOCATION",
    "accessid":"accesskey_id",
    "accesskey":"accesskey_secret",
    "format":"csv|orc|parquet|text
    "delimiter|field_delimiter":";",
    "skip_header_line_count":1,
    "charset":"utf-8"
}';

Parameters for external tables in CSV, Parquet, and ORC formats:

  • ENGINE='OSS' (required): the table engine. Set the value to OSS.

  • endpoint (required): the endpoint of the OSS bucket. AnalyticDB for MySQL can currently access OSS only over a VPC.

    Note

    To view the endpoint, log on to the OSS console, click the target bucket, and check the Overview page of the bucket.

  • url (required): the path of the OSS file or directory.

    • For a file, specify the absolute path. Example: oss://testBucketname/adb/oss_import_test_data.csv.

    • For a directory, the path must end with a forward slash (/). Example: oss://testBucketname/adb/. If you specify a directory path, the external table contains all data in the directory after the table is created.

    • The wildcard character (*) is supported at the end of the path to match all files or folders that have a specific prefix. Example: oss://testBucketname/adb/list_file_with_prefix/test*. This matches all files and folders that meet the prefix condition, such as oss://testBucketname/adb/list_file_with_prefix/testfile1 and oss://testBucketname/adb/list_file_with_prefix/test1/file2.

  • accessid (required): the AccessKey ID of an Alibaba Cloud account or a Resource Access Management (RAM) user that has OSS permissions. For information about how to obtain an AccessKey ID, see Accounts and permissions.

  • accesskey (required): the AccessKey secret of an Alibaba Cloud account or a RAM user that has OSS permissions. For information about how to obtain an AccessKey secret, see Accounts and permissions.

  • format (conditionally required): the format of the data file.

    • To create a Parquet external table, set this parameter to parquet.

    • To create an ORC external table, set this parameter to orc.

    • To create a Text external table, set this parameter to text.

    • If you do not specify this parameter, csv is used by default.

Additional parameter for external tables in CSV and Text formats:

  • delimiter|field_delimiter (required): the column delimiter of the data file. If the file type is csv, set the delimiter parameter. If the file type is text, set the field_delimiter parameter.

Additional parameters for external tables in CSV format:

  • null_value (optional): defines the NULL value for the CSV data file. By default, an empty value is defined as NULL. Example: "null_value": "".

    Important

    This parameter is supported only for clusters of V3.1.4.2 or later.

  • ossnull (optional): the rule that is used to define NULL values in the CSV data file. Valid values:

    • 1 (default): EMPTY_SEPARATORS. Only empty values are defined as NULL. Example: a,"",,c --> "a","",NULL,"c".

    • 2: EMPTY_QUOTES. Only "" is defined as NULL. Example: a,"",,c --> "a",NULL,"","c".

    • 3: BOTH. Both empty values and "" are defined as NULL. Example: a,"",,c --> "a",NULL,NULL,"c".

    • 4: NEITHER. Neither empty values nor "" are defined as NULL. Example: a,"",,c --> "a","","","c".

    Note

    The preceding examples assume that "null_value": "".

  • skip_header_line_count (optional): the number of header rows to skip at the beginning of the data file during import. If the first row of a CSV file is the table header, set this parameter to 1 to skip the header row. Default value: 0, which indicates that no rows are skipped.

  • oss_ignore_quote_and_escape (optional): specifies whether to ignore quotation marks and escape characters in field values. Default value: false, which indicates that quotation marks and escape characters are not ignored.

    Important

    This parameter is supported only for clusters of V3.1.4.2 or later.

  • charset (optional): the character set of the OSS external table. Valid values: utf-8 (default) and gbk.

    Important

    This parameter is supported only for clusters of V3.1.10.4 or later.

Note
  • The column names in the `CREATE TABLE` statement for the external table must match the column names in the Parquet or ORC file; the match is case-insensitive. The columns must also appear in the same order.

  • When you create an external table, you can select only some columns from the Parquet or ORC file to be columns in the external table. Unselected columns are not imported.

  • If the `CREATE TABLE` statement for the external table includes a column that does not exist in the Parquet or ORC file, queries for that column return NULL.
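
The elastic import section earlier in this topic also references a partition_column property for partitioned data. The following is a hedged sketch of a partitioned CSV external table, assuming that partition_column takes a comma-separated list of partition column names and that the partition columns are declared last in the column list; see OSS partitioned external tables for the authoritative syntax:

CREATE TABLE IF NOT EXISTS oss_partitioned_external_table
(
    id INT,
    name VARCHAR(1023),
    age INT,
    dt VARCHAR(1023)
)
ENGINE='OSS'
TABLE_PROPERTIES='{
    "endpoint":"oss-cn-hangzhou-internal.aliyuncs.com",
    "url":"oss://testBucketname/adb/",
    "accessid":"accesskey_id",
    "accesskey":"accesskey_secret",
    "delimiter":",",
    "partition_column":"dt"
}';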

AnalyticDB for MySQL supports reading from and writing to Hive TEXT files using an external table in CSV format on OSS. The following statement can be used to create an external table:

CREATE TABLE adb_csv_hive_format_oss (
  a tinyint,
  b smallint,
  c int,
  d bigint,
  e boolean,
  f float,
  g double,
  h varchar,
  i varchar, -- binary
  j timestamp,
  k DECIMAL(10, 4),
  l varchar, -- char(10)
  m varchar, -- varchar(100)
  n date
) ENGINE = 'OSS' TABLE_PROPERTIES='{
    "format": "csv",
    "endpoint":"oss-cn-hangzhou-internal.aliyuncs.com",
    "accessid":"accesskey_id",
    "accesskey":"accesskey_secret",
    "url":"oss://testBucketname/adb_data/",
    "delimiter": "\\1",
    "null_value": "\\\\N",
    "oss_ignore_quote_and_escape": "true",
    "ossnull": 2
}';
Note

When you create an OSS external table in CSV format to read Hive TEXT files, take note of the following points:

  • The default column delimiter for Hive TEXT files is \1. To read and write Hive TEXT files using an OSS external table in CSV format, you must escape the delimiter as \\1 when you configure the delimiter parameter.

  • The default NULL value for a Hive TEXT file is \N. To read or write Hive TEXT files using an external table in CSV format on OSS, you must set the null_value parameter and escape the value as \\\\N.

  • Other basic Hive data types, such as BOOLEAN, map directly to AnalyticDB for MySQL data types, but the BINARY, CHAR(n), and VARCHAR(n) types all map to the AnalyticDB for MySQL VARCHAR type.

Appendix: Data type mappings

Important
  • The data types that you specify when you create a table must be consistent with the mappings in the following tables. For the DECIMAL type, the precision must also be consistent.

  • External tables in Parquet format do not support the STRUCT type. If you use this type, the table creation fails.

  • External tables in ORC format do not support complex types such as LIST, STRUCT, or UNION. If you use these types, the table creation fails. You can create an ORC external table that contains a column of the MAP type, but queries on the table will fail.

Parquet files and AnalyticDB for MySQL data type mapping

| Parquet primitive data type | Parquet logicalType | AnalyticDB for MySQL data type |
| --- | --- | --- |
| BOOLEAN | None | BOOLEAN |
| INT32 | INT_8 | TINYINT |
| INT32 | INT_16 | SMALLINT |
| INT32 | None | INT or INTEGER |
| INT64 | None | BIGINT |
| FLOAT | None | FLOAT |
| DOUBLE | None | DOUBLE |
| FIXED_LEN_BYTE_ARRAY, BINARY, INT64, or INT32 | DECIMAL | DECIMAL |
| BINARY | UTF-8 | VARCHAR, STRING, or JSON (if the content of the Parquet column is in JSON format) |
| INT32 | DATE | DATE |
| INT64 | TIMESTAMP_MILLIS | TIMESTAMP or DATETIME |
| INT96 | None | TIMESTAMP or DATETIME |
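
As a usage sketch of these mappings, the following statement creates a Parquet external table whose columns mirror the sample person.csv schema used in this topic (INT32 columns map to INT, and UTF-8 BINARY columns map to VARCHAR). The directory oss://testBucketname/adb_parquet/ and the credentials are placeholders, and per the elastic import section only the endpoint, url, accessid, accesskey, format, and partition_column properties apply to Parquet:

CREATE TABLE IF NOT EXISTS oss_parquet_external_table
(
    id INT,             -- Parquet INT32
    name VARCHAR(1023), -- Parquet BINARY (UTF-8)
    age INT,            -- Parquet INT32
    dt VARCHAR(1023)    -- Parquet BINARY (UTF-8)
)
ENGINE='OSS'
TABLE_PROPERTIES='{
    "endpoint":"oss-cn-hangzhou-internal.aliyuncs.com",
    "url":"oss://testBucketname/adb_parquet/",
    "accessid":"accesskey_id",
    "accesskey":"accesskey_secret",
    "format":"parquet"
}';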

ORC files and AnalyticDB for MySQL data type mapping

| ORC file data type | AnalyticDB for MySQL data type |
| --- | --- |
| BOOLEAN | BOOLEAN |
| BYTE | TINYINT |
| SHORT | SMALLINT |
| INT | INT or INTEGER |
| LONG | BIGINT |
| DECIMAL | DECIMAL |
| FLOAT | FLOAT |
| DOUBLE | DOUBLE |
| BINARY, STRING, or VARCHAR | VARCHAR, STRING, or JSON (if the content of the ORC column is in JSON format) |
| TIMESTAMP | TIMESTAMP or DATETIME |
| DATE | DATE |

Paimon files and AnalyticDB for MySQL data type mapping

| Paimon file data type | AnalyticDB for MySQL data type |
| --- | --- |
| CHAR | VARCHAR |
| VARCHAR | VARCHAR |
| BOOLEAN | BOOLEAN |
| BINARY | VARBINARY |
| VARBINARY | VARBINARY |
| DECIMAL | DECIMAL |
| TINYINT | TINYINT |
| SMALLINT | SMALLINT |
| INT | INTEGER |
| BIGINT | BIGINT |
| FLOAT | REAL |
| DOUBLE | DOUBLE |
| DATE | DATE |
| TIME | Not supported |
| TIMESTAMP | TIMESTAMP |
| LocalZonedTIMESTAMP | TIMESTAMP (local time zone information is ignored) |