This topic describes how to query Object Storage Service (OSS) objects by using AnalyticDB for MySQL external tables and how to import OSS objects to AnalyticDB for MySQL. OSS objects in the CSV, Parquet, or ORC format, as well as Hive TEXT files, can be imported to AnalyticDB for MySQL by using external tables.

Prerequisites

  • A directory is created in OSS to store the data that you want to import to AnalyticDB for MySQL. To prepare the directory, perform the following operations:
    1. Activate OSS
      Note Make sure that OSS and AnalyticDB for MySQL are deployed in the same region.
    2. Create buckets
    3. Create directories
    4. Upload an object. For more information, see Upload objects.
      In this example, the oss_import_test_data.txt object is uploaded to the <bucket-name>.oss-cn-hangzhou.aliyuncs.com/adb/ directory in OSS. The row delimiter is a line feed, and the column delimiter is a semicolon (;). The following code shows some of the data contained in this object:
      uid;other
      12;hello_world_1
      27;hello_world_2
      28;hello_world_3
      33;hello_world_4
      37;hello_world_5
      40;hello_world_6
      ...             
  • If the AnalyticDB for MySQL cluster is in elastic mode, you must turn on ENI in the Network Information section of the Cluster Information page.

Procedure

In this example, data in the oss_import_test_data.txt object is imported to an AnalyticDB for MySQL database named adb_demo.

  1. Connect to the AnalyticDB for MySQL cluster. For more information, see Connect to an AnalyticDB for MySQL cluster.
  2. Create a destination database. For more information, see Create a database.

    In this example, the database named adb_demo is used as the destination database in the AnalyticDB for MySQL cluster.
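    For example, you can create the destination database by executing the following statement. This is a minimal sketch; the database name adb_demo matches this example:
    CREATE DATABASE adb_demo;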

  3. Create an external mapping table. You can use the CREATE TABLE statement to create an OSS external mapping table in the CSV, Parquet, or ORC format in the adb_demo database. For more information, see Syntax for creating an OSS external table.
  4. Query OSS data. If you need only to import data, skip this step.
    The syntax for querying external tables is the same as that for querying AnalyticDB for MySQL tables. You can use the following statement to query OSS external tables:
    SELECT uid, other FROM oss_import_test_external_table WHERE uid < 100 LIMIT 10;
    Note
    • If an object is in the CSV format and contains a large amount of data, we recommend that you import the object to AnalyticDB for MySQL before you query data. Otherwise, the query performance may be compromised.
    • If an object is in the Parquet or ORC format, the query performance is high. You can determine whether to import the object to AnalyticDB for MySQL before you query data.
  5. Create a destination table. You can use the following statement to create a destination table named adb_oss_import_test in the adb_demo database to store the data imported from OSS:
    CREATE TABLE IF NOT EXISTS adb_oss_import_test
    (
        uid string,
        other string
    )
    DISTRIBUTED BY HASH(uid);
  6. Execute the INSERT statement to import data from OSS to AnalyticDB for MySQL.
    • Method 1: Execute the INSERT INTO statement to import data.
      INSERT INTO adb_oss_import_test
      SELECT * FROM oss_import_test_external_table;
    • Method 2: Execute the INSERT OVERWRITE INTO statement to import data.
      INSERT OVERWRITE INTO adb_oss_import_test
      SELECT * FROM oss_import_test_external_table;
    • Method 3: Asynchronously execute the INSERT OVERWRITE INTO statement to import data.
      SUBMIT JOB INSERT OVERWRITE INTO adb_oss_import_test
      SELECT * FROM oss_import_test_external_table;

      The following output is returned:

      +---------------------------------------+
      | job_id                                |
      +---------------------------------------+
      | 2020112122202917203100908203303****** |
      +---------------------------------------+

      For more information about how to submit asynchronous tasks, see Asynchronously submit an import or export task.
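      After you submit the job, you can use the returned job_id to check or cancel the asynchronous task. The following statements are a hedged sketch based on the asynchronous task topic referenced above; the INFORMATION_SCHEMA.kepler_meta_async_jobs view and the CANCEL JOB statement are assumptions taken from that topic and may differ by cluster version:
      -- Check the status of the asynchronous import task (assumed status view).
      SELECT * FROM INFORMATION_SCHEMA.kepler_meta_async_jobs WHERE job_id = "2020112122202917203100908203303******";
      -- Cancel the asynchronous import task if it is no longer needed (assumed statement).
      CANCEL JOB "2020112122202917203100908203303******";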

Syntax for creating an OSS external table

  • Create an OSS external table in the CSV format

    The following statement shows how to create an external table in the CSV format:

    CREATE TABLE IF NOT EXISTS oss_import_test_external_table
    (
        uid string,
        other string
    )
    ENGINE='OSS'
    TABLE_PROPERTIES='{
        "endpoint":"oss-cn-hangzhou-internal.aliyuncs.com",
        "url":"oss://<bucket-name>/adb/oss_import_test_data.txt",
        "accessid":"LTAIF****5FsE",
        "accesskey":"Ccw****iWjv",
        "delimiter":";",
        "skip_header_line_count":1
    }'; 
    The following list describes the parameters in the statement:
    • ENGINE='OSS' (required): The storage engine used for the external table, which is OSS.
    • TABLE_PROPERTIES: The connection information used by AnalyticDB for MySQL to access OSS.
    • endpoint: The endpoint of the OSS bucket.
      Note AnalyticDB for MySQL can access OSS only from Elastic Compute Service (ECS) instances over virtual private clouds (VPCs).
      You can log on to the OSS console, find the bucket, and then view the endpoint on the Overview page.
    • url: The absolute URL of the source object or directory in OSS. We recommend that you use a forward slash (/) at the end of the URL of the directory.
      Examples:
      • Object: oss://<bucket-name>/adb/oss_import_test_data.txt
      • Directory: oss://<bucket-name>/adb_data/
    • accessid: The AccessKey ID used to access the object or directory in OSS.
      For information about how to obtain an AccessKey ID and an AccessKey secret, see Obtain the AccessKey pair of an account.
    • accesskey: The AccessKey secret used to access the object or directory in OSS.
    • delimiter: The column delimiter of the object in the CSV format. For example, you can set the column delimiter to a comma (,).
    • null_value (optional): The NULL value of the object in the CSV format. By default, an empty string is defined as NULL, which is "null_value": "".
      Note This parameter is supported only for AnalyticDB for MySQL clusters in V3.1.4.2 or later. For more information, see Release notes.
    • ossnull: The rule for defining the NULL value of the object in the CSV format. Default value: 1. Valid values:
      • 1: EMPTY_SEPARATORS. Only empty strings are defined as NULL.
        For example, a,"",,c is expressed as "a","",NULL,"c" based on this rule.
      • 2: EMPTY_QUOTES. Only quotation marks ("") are defined as NULL.
        For example, a,"",,c is expressed as "a",NULL,"","c" based on this rule.
      • 3: BOTH. Both empty strings and quotation marks ("") are defined as NULL.
        For example, a,"",,c is expressed as "a",NULL,NULL,"c" based on this rule.
      • 4: NEITHER. Empty strings and quotation marks ("") are not defined as NULL.
        For example, a,"",,c is expressed as "a","","","c" based on this rule.
      Note The preceding examples are provided on the premise of "null_value": "".
    • skip_header_line_count: The number of header rows to skip when you import data. The first row of a CSV object is the table header. If you set this parameter to 1, the first row of the object is skipped when you import data.
      The default value of this parameter is 0, which indicates that no rows are skipped.
    • oss_ignore_quote_and_escape: Specifies whether to ignore quotation marks (") and escape characters. The default value of this parameter is false, which indicates that quotation marks (") and escape characters are not ignored.
      Note This parameter is supported only for AnalyticDB for MySQL clusters in V3.1.4.2 or later. For more information, see Release notes.
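    The following statement is a hedged sketch that combines several of the optional parameters described above. It maps all CSV objects in a directory (note the trailing forward slash), skips one header row, and defines only quotation marks ("") as NULL. The table name oss_import_test_dir_external_table and the adb_data directory are placeholders for this example:
    CREATE TABLE IF NOT EXISTS oss_import_test_dir_external_table
    (
        uid string,
        other string
    )
    ENGINE='OSS'
    TABLE_PROPERTIES='{
        "endpoint":"oss-cn-hangzhou-internal.aliyuncs.com",
        "url":"oss://<bucket-name>/adb_data/",
        "accessid":"LTAIF****5FsE",
        "accesskey":"Ccw****iWjv",
        "delimiter":";",
        "null_value":"",
        "ossnull":2,
        "skip_header_line_count":1
    }';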
    AnalyticDB for MySQL allows you to read and write Hive TEXT files by using OSS external tables in the CSV format. The following statement shows how to create such an external table:
    CREATE TABLE adb_csv_hive_format_oss (
      a tinyint,
      b smallint,
      c int,
      d bigint,
      e boolean,
      f float,
      g double,
      h varchar,
      i varchar, -- binary
      j timestamp,
      k DECIMAL(10, 4),
      l varchar, -- char(10)
      m varchar, -- varchar(100)
      n date
    ) ENGINE = 'OSS' TABLE_PROPERTIES='{
        "format": "csv",
        "endpoint":"oss-cn-hangzhou-internal.aliyuncs.com",
        "accessid":"LTAIF****5FsE",
        "accesskey":"Ccw****iWjv",
        "url":"oss://<bucket-name>/adb/adb_csv_hive_format_oss.txt",
        "delimiter": "\\1",
        "null_value": "\\\\N",
        "oss_ignore_quote_and_escape": "true",
        "ossnull": 2,
    }';
    Note When you create an OSS external table in the CSV format to read and write data of a TEXT file in Hive, take note of the following items:
    • The default column delimiter for the TEXT file in Hive is \1. If you want to use the OSS external table to read and write data of the TEXT file in Hive, \1 must be escaped to \\1 for the delimiter parameter.
    • By default, the NULL value of the TEXT file in Hive is \N. If you want to use the OSS external table to read and write data of the TEXT file in Hive, \N must be escaped to \\\\N for the null_value parameter.
    • The BINARY, CHAR(N), and VARCHAR(N) data types in Hive all correspond to the VARCHAR data type in AnalyticDB for MySQL. Other basic data types in Hive such as BOOLEAN are the same as those in AnalyticDB for MySQL.
  • Create an OSS external table in the Parquet or ORC format
    The following statement shows how to create an external table in the Parquet format:
    CREATE TABLE IF NOT EXISTS oss_import_test_external_table
    (
        uid string,
        other string
    )
    ENGINE='OSS'
    TABLE_PROPERTIES='{
        "endpoint":"oss-cn-hangzhou-internal.aliyuncs.com",
        "url":"oss://<bucket-name>/adb/oss_import_test_data.txt",
        "accessid":"LTAIF****5FsE",
        "accesskey":"Ccw****iWjv",
        "format":"parquet"
    }';
    The following list describes the parameters in the statement:
    • ENGINE='OSS': The storage engine used for the external table, which is OSS.
    • TABLE_PROPERTIES: The connection information used by AnalyticDB for MySQL to access OSS.
    • endpoint: The endpoint of the OSS bucket.
      Note AnalyticDB for MySQL can access OSS only from ECS instances over VPCs.
      You can log on to the OSS console, find the bucket, and then view the endpoint on the Overview page.
    • url: The absolute URL of the source object or directory in OSS. We recommend that you use a forward slash (/) at the end of the URL of the directory.
      Examples:
      • Object: oss://<bucket-name>/adb/oss_import_test_data.txt
      • Directory: oss://<bucket-name>/adb_data/
    • accessid: The AccessKey ID used to access the object or directory in OSS.
      For more information about how to obtain an AccessKey ID and an AccessKey secret, see Obtain the AccessKey pair of an account.
    • accesskey: The AccessKey secret used to access the object or directory in OSS.
    • format: The format of the object.
      • When you create an external table in the Parquet format, you must set this parameter to parquet.
      • When you create an external table in the ORC format, you must set this parameter to orc. An example of an ORC external table is provided after the data type mapping tables below.
    Notice
    • The column names used in the statement to create an external table in the Parquet or ORC format must be the same as those in the Parquet or ORC file. Column names are not case-sensitive. The sequence of the columns in the statement must also be the same as that in the Parquet or ORC file.
    • When you create an external table in the Parquet or ORC format, you can choose only specific columns in a Parquet or ORC file as the columns of the external table, as shown in the example after this list. Columns that are not selected in the Parquet or ORC file are not imported. If the statement used to create an external table contains a column that does not exist in the Parquet or ORC file, NULL is returned in the query results of this column.
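    For example, the following hedged sketch maps only the uid column of the source Parquet object from the preceding example and also declares a hypothetical missing_col column that does not exist in the object; queries return NULL for missing_col, and the other column is simply not mapped:
    CREATE TABLE IF NOT EXISTS oss_parquet_subset_external_table
    (
        uid string,
        missing_col string
    )
    ENGINE='OSS'
    TABLE_PROPERTIES='{
        "endpoint":"oss-cn-hangzhou-internal.aliyuncs.com",
        "url":"oss://<bucket-name>/adb/oss_import_test_data.txt",
        "accessid":"LTAIF****5FsE",
        "accesskey":"Ccw****iWjv",
        "format":"parquet"
    }';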
    The following table describes the mappings between data types in Parquet and AnalyticDB for MySQL V3.0.
    Basic type in Parquet | Logical type in Parquet | Data type in AnalyticDB for MySQL V3.0
    BOOLEAN | None | BOOLEAN
    INT32 | INT_8 | TINYINT
    INT32 | INT_16 | SMALLINT
    INT32 | None | INT or INTEGER
    INT64 | None | BIGINT
    FLOAT | None | FLOAT
    DOUBLE | None | DOUBLE
    FIXED_LEN_BYTE_ARRAY, BINARY, INT64, or INT32 | DECIMAL | DECIMAL
    BINARY | UTF-8 | VARCHAR, STRING, or JSON (JSON is available if an object in the Parquet format contains a column in the JSON format.)
    INT32 | DATE | DATE
    INT64 | TIMESTAMP_MILLIS | TIMESTAMP or DATETIME
    INT96 | None | TIMESTAMP or DATETIME
    Notice External tables in the Parquet format do not support the STRUCT data type. Tables cannot be created for this data type.
    The following table describes the mappings between data types in ORC and AnalyticDB for MySQL V3.0.
    Data type in ORC | Data type in AnalyticDB for MySQL V3.0
    BOOLEAN | BOOLEAN
    BYTE | TINYINT
    SHORT | SMALLINT
    INT | INT or INTEGER
    LONG | BIGINT
    DECIMAL | DECIMAL
    FLOAT | FLOAT
    DOUBLE | DOUBLE
    BINARY, STRING, or VARCHAR | VARCHAR, STRING, or JSON (JSON is available if an object in the ORC format contains a column in the JSON format.)
    TIMESTAMP | TIMESTAMP or DATETIME
    DATE | DATE
    Notice External tables in the ORC format do not support the LIST, STRUCT, or UNION data type. Tables that contain columns of the MAP data type can be created, but queries cannot be performed on these tables.
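    The following statement is a hedged sketch of an ORC external table. It mirrors the Parquet example above with format set to orc; the table name oss_orc_external_table and the adb_data directory are placeholders:
    CREATE TABLE IF NOT EXISTS oss_orc_external_table
    (
        uid string,
        other string
    )
    ENGINE='OSS'
    TABLE_PROPERTIES='{
        "endpoint":"oss-cn-hangzhou-internal.aliyuncs.com",
        "url":"oss://<bucket-name>/adb_data/",
        "accessid":"LTAIF****5FsE",
        "accesskey":"Ccw****iWjv",
        "format":"orc"
    }';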

Create an OSS external table for a partitioned object

OSS data that contains partitions is stored in a hierarchical directory structure, as shown in the following example:
parquet_partition_classic/
├── p1=2020-01-01
│   ├── p2=4
│   │   ├── p3=SHANGHAI
│   │   │   ├── 000000_0
│   │   │   └── 000000_1
│   │   └── p3=SHENZHEN
│   │       └── 000000_0
│   └── p2=6
│       └── p3=SHENZHEN
│           └── 000000_0
├── p1=2020-01-02
│   └── p2=8
│       ├── p3=SHANGHAI
│       │   └── 000000_0
│       └── p3=SHENZHEN
│           └── 000000_0
└── p1=2020-01-03
    └── p2=6
        ├── p3=HANGZHOU
        └── p3=SHENZHEN
            └── 000000_0
In the preceding example, p1 indicates the first-level partition, p2 indicates the second-level partition, and p3 indicates the third-level partition. If you want to query the data by partition, you must specify partition columns in the statement used to create an OSS external table. The following statement shows how to create an OSS external table and specify partition columns in the external table. In this example, a Parquet object is used.
CREATE TABLE IF NOT EXISTS oss_parquet_partition_table
(
  uid varchar,
  other varchar,
  p1 date,
  p2 int,
  p3 varchar
)
ENGINE='OSS'
TABLE_PROPERTIES='{
  "endpoint":"oss-cn-hangzhou-internal.aliyuncs.com",
  "url":"oss://<bucket-name>/adb/oss_parquet_data_dir",
  "accessid":"LTAIF****5FsE",
  "accesskey":"Ccw****iWjv",
  "format":"parquet",
  "partition_column":"p1, p2, p3"
}';
Note
  • The partition_column property of TABLE_PROPERTIES must declare the partition columns (such as p1, p2, and p3 in the example). The sequence of partition columns specified by partition_column must be the same as the sequence of partition columns defined in the statement used to create the external table.
  • When you define partition columns in an external table, you must specify the columns that contain partitions (such as p1, p2, and p3 in the example) and their data types. The partition columns must be placed at the end of the column definitions.
  • The sequence of partition columns defined in the statement used to create an external table must be the same as the sequence of partition columns specified by the partition_column property.
  • Partition columns support the following data types: BOOLEAN, TINYINT, SMALLINT, INT, INTEGER, BIGINT, FLOAT, DOUBLE, DECIMAL, VARCHAR, STRING, DATE, and TIMESTAMP.
  • The syntax used to query partition columns and the way in which the query results are displayed are the same as those for other columns, as shown in the following example.
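The following query is a hedged sketch that filters on the partition columns of the example external table. The predicate values correspond to the directory hierarchy shown above:
SELECT uid, other, p1, p2, p3
FROM oss_parquet_partition_table
WHERE p1 = '2020-01-01' AND p2 = 4 AND p3 = 'SHANGHAI'
LIMIT 10;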