This topic describes how to query Object Storage Service (OSS) objects by using the external mapping table of AnalyticDB for MySQL and how to import objects from OSS to AnalyticDB for MySQL. OSS objects can be in the Parquet, CSV, or TEXT format.

Prerequisites

  • A directory to store AnalyticDB for MySQL data is created in OSS by performing the following operations:
    1. Activate OSS
      Note Make sure that OSS and AnalyticDB for MySQL are deployed in the same region.
    2. Create buckets
    3. Create directories
    4. Upload objects

      In this example, the oss_import_test_data.txt object is uploaded to the bucket-name.oss-cn-hangzhou.aliyuncs.com/adb/ directory in OSS. The row delimiter is a line feed, and the column delimiter is a semicolon (;). The following sample data is in the uploaded object:

      number;note
      0001;hello_world_1
      0002;hello_world_2
      0003;hello_world_3
      0004;hello_world_4
      0005;hello_world_5
      0006;hello_world_6
      ...                           
  • If the AnalyticDB for MySQL cluster is in elastic mode, you must turn on ENI in the Network Information section of the Cluster Information page. Enable ENI

Procedure

In this example, the data in the oss_import_test_data.txt object is imported to the AnalyticDB for MySQL database named adb_demo.

  1. Connect to the AnalyticDB for MySQL cluster. For more information, see Connect to an AnalyticDB for MySQL cluster.
  2. Create a destination database. For more information, see Create a database.

    In this example, the database named adb_demo is used as the destination database in the AnalyticDB for MySQL cluster.

  3. Create an external mapping table. Execute the CREATE TABLE statement to create an OSS external table for an object whose file format is CSV, Parquet, or TEXT in the adb_demo database. For more information, see Syntax for creating an OSS external table.
  4. Query OSS data. Skip this step if you need only to import data.
    The syntax for querying external tables is the same as that for querying AnalyticDB for MySQL tables. The following statement is used to query the OSS external table:
    SELECT uid, other FROM oss_parquet_external_table WHERE uid < 100 LIMIT 10;
    Note
    • If an object whose file format is CSV or TEXT contains a large amount of data, we recommend that you import the object into AnalyticDB for MySQL before you query the data. Otherwise, the query performance may be compromised.
    • The query performance is high if the file format of an object is Parquet. You can determine whether to import the object into AnalyticDB for MySQL before you query the data.
  5. Create a destination table. Create a destination table named adb_oss_import_test in the adb_demo database to store the data imported from OSS. The following statement is used to create the destination table:
    CREATE TABLE IF NOT EXISTS adb_oss_import_test
    (
        uid string,
        other string
    )
    DISTRIBUTED BY HASH(uid);
  6. Execute the INSERT statement to import data from OSS to AnalyticDB for MySQL.
    • Execute the INSERT INTO statement to import data.
      INSERT INTO adb_oss_import_test
      SELECT * FROM oss_import_test_external_table;
    • Execute the INSERT OVERWRITE INTO statement to import data.
      INSERT OVERWRITE INTO adb_oss_import_test
      SELECT * FROM oss_import_test_external_table;
    • Execute the INSERT OVERWRITE INTO statement to asynchronously import data.
      SUBMIT job INSERT OVERWRITE INTO adb_oss_import_test
      SELECT * FROM oss_import_test_external_table;

      The following output is returned:

      +---------------------------------------+
      | job_id                                |
      +---------------------------------------+
      | 2020112122202917203100908203303****** |

      For more information about how to submit asynchronous tasks, see Asynchronously submit an import or export task.

Syntax for creating an OSS external table

  • Create an OSS external table for an object whose file format is CSV

    Syntax:

    CREATE TABLE IF NOT EXISTS oss_import_test_external_table
    (
        uid string,
        other string
    )
    ENGINE='OSS'
    TABLE_PROPERTIES='{
        "endpoint":"oss-cn-hangzhou-internal.aliyuncs.com",
        "url":"oss://$bucketname/adb/oss_import_test_data.txt",
        "accessid":"LTAIF****5FsE",
        "accesskey":"Ccw****iWjv",
        "delimiter":";",
        "skip_header_line_count":1
    }'; 
    Parameter Required Description
    ENGINE='OSS' Yes The storage engine used for the external table, which is OSS.
    TABLE_PROPERTIES The connection information used by AnalyticDB for MySQL to access OSS.
    endpoint The endpoint of the OSS bucket.
    Note AnalyticDB for MySQL can access OSS only from Elastic Compute Service (ECS) instances over virtual private clouds (VPCs).

    You can log on to the OSS console, click the required bucket, and then view the endpoint on the Overview page.

    url The absolute address of the source object or folder in OSS. We recommend that you use a forward slash (/) at the end of the URL of the folder.
    Example:
    • Object: oss://$Bucket name/adb/oss_import_test_data.txt.
    • Folder: oss://$Bucket name/adb_data/.
    accessid The AccessKey ID used to access the object or the folder in OSS.

    For more information about how to obtain an AccessKey ID and an AccessKey secret, see Obtain the AccessKey pair of an account.

    accesskey The AccessKey secret used to access the object or the folder in OSS.
    delimiter The column delimiter of the object whose file format is CSV. For example, you can set the column delimiter to a comma (,).
    null_value No The NULL value of the object whose file format is CSV. By default, an empty string is defined as NULL, which is "null_value": "".
    Note This parameter is supported only for AnalyticDB for MySQL clusters V3.1.4.2 or later. For more information, see Release notes.
    ossnull The corresponding rule for the NULL value in the object whose file format is CSV. Default value: 1. Valid values:
    • 1: indicates EMPTY_SEPARATORS. Only empty strings are defined as NULL.

      Example: a,"",,c --> "a","",NULL,"c"

    • 2: indicates EMPTY_QUOTES. Only quotation marks ("") are defined as NULL.

      Example: a,"",,c --> "a",NULL,"","c".

    • 3: indicates BOTH. Both empty strings and quotation marks ("") are defined as NULL.

      Example: a,"",,c --> "a",NULL,NULL,"c".

    • 4: indicates NEITHER. Empty strings and quotation marks ("") are not defined as NULL.

      Example: a,"",,c --> "a","","","c".

    Note The preceding examples are provided on the premise of "null_value": "".
    skip_header_line_count The number of rows from the beginning of the table to skip when you import data. The first row is the table header if the file format of an object is CSV. If this parameter is set to 1, the table header is skipped when you import data.

    By default, this parameter is set to 0, and no rows are skipped.

    oss_ignore_quote_and_escape Specifies whether to ignore quotation marks (") and escape characters. The default value is false, which indicates that quotation marks (") and escape characters are not ignored.
    Note This parameter is supported only for AnalyticDB for MySQL clusters V3.1.4.2 or later. For more information, see Release notes.
    AnalyticDB for MySQL allows you to read and write data of Hive TEXT objects by using OSS external tables for objects whose file formats are CSV. The following statement is used to create an OSS external table:
    CREATE TABLE adb_csv_hive_format_oss (
      a tinyint,
      b smallint,
      c int,
      d bigint,
      e boolean,
      f float,
      g double,
      h varchar,
      i varchar, -- binary
      j timestamp,
      k DECIMAL(10, 4),
      l varchar, -- char(10)
      m varchar, -- varchar(100)
      n date
    ) ENGINE = 'OSS' TABLE_PROPERTIES='{
        "format": "csv",
        "endpoint":"oss-cn-hangzhou-internal.aliyuncs.com",
        "accessid":"LTAIF****5FsE",
        "accesskey":"Ccw****iWjv",
        "url":"oss://$bucketname/adb/adb_csv_hive_format_oss.txt",
        "delimiter": "\\1",
        "null_value": "\\\\N",
        "oss_ignore_quote_and_escape": "true",
        "ossnull": 2,
    }';
    Note When you create an OSS external table for an object whose file format is CSV to read and write data of a Hive TEXT file, take note of the following items:
    • The default column delimiter for the Hive TEXT object is \1. If you want to use the OSS external table to read and write data of the Hive TEXT object, \1 must be escaped to \\1 for the delimiter parameter.
    • By default, the NULL value of the Hive TEXT object is \N. If you want to use the OSS external table to read and write data of the Hive TEXT object, \N must be escaped to \\\\N for the null_value parameter.
    • The BINARY, CHAR(N), and VARCHAR(N) data types in Hive all correspond to the VARCHAR data type in AnalyticDB for MySQL. Other basic data types in Hive, such as BOOLEAN, are the same to those in AnalyticDB for MySQL.
  • Create an OSS external table for an object whose file format is Parquet

    Syntax:

    CREATE TABLE IF NOT EXISTS oss_parquet_external_table
    (
        uid string,
        other string
    )
    ENGINE='OSS'
    TABLE_PROPERTIES='{
        "endpoint":"oss-cn-hangzhou-internal.aliyuncs.com",
        "url":"oss://****",
        "accessid":"LTAIF****5FsE",
        "accesskey":"Ccw****iWjv",
        "format":"parquet"
    }'
    Parameter Description
    ENGINE='OSS' The storage engine used for the external table, which is OSS.
    TABLE_PROPERTIES The connection information used by AnalyticDB for MySQL to access OSS.
    endpoint The endpoint of the OSS bucket.
    Note AnalyticDB for MySQL can access OSS only from ECS instances over VPCs.

    You can log on to the OSS console, click the required bucket, and then view the endpoint on the Overview page.

    url The absolute address of the source object or folder in OSS. We recommend that you use a forward slash (/) at the end of the URL of the folder.
    Example:
    • Object: oss://$Bucket name/adb/oss_import_test_data.txt.
    • Folder: oss://$Bucket name/adb_data/.
    accessid The AccessKey ID used to access the object or the folder in OSS.

    For more information about how to obtain an AccessKey ID and an AccessKey secret, see Obtain the AccessKey pair of an account.

    accesskey The AccessKey secret used to access the object or the folder in OSS.
    format The file format of the object. When you create an external table for an object whose file format is Parquet, you must set the parameter to parquet.

    When you create an external table for an object whose file format is Parquet, take note of the following mappings between data types.

    Table 1. Data type mapping
    Basic type of Parquet Logical type of Parquet Corresponding type in AnalyticDB for MySQL
    BOOLEAN None BOOLEAN
    INT32 INT_8 TINYINT
    INT32 INT_16 SMALLINT
    INT32 None INT or INTEGER
    INT64 None BIGINT
    FLOAT None FLOAT
    DOUBLE None DOUBLE
    • FIXED_LEN_BYTE_ARRAY
    • BINARY
    • INT64
    • INT32
    DECIMAL DECIMAL
    BINARY UTF-8
    • VARCHAR
    • STRING
    • JSON (JSON is available if an object whose file format is Parquet contains a column in the JSON format.)
    INT32 DATE DATE
    INT64 TIMESTAMP_MILLIS TIMESTAMP or DATETIME
    INT96 None TIMESTAMP or DATETIME
    Notice
    • Column names in the external table must be the same as those in the object whose file format is Parquet. Column names are case-insensitive and can be in different orders. However, we recommend that you keep the columns in the external table in the same order as they are in the object.
    • You can define only part of columns of the object whose file format is Parquet in the external table. Columns of the object that are not defined in the external table are ignored. If a column defined in the external table does not exist in the object, NULL is returned when the column is queried.
  • Create an OSS external table for the object whose file format is TEXT

    Syntax:

    CREATE TABLE IF NOT EXISTS oss_text_external_table
    (
        uid string,
        other string
    ) 
    ENGINE='OSS'
    TABLE_PROPERTIES='{
        "endpoint":"oss-cn-hangzhou-internal.aliyuncs.com", 
        "accessid":"LTAIF****5FsE", 
        "accesskey":"Ccw****iWjv", 
        "format":"text", 
        "row_delimiter":"\n",
        "field_delimiter":"\n",
        "url":"oss://****"
    }';
    Parameter Description
    ENGINE='OSS' The storage engine used for the external table, which is OSS.
    TABLE_PROPERTIES The connection information used by AnalyticDB for MySQL to access OSS.
    endpoint The endpoint of the OSS bucket.
    Note AnalyticDB for MySQL can access OSS only from ECS instances over VPCs.

    You can log on to the OSS console, click the required bucket, and then view the endpoint on the Overview page.

    url The absolute address of the source object in OSS.

    Example: oss://$Bucket name/adb/oss_import_test_data.txt.

    accessid The AccessKey ID used to access the object in OSS.

    For more information about how to obtain an AccessKey ID and an AccessKey secret, see Obtain the AccessKey pair of an account.

    accesskey The AccessKey secret used to access the object in OSS.
    format The file format of the object. When you create an external table for an object whose file format is TEXT, you must set the parameter to text.
    row_delimiter The row delimiter of the object whose file format is TEXT. Set the value to \n.
    field_delimiter The column delimiter of the object whose file format is TEXT. The column delimiter can be only a single character. A value of \n indicates that a row is a field.

Create an OSS external table for a partitioned object whose file format is Parquet or CSV

Part of OSS data sources have partitions that form a hierarchical directory. Example:

parquet_partition_classic/
├── p1=2020-01-01
│   ├── p2=4
│   │   ├── p3=SHANGHAI
│   │   │   ├── 000000_0
│   │   │   └── 000000_1
│   │   └── p3=SHENZHEN
│   │       └── 000000_0
│   └── p2=6
│       └── p3=SHENZHEN
│           └── 000000_0
├── p1=2020-01-02
│   └── p2=8
│       ├── p3=SHANGHAI
│       │   └── 000000_0
│       └── p3=SHENZHEN
│           └── 000000_0
└── p1=2020-01-03
    └── p2=6
        ├── p2=HANGZHOU
        └── p3=SHENZHEN
            └── 000000_0

In the preceding sample code, p1 indicates a level-1 partition, p2 indicates a level-2 partition, and p3 indicates a level-3 partition. If you want to query data by partition, you must specify partition key columns when you create OSS external tables. The following code provides an example on how to create an OSS partitioned external table. Parquet is used in this example. CSV is also supported.

CREATE TABLE IF NOT EXISTS oss_parquet_partition_table
(
  uid varchar,
  other varchar,
  p1 date,
  p2 int,
  p3 varchar
)
ENGINE='OSS'
TABLE_PROPERTIES='{
  "endpoint":"oss-xxxx.aliyuncs.com",
  "url":"oss://****/****/oss_parquet_data_dir",
  "accessid":"****",
  "accesskey":"****",
  "format":"parquet",
  "partition_column":"p1, p2, p3"
}'
Note
  • The preceding code shows that you must specify p1, p2, and p3 and their data types. You must also specify that p1, p2, and p3 are partition key columns in the partition_column field of the TABLE_PROPERTIES section. In the partition_column field, the order of p1, p2, and p3 cannot be changed. Partition key columns must also be in the same order in the column definition and placed at the end of the column definition list.
  • Partition key columns support the following data types: BOOLEAN, TINYINT, SMALLINT, INT or INTEGER, BIGINT, FLOAT, DOUBLE, DECIMAL, VARCHAR, STRING, DATE, and TIMESTAMP.
  • No differences exist in performance or usage during a query between the partition key columns and other data columns.