This topic describes how to query Object Storage Service (OSS) objects by using the external mapping table of AnalyticDB for MySQL and how to import objects from OSS to AnalyticDB for MySQL. The Parquet, CSV, and TEXT data formats are supported in OSS.

Prerequisites

  • A directory in which to store AnalyticDB for MySQL data in OSS is created by performing the following operations:
    1. Activate OSS
      Note Make sure that OSS and AnalyticDB for MySQL are deployed in the same region.
    2. Create buckets
    3. Create folders
    4. Upload objects

      In this example, the oss_import_test_data.txt object is uploaded to the bucket-name.oss-cn-hangzhou.aliyuncs.com/adb/ directory in OSS. The row delimiter is a line feed, and the column delimiter is a semicolon (;). The following sample data is in the uploaded object:

      number;note
      0001;hello_world_1
      0002;hello_world_2
      0003;hello_world_3
      0004;hello_world_4
      0005;hello_world_5
      0006;hello_world_6
      ...                           
  • An AnalyticDB for MySQL cluster is created, a whitelist is configured, and an account and a database are created. For more information, see Create a cluster.
Note If AnalyticDB for MySQL cluster is in elastic mode, log on to the AnalyticDB for MySQL console and view the cluster information. In the Network Information section, enable elastic network interface (ENI).

Procedure

In this example, the data in the oss_import_test_data.txt object is imported to the AnalyticDB for MySQL database named adb_demo.

  1. Connect to an AnalyticDB for MySQL cluster and the database. For more information, see Connect to a cluster.
  2. Execute the CREATE TABLE statement to create an external table in the adb_demo database. For more information about the syntax for creating an OSS external table for an object whose data type is CSV, Parquet, or TEXT, see Syntax for creating an OSS external table.
  3. Query OSS data. Skip this step if you need only to import data.

    The syntax for querying external tables is the same as that for querying AnalyticDB for MySQL tables. You can use the following sample code to query the OSS external table:

    • If an object whose data type is CSV or TEXT contains a large amount of data, we recommend that you import the object into AnalyticDB for MySQL before you query the data. Otherwise, the query performance may be compromised.
    • The query performance is high if the data type of an object is Parquet. You can determine whether to import the object into AnalyticDB for MySQL before you query the data.
    select uid, other from oss_parquet_external_table where uid < 100 limit 10
  4. Execute the CREATE TABLE statement to create a destination table named adb_oss_import_test in the adb_demo database to store the data imported from OSS.
    CREATE TABLE IF NOT EXISTS adb_oss_import_test
    (
        uid string,
        other string
    )
    DISTRIBUTED BY HASH(uid)                  
  5. Execute the INSERT statement to import data from OSS to AnalyticDB for MySQL.
    • Execute the INSERT INTO statement to import data.
      insert into adb_oss_import_test
      select * from oss_import_test_external_table                   
    • Execute the INSERT OVERWRITE INTO statement to import data.
      insert overwrite into adb_oss_import_test
      select * from oss_import_test_external_table                           
    • Execute the INSERT OVERWRITE INTO statement to asynchronously import data.
      submit job insert overwrite into adb_oss_import_test
      select * from oss_import_test_external_table ;

      The following output is returned:

      +---------------------------------------+
      | job_id                                |
      +---------------------------------------+
      | 2020112122202917203100908203303000715 |

      For more information about how to submit asynchronous tasks, see Submit import or export tasks asynchronously.

Syntax for creating an OSS external table

Create an OSS external table for an object whose data type is CSV

The data type of the oss_import_test_data.txt object is CSV. The following section describes the syntax for creating an OSS external table for an object whose data type is CSV:

CREATE TABLE IF NOT EXISTS oss_import_test_external_table
(
    uid string,
    other string
)
ENGINE='OSS'
TABLE_PROPERTIES='{
    "endpoint":"oss-cn-hangzhou-internal.aliyuncs.com",
    "url":"oss://$bucketname/adb/oss_import_test_data.txt",
    "accessid":"LTAIF****5FsE",
    "accesskey":"Ccw****iWjv",
    "delimiter":";",
    "skip_header_line_count":1
}'            
Parameter Description
ENGINE='OSS' The storage engine used for the external table, which is OSS.
TABLE_PROPERTIES The connection information used by AnalyticDB for MySQL to access OSS.
endpoint The endpoint of the OSS bucket.
Note AnalyticDB for MySQL can access OSS only from ECS instances over VPCs.

You can log on to the OSS console, click the required bucket, and then view the endpoint on the Overview page.

url The absolute address of the folder or object in OSS. The URL of the folder must end with a forward slash (/).

Example: object: oss://$Bucket name/adb/oss_import_test_data.txt

Folder: oss://$ Bucket name /adb_data/

accessid The AccessKey ID used to access the object or the folder in OSS.

For more information about how to obtain an AccessKey ID and an AccessKey secret, see Obtain the AccessKey pair of an account.

accesskey The AccessKey secret used to access the object or the folder in OSS.
delimiter The column delimiter of the object whose data type is CSV.
(Optional) ossnull The display mode of the NULL value. Default value: EMPTY_SEPARATORS. Valid values:
  • EMPTY_SEPARATORS

    a,"",,c --> "a","",NULL,"c"

  • EMPTY_QUOTES

    a,"",,c --> "a",NULL,"","c"

  • BOTH

    a,"",,c --> "a",NULL,NULL,"c"

  • NEITHER

    a,"",,c --> "a","","","c"

(Optional) skip_header_line_count Specifies whether to skip the first rows. By default, the parameter is set to 0, and no rows are skipped. If the parameter is set to 1 or an integer greater than 1, the corresponding number of rows from the beginning of the table is skipped. The first row is the table header if the data type of an object is CSV.
Create an OSS external table for an object whose data type is Parquet
CREATE TABLE IF NOT EXISTS oss_parquet_external_table
(
    uid string,
    other string
)
ENGINE='OSS'
TABLE_PROPERTIES='{
    "endpoint":"oss-cn-hangzhou-internal.aliyuncs.com",
    "url":"oss://****",
    "accessid":"LTAIF****5FsE",
    "accesskey":"Ccw****iWjv",
    "format":"parquet"
}'
Parameter Description
ENGINE='OSS' The storage engine used for the external table, which is OSS.
TABLE_PROPERTIES The connection information used by AnalyticDB for MySQL to access OSS.
endpoint The endpoint of the OSS bucket.
Note AnalyticDB for MySQL can access OSS only from ECS instances over VPCs.

You can log on to the OSS console, click the required bucket, and then view the endpoint on the Overview page.

url The absolute address of the folder or object in OSS. The URL of the folder must end with a forward slash (/).

Example: object: oss://$Bucket name/adb/oss_import_test_data.txt

Folder: oss://$ Bucket name /adb_data/

accessid The AccessKey ID used to access the object or the folder in OSS.

For more information about how to obtain an AccessKey ID and an AccessKey secret, see Obtain the AccessKey pair of an account.

accesskey The AccessKey secret used to access the object or the folder in OSS.
format The data type of the object. When you create an external table for an object whose data type is Parquet, you must set the parameter to parquet.

When you create an external table for an object whose data type is Parquet, take note of the following mappings between data types.

Basic type of Parquet Logical type of Parquet Corresponding type in AnalyticDB for MySQL
BOOLEAN None boolean
INT32 INT_8 tinyint
INT32 INT_16 smallint
INT32 None INT or INTEGER
INT64 None bigint
FLOAT None float
DOUBLE None double
  • FIXED_LEN_BYTE_ARRAY
  • BINARY
  • INT64
  • INT32
DECIMAL decimal
BINARY UTF-8
  • varchar
  • string
  • JSON (JSON is available if an object whose data type is Parquet contains a column in the JSON format.)
INT32 DATE date
INT64 TIMESTAMP_MILLIS TIMESTAMP or DATETIME
INT96 None TIMESTAMP or DATETIME
Notice
  • Column names in the external table must be the same as those in the object whose data type is Parquet. Column names are case-insensitive and can be in different orders. However, we recommend that you keep the columns in the external table in the same order as they are in the object.
  • You can define only part of columns of the object whose data type is Parquet in the external table. Columns of the object that are not defined in the external table are ignored. If a column defined in the external table does not exist in the object, NULL is returned when the column is queried.
Create an OSS external table for the object whose data type is TEXT
CREATE TABLE IF NOT EXISTS oss_text_external_table
(
    uid string,
    other string
) 
ENGINE='OSS'
TABLE_PROPERTIES='{
    "endpoint":"oss-cn-hangzhou-internal.aliyuncs.com", 
    "accessid":"LTAIF****5FsE", 
    "accesskey":"Ccw****iWjv", 
    "format":"text", 
    "row_delimiter":"\n",
    "field_delimiter":"\n",
    "URL":"oss://****"
}';
Parameter Description
ENGINE='OSS' The storage engine used for the external table, which is OSS.
TABLE_PROPERTIES The connection information used by AnalyticDB for MySQL to access OSS.
endpoint The endpoint of the OSS bucket.
Note AnalyticDB for MySQL can access OSS only from ECS instances over VPCs.

You can log on to the OSS console, click the required bucket, and then view the endpoint on the Overview page.

url The absolute address of the object in OSS.

Example: oss://$Bucket name/adb/oss_import_test_data.txt

accessid The AccessKey ID used to access the object in OSS.

For more information about how to obtain an AccessKey ID and an AccessKey secret, see Obtain the AccessKey pair of an account.

accesskey The AccessKey secret used to access the object in OSS.
format The data type of the object. When you create an external table for an object whose data type is TEXT, you must set the parameter to text.
row_delimiter The row delimiter of the object whose data type is TEXT. Set the value to \n.
field_delimiter The column delimiter of the object whose data type is TEXT. The column delimiter can be only a single character. A value of \n indicates that a row is a field.

Create an OSS external table for a partitioned object whose data type is Parquet or CSV

Part of OSS data sources have partitions that form a hierarchical directory as shown in the following sample code:

parquet_partition_classic/
├── p1=2020-01-01
│   ├── p2=4
│   │   ├── p3=SHANGHAI
│   │   │   ├── 000000_0
│   │   │   └── 000000_1
│   │   └── p3=SHENZHEN
│   │       └── 000000_0
│   └── p2=6
│       └── p3=SHENZHEN
│           └── 000000_0
├── p1=2020-01-02
│   └── p2=8
│       ├── p3=SHANGHAI
│       │   └── 000000_0
│       └── p3=SHENZHEN
│           └── 000000_0
└── p1=2020-01-03
    └── p2=6
        ├── p2=HANGZHOU
        └── p3=SHENZHEN
            └── 000000_0

In the preceding sample code, p1 indicates a level-1 partition, p2 indicates a level-2 partition, and p3 indicates a level-3 partition. If you want to query data by partition, you must specify partition key columns when you create OSS external tables. The following code provides an example on how to create an OSS partitioned external table. Parquet is used in this example. The data type of partitions can also be CSV.

CREATE TABLE IF NOT EXISTS oss_parquet_partition_table
(
  uid varchar,
  other varchar,
  p1 date,
  p2 int,
  p3 varchar
)
ENGINE='OSS'
TABLE_PROPERTIES='{
  "endpoint":"oss-xxxx.aliyuncs.com",
  "url":"oss://****/****/oss_parquet_data_dir",
  "accessid":"****",
  "accesskey":"****",
  "format":"parquet",
  "partition_column":"p1, p2, p3"
}'
Note
  • As shown in the preceding code, you must specify p1, p2, and p3 and their data types. You must also specify that p1, p2, and p3 are partition key columns in the partition_column field of the TABLE_PROPERTIES section. In the partition_column field, the order of p1, p2, and p3 cannot be changed. Partition key columns must also be in the same order in the column definition and placed at the end of the column definition list.
  • Partition key columns support the following data types: BOOLEAN, TINYINT, SMALLINT, INT or INTEGER, BIGINT, FLOAT, DOUBLE, DECIMAL, VARCHAR or STRING, DATE, and TIMESTAMP.
  • No differences exist in performance or usage during a query between the partition key columns and other data columns.