AnalyticDB for MySQL provides a variety of solutions for you to synchronize data in different scenarios. This topic describes how to import objects from Object Storage Service (OSS) to the AnalyticDB for MySQL database named adb_demo and query data in AnalyticDB for MySQL. For more information about how to import data into AnalyticDB for MySQL, see Synchronize data.

Prerequisites

  • A directory in which to store AnalyticDB for MySQL data in OSS is created by performing the following operations:
    1. Activate OSS
      Note Make sure that OSS and AnalyticDB for MySQL are deployed in the same region.
    2. Create buckets
    3. Create folders
    4. Upload test objects.

      In this example, the oss_import_test_data.txt object is uploaded to the bucket-name.oss-cn-hangzhou.aliyuncs.com/adb/ directory in OSS. The row delimiter is a line feed, and the column delimiter is a semicolon (;). The following sample data is in the uploaded object:

      number;note
      0001;hello_world_1
      0002;hello_world_2
      0003;hello_world_3
      0004;hello_world_4
      0005;hello_world_5
      0006;hello_world_6
      ...                           
  • The preparations for using AnalyticDB for MySQL are complete. For example, a cluster is created and a whitelist is configured. An account and a database are created. For more information, see Quick start of AnalyticDB for MySQL.

Procedure

  1. Execute the CREATE TABLE statement to create a foreign table in the adb_demo database. For more information about the syntax for creating an OSS foreign table for an object whose data type is CSV, Parquet, or TEXT, see Syntax for creating an OSS foreign table.
  2. Query OSS data.

    The syntax for querying foreign tables is the same as that for querying AnalyticDB for MySQL tables. You can use the following sample code to query the OSS foreign table:

    select uid, other from oss_parquet_external_table where uid < 100 limit 10
    • If an object whose data type is CSV or TEXT contains a large amount of data, we recommend that you import the object into AnalyticDB for MySQL before you query the data. Otherwise, the query performance may be compromised.
    • The query performance is high if the data type of an object is Parquet. You can determine whether to import the object into AnalyticDB for MySQL before you query the data.
  3. Execute the CREATE TABLE statement to create a destination table named adb_oss_import_test in the adb_demo database to store the data imported from OSS. For more information, see CREATE TABLE.
    CREATE TABLE IF NOT EXISTS adb_oss_import_test
    (
        uid string,
        other string
    )
    DISTRIBUTED BY HASH(uid)                  
  4. Execute the INSERT statement to import data from OSS to AnalyticDB for MySQL.
    • Execute the INSERT INTO statement to import data.
      insert into adb_oss_import_test
      select * from oss_import_test_external_table                   
    • Execute the INSERT OVERWRITE INTO statement to import data.
      insert overwrite into adb_oss_import_test
      select * from oss_import_test_external_table                           
    • Execute the INSERT OVERWRITE INTO statement to asynchronously import data.
      submit job insert overwrite into adb_oss_import_test
      select * from oss_import_test_external_table ;
      +---------------------------------------+
      | job_id                                |
      +---------------------------------------+
      | 2020112122202917203100908203303000715 |

      For more information about how to submit asynchronous tasks, see Submit import or export tasks asynchronously.

Syntax for creating an OSS foreign table

Create an OSS foreign table for an object whose data type is CSV

The data type of the oss_import_test_data.txt object is CSV. The following section describes the syntax for creating an OSS foreign table for an object whose data type is CSV:

CREATE TABLE IF NOT EXISTS oss_import_test_external_table
(
    uid string,
    other string
)
ENGINE='OSS'
TABLE_PROPERTIES='{
    "endpoint":"oss-cn-hangzhou-internal.aliyuncs.com",
    "url":"oss://$bucketname/adb/oss_import_test_data.txt",
    "accessid":"LTAIF****5FsE",
    "accesskey":"Ccw****iWjv",
    "delimiter":";",
    "skip_header_line_count":1
}'            
Parameter Description
ENGINE='OSS' The engine used for the foreign table, which is OSS.
TABLE_PROPERTIES The connection information used by AnalyticDB for MySQL to access OSS.
endpoint The endpoint of the OSS bucket.
Note AnalyticDB for MySQL can access OSS only from ECS instances over the VPCs.

You can log on to the OSS console, click a bucket, and then view the endpoint on the Overview page.

url The absolute address of the folder or object in OSS.

Example: oss://$Bucketname/adb/oss_import_test_data.txt.

accessid The AccessKey ID used to access the object in OSS.

For more information about how to obtain your AccessKey ID and AccessKey secret, see Obtain the AccessKey pair of an account.

accesskey The AccessKey secret used to access the object in OSS.
delimiter The column delimiter of the object whose data type is CSV.
ossnull (optional) The display mode of the NULL value. Default value: EMPTY_SEPARATORS. Valid values:
  • EMPTY_SEPARATORS

    a,"",,c --> "a","",NULL,"c"

  • EMPTY_QUOTES

    a,"",,c --> "a",NULL,"","c"

  • BOTH

    a,"",,c --> "a",NULL,NULL,"c"

  • NEITHER

    a,"",,c --> "a","","","c"

skip_header_line_count (optional) Specifies whether to skip the first rows. By default, the parameter is set to 0, and no rows are skipped. If the parameter is set to 1 or an integer greater than 1, the corresponding number of rows from the beginning of the table are skipped. The first row is the table header if the data type of an object is CSV.
Create an OSS foreign table for an object whose data type is Parquet
CREATE TABLE IF NOT EXISTS oss_parquet_external_table
(
    uid string,
    other string
)
ENGINE='OSS'
TABLE_PROPERTIES='{
    "endpoint":"oss-cn-hangzhou-internal.aliyuncs.com",
    "url":"oss://****",
    "accessid":"LTAIF****5FsE",
    "accesskey":"Ccw****iWjv",
    "format":"parquet"
}'
Parameter Description
ENGINE='OSS' The engine used for the foreign table, which is OSS.
TABLE_PROPERTIES The connection information used by AnalyticDB for MySQL to access OSS.
endpoint The endpoint of the OSS bucket.
Note AnalyticDB for MySQL can access OSS only from ECS instances over the VPCs.

You can log on to the OSS console, click a bucket, and then view the endpoint on the Overview page.

url The absolute address of the folder or object in OSS.

Example: oss://$Bucketname/adb/oss_import_test_data.txt.

accessid The AccessKey ID used to access the object in OSS.

For more information about how to obtain your AccessKey ID and AccessKey secret, see Obtain the AccessKey pair of an account.

accesskey The AccessKey secret used to access the object in OSS.
format The data type of the object. When you create a foreign table for an object whose data type is Parquet, you must set the parameter to parquet.

When you create a foreign table for an object whose data type is Parquet, take note of the following mappings between data types.

Basic type of Parquet Logical type of Parquet Corresponding type in AnalyticDB for MySQL
BOOLEAN N/A BOOLEAN
INT32 INT_8 TINYINT
INT32 INT_16 SMALLINT
INT32 N/A INT or INTEGER
INT64 N/A BIGINT
FLOAT N/A FLOAT
DOUBLE N/A DOUBLE
  • FIXED_LEN_BYTE_ARRAY
  • BINARY
  • INT64
  • INT32
DECIMAL DECIMAL
BINARY UTF-8
  • VARCHAR
  • STRING
  • JSON (JSON is available if an object whose data type is Parquet contains a column in the JSON format.)
INT32 DATE DATE
INT64 TIMESTAMP_MILLIS TIMESTAMP or DATETIME
INT96 N/A TIMESTAMP or DATETIME
Notice
  • Column names in the foreign table must be the same as those in the object whose data type is Parquet. Column names are not case-sensitive and can be in different orders. However, we recommend that you keep the columns in the foreign table in the same order as they are in the object.
  • You can define only some columns of the object whose data type is Parquet in the foreign table. Columns of the object that are not defined in the foreign table are ignored. If a column defined in the foreign table does not exist in the object, NULL is returned when the column is queried.
Create an OSS foreign table for the object whose data type is TEXT
CREATE TABLE IF NOT EXISTS oss_text_external_table
(
    uid string,
    other string
) 
engine='oss' TABLE_PROPERTIES='{
    "endpoint":"oss-cn-hangzhou-internal.aliyuncs.com", 
    "accessid":"LTAIF****5FsE", 
    "accesskey":"Ccw****iWjv", 
    "format":"text", 
    "row_delimiter":"\n",
    "field_delimiter":"\n",
    "URL":"oss://****"
}';
Parameter Description
ENGINE='OSS' The engine used for the foreign table, which is OSS.
TABLE_PROPERTIES The connection information used by AnalyticDB for MySQL to access OSS.
endpoint The endpoint of the OSS bucket.
Note AnalyticDB for MySQL can access OSS only from ECS instances over the VPCs.

You can log on to the OSS console, click the target bucket, and then view the endpoint on the Overview page.

url The absolute address of the folder or object in OSS.

Example: oss://$Bucketname/adb/oss_import_test_data.txt.

accessid The AccessKey ID used to access the object in OSS.

For more information about how to obtain your AccessKey ID and AccessKey secret, see Obtain the AccessKey pair of an account.

accesskey The AccessKey secret used to access the object in OSS.
format The data type of the object. When you create a foreign table for an object whose data type is TEXT, you must set the parameter to text.
row_delimiter The row delimiter of the object whose data type is TEXT. Set the value to \n.
field_delimiter The column delimiter of the object whose data type is TEXT. The column delimiter can be only a single character. A value of \n indicates that a row is a field.

Create an OSS foreign table for a partitioned object whose data type is Parquet or CSV

Some OSS data sources have partitions that form a hierarchical directory as shown in the following sample code:

parquet_partition_classic/
├── p1=2020-01-01
│   ├── p2=4
│   │   ├── p3=SHANGHAI
│   │   │   ├── 000000_0
│   │   │   └── 000000_1
│   │   └── p3=SHENZHEN
│   │       └── 000000_0
│   └── p2=6
│       └── p3=SHENZHEN
│           └── 000000_0
├── p1=2020-01-02
│   └── p2=8
│       ├── p3=SHANGHAI
│       │   └── 000000_0
│       └── p3=SHENZHEN
│           └── 000000_0
└── p1=2020-01-03
    └── p2=6
        ├── p2=HANGZHOU
        └── p3=SHENZHEN
            └── 000000_0

In the preceding sample code, p1 indicates a level-1 partition, p2 indicates a level-2 partition, and p3 indicates a level-3 partition. If you want to query data by partition, you must specify partition key columns when you create OSS foreign tables. The following code provides an example on how to create an OSS partitioned foreign table. Parquet is used in this example. The data type of partitions can also be CSV.

CREATE TABLE IF NOT EXISTS oss_parquet_partition_table
(
  uid varchar,
  other varchar,
  p1 date,
  p2 int,
  p3 varchar
)
ENGINE='OSS'
TABLE_PROPERTIES='{
  "endpoint":"oss-xxxx.aliyuncs.com",
  "url":"oss://****/****/oss_parquet_data_dir",
  "accessid":"****",
  "accesskey":"****",
  "format":"parquet",
  "partition_column":"p1, p2, p3"
}'
Note
  • As shown in the preceding code, you must specify p1, p2, and p3 and their data types. You must also specify that p1, p2, and p3 are partition key columns in the partition_column field of the TABLE_PROPERTIES section. In the partition_column field, the order of p1, p2, and p3 cannot be changed. Partition key columns must also be in the same order in the column definition and placed at the end of the column definition list.
  • Partition key columns support the following data types: BOOLEAN, TINYINT, SMALLINT, INT or INTEGER, BIGINT, FLOAT, DOUBLE, DECIMAL, VARCHAR or STRING, DATE, and TIMESTAMP.
  • No differences exist in performance or usage during a query between the partition key columns and other data columns.