AnalyticDB for MySQL Data Lakehouse Edition (V3.0) allows you to create an Object Storage Service (OSS) metadata discovery task by using the creation wizard to discover OSS buckets and data objects in the same region as an Data Lakehouse Edition (V3.0) cluster. This topic describes how to use metadata discovery to import data from OSS to Data Lakehouse Edition (V3.0).

Background information

Data files that are generated from enterprise services, such as standard forms and log files, are uploaded to OSS on a regular basis. However, if you do not manage metadata in OSS objects, data is difficult to compute and analyze. You can create metadata discovery tasks to make the system automatically map OSS paths, group OSS objects, explore OSS object fields, and identify field changes. This way, data lake metadata is automatically created and updated.

Prerequisites

  • OSS is activated. An OSS bucket is created in the same region as an AnalyticDB for MySQL Data Lakehouse Edition (V3.0) cluster. For more information, see Activate OSS and Create buckets.
  • Data files are uploaded to the OSS bucket. The following conditions are met for the format of the OSS path:
    • The OSS path is in the <Bucket_name>/directory/.../directory/table/object or <Bucket_name>/directory/.../directory/table/partition/.../partition/object format.
    • The bucket has at least one level of child directories.
    • The objects in the same table or partition of the OSS path are in the same format.
    • The objects in the same table or partition of the OSS path contain the same types of fields and the same quantity of fields.

Limits

You can create only a single metadata discovery task for each cluster in the same OSS path.

Procedure

  1. Log on to the AnalyticDB for MySQL console.
  2. In the upper-left corner of the page, select the region where the cluster resides.
  3. In the left-side navigation pane, click Clusters.
  4. On the Data Lakehouse Edition (V3.0) tab, find the cluster and click its Cluster ID.
  5. In the left-side navigation pane, choose Data Ingestion > Metadata Discovery.
  6. In the OSS Data Source section of the Metadata Discovery page, click Start Wizard.
  7. On the OSS Data Source tab, configure the parameters that are described in the following table.
    SectionParameterDescription
    Data Source SettingsData Warehouse ModeAutomatic metadata discovery is built for OSS-based standard data warehouses. It provides high identification precision.
    Note The free mode is not supported.
    OSS DirectoryThe OSS directory in which the objects are stored. The directory must end with a forward slash (/).

    The mapping results vary based on the value of the OSS Directory parameter. For more information, see the "Examples of mapping data from OSS paths to AnalyticDB for MySQL" section of this topic.

    Important When you specify the OSS Directory parameter, make sure that at least one level of child directories exists after the OSS directory.
    Path Filter RulesThe rules for filtering the objects to be mapped to AnalyticDB for MySQL. Specify one or more path prefixes for the objects.
    • Include: All objects that are stored in the specified storage path are mapped to AnalyticDB for MySQL.
      Important If multiple objects are stored in the specified storage path, the objects must be in the same format and contain the same types of fields and the same quantity of fields.
    • Exclude: None of the objects that are stored in the specified storage path are mapped to AnalyticDB for MySQL.
    Format ParserThe format parser that reads data from objects to determine the object format. By default, Automatic is selected. This value specifies that all built-in format parsers are called in sequence.
    You can select a format parser that can parse the format of the objects that are stored in the OSS path.
    Important If you select a format parser that can parse a format that is different from the object format, the mapping fails.
    Setting Options (Optional)The advanced custom settings. If you set the Format Parser parameter to csv, you can configure the following parameters:
    • Field Delimiter: the column delimiter of CSV objects.
    • Reference Identifier: the reference identifier of parsed AnalyticDB for MySQL tables.
    • Table Header Mode: the method that is used to identify table headers of CSV tables. The table headers are used as column names.
    • Allow Single Field: specifies whether to allow only a single column to exist for a row of data in a CSV object.
      • false: does not allow only a single column.
      • true: allows only a single column.
    • Other Settings: You can configure parameters to specify whether to use the first row of a CSV object as column names. If you have this requirement, contact technical support.
    Scheduling SettingsScheduling FrequencyThe scheduling frequency for metadata discovery tasks to obtain the changes to object columns or data on a regular basis.

    If the OSS object data changes, the table data that is re-mapped to AnalyticDB for MySQL is updated based on the rules that are specified in Destination Metadata Settings.

    Destination Metadata SettingsSchema NameThe name of the schema, which is the name of the mapped AnalyticDB for MySQL database. By default, a schema is created for each metadata discovery task.
    When you specify a name for the database, take note of the following items:
    • The database name cannot be the same as the name of an existing database in the AnalyticDB for MySQL cluster.
    • The database name cannot be the same as the name of an existing database in other metadata discovery tasks.
    Setting Options (Optional)The rules for updating the table data that is re-mapped to AnalyticDB for MySQL if an OSS object field changes or an OSS object is deleted.
    • Object Field Change Rules: Only the Add Only Columns option is supported. If you add a column to an OSS object, a column is also added to the mapped AnalyticDB for MySQL table in the next metadata discovery task.
    • Object Delete Change Rules: Only the Ignore Delete Updates option is supported. If you delete an OSS object, the mapped AnalyticDB for MySQL table still exists in the next metadata discovery task.
    Important You can perform DDL and DML operations on the mapped AnalyticDB for MySQL tables.
    • If you add a column to an AnalyticDB for MySQL table, the column still exists for the next metadata discovery task.
    • If you remove a column from an AnalyticDB for MySQL table, the column is still mapped to the AnalyticDB for MySQL table in the next metadata discovery task.
  8. After you configure the preceding parameters, click Create to create a metadata discovery task.
    Note After the metadata discovery task is created, the system automatically executes the task based on the specified scheduling frequency. If you want to immediately synchronize data, find the task on the Tasks tab and click Execute in the Actions column.
  9. After the task starts, manage the task on the Tasks tab. You can view the status of the task and modify the configurations of the task.
    Note After the task is completed, you can choose Job Editor > SQL Development to view the databases, tables, and partitions that are mapped to AnalyticDB for MySQL.

Examples of mapping data from OSS paths to AnalyticDB for MySQL

The results of mapping data from OSS paths to AnalyticDB for MySQL vary based on the following elements:
  • The OSS object and its path.
  • The OSS Directory parameter that is specified for the metadata discovery task.
The system maps the name of the first directory level after the OSS directory to the table name and the names of the second to last directory levels after the OSS directory to partition names. The following table describes the mapping examples.
OSS pathOSS directoryMapped AnalyticDB for MySQL table
oss://adb/Table1/file1.csv

oss://adb/Table1/file2.json

oss://adb/No data mapping is performed.
Causes: The objects that are stored in the Table1 directory are in different formats: CSV and JSON.
Important If the objects are in the same format but the fields use different data types, no data mapping is performed.
oss://adb/Table2/year/month/day/file3.json

oss://adb/Table2/year/month/day/file4.json

oss://adb/The mapped AnalyticDB for MySQL table is a partitioned table named Table2. The partition fields are partition_0, partition_1, and partition_2.
Note No partition keys exist in these partitions. In this case, partition_num is used.
oss://adb/Table2/The mapped AnalyticDB for MySQL table is a partitioned table named year. The partition fields are partition_0 and partition_1.
oss://adb/Table2/year/month/The mapped AnalyticDB for MySQL table is a non-partitioned table named day.
oss://adb/Table2/year/month/day/No data mapping is performed.

Causes: No directory levels exist after the specified OSS directory.

oss://adb/Table3/year=2022/month=03/day=01/file5.csv

oss://adb/Table3/year=2022/month=03/day=01/file6.csv

oss://adb/The mapped AnalyticDB for MySQL table is a partitioned table named Table3. The partition fields are year, month, and day.
  • oss://adb/Table3/
  • oss://adb/Table3/year=2022/
No data mapping is performed.

Causes: The year=2022 or month=03 directory does not conform to the naming conventions for AnalyticDB for MySQL tables. For more information, see Limits.

oss://adb/Table4/2020/03/30/file7.csv

oss://adb/Table3/2020/03/30/file8.csv

oss://adb/The mapped AnalyticDB for MySQL table is a partitioned table named Table4. The partition fields are partition_0, partition_1, and partition_2.
Note No partition keys exist in these partitions. In this case, partition_num is used.
  • oss://adb/Table3/
  • oss://adb/Table3/2020/
No data mapping is performed.

Causes: The 2020 or 03 directory does not conform to the naming conventions for AnalyticDB for MySQL tables. For more information, see Limits.