AnalyticDB for MySQL Data Lakehouse Edition (V3.0) allows you to create an Object Storage Service (OSS) metadata discovery task by using the creation wizard to discover OSS buckets and data objects in the same region as an Data Lakehouse Edition (V3.0) cluster. This topic describes how to use metadata discovery to import data from OSS to Data Lakehouse Edition (V3.0).
Background information
Data files that are generated from enterprise services, such as standard forms and log files, are uploaded to OSS on a regular basis. However, if you do not manage metadata in OSS objects, data is difficult to compute and analyze. You can create metadata discovery tasks to make the system automatically map OSS paths, group OSS objects, explore OSS object fields, and identify field changes. This way, data lake metadata is automatically created and updated.
Prerequisites
- OSS is activated. An OSS bucket is created in the same region as an AnalyticDB for MySQL Data Lakehouse Edition (V3.0) cluster. For more information, see Activate OSS and Create buckets.
- Data files are uploaded to the OSS bucket. The following conditions are met for the format of the OSS path:
- The OSS path is in the <Bucket_name>/directory/.../directory/table/object or <Bucket_name>/directory/.../directory/table/partition/.../partition/object format.
- The bucket has at least one level of child directories.
- The objects in the same table or partition of the OSS path are in the same format.
- The objects in the same table or partition of the OSS path contain the same types of fields and the same quantity of fields.
Limits
You can create only a single metadata discovery task for each cluster in the same OSS path.
Procedure
- Log on to the AnalyticDB for MySQL console.
- In the upper-left corner of the page, select the region where the cluster resides.
- In the left-side navigation pane, click Clusters.
- On the Data Lakehouse Edition (V3.0) tab, find the cluster and click its Cluster ID.
- In the left-side navigation pane, choose .
- In the OSS Data Source section of the Metadata Discovery page, click Start Wizard.
- On the OSS Data Source tab, configure the parameters that are described in the following table.
Section Parameter Description Data Source Settings Data Warehouse Mode Automatic metadata discovery is built for OSS-based standard data warehouses. It provides high identification precision. Note The free mode is not supported.OSS Directory The OSS directory in which the objects are stored. The directory must end with a forward slash (/). The mapping results vary based on the value of the OSS Directory parameter. For more information, see the "Examples of mapping data from OSS paths to AnalyticDB for MySQL" section of this topic.
Important When you specify the OSS Directory parameter, make sure that at least one level of child directories exists after the OSS directory.Path Filter Rules The rules for filtering the objects to be mapped to AnalyticDB for MySQL. Specify one or more path prefixes for the objects. - Include: All objects that are stored in the specified storage path are mapped to AnalyticDB for MySQL. Important If multiple objects are stored in the specified storage path, the objects must be in the same format and contain the same types of fields and the same quantity of fields.
- Exclude: None of the objects that are stored in the specified storage path are mapped to AnalyticDB for MySQL.
Format Parser The format parser that reads data from objects to determine the object format. By default, Automatic is selected. This value specifies that all built-in format parsers are called in sequence. You can select a format parser that can parse the format of the objects that are stored in the OSS path.Important If you select a format parser that can parse a format that is different from the object format, the mapping fails.Setting Options (Optional) The advanced custom settings. If you set the Format Parser parameter to csv, you can configure the following parameters: - Field Delimiter: the column delimiter of CSV objects.
- Reference Identifier: the reference identifier of parsed AnalyticDB for MySQL tables.
- Table Header Mode: the method that is used to identify table headers of CSV tables. The table headers are used as column names.
- Allow Single Field: specifies whether to allow only a single column to exist for a row of data in a CSV object.
- false: does not allow only a single column.
- true: allows only a single column.
- Other Settings: You can configure parameters to specify whether to use the first row of a CSV object as column names. If you have this requirement, contact technical support.
Scheduling Settings Scheduling Frequency The scheduling frequency for metadata discovery tasks to obtain the changes to object columns or data on a regular basis. If the OSS object data changes, the table data that is re-mapped to AnalyticDB for MySQL is updated based on the rules that are specified in Destination Metadata Settings.
Destination Metadata Settings Schema Name The name of the schema, which is the name of the mapped AnalyticDB for MySQL database. By default, a schema is created for each metadata discovery task. When you specify a name for the database, take note of the following items:- The database name cannot be the same as the name of an existing database in the AnalyticDB for MySQL cluster.
- The database name cannot be the same as the name of an existing database in other metadata discovery tasks.
Setting Options (Optional) The rules for updating the table data that is re-mapped to AnalyticDB for MySQL if an OSS object field changes or an OSS object is deleted. - Object Field Change Rules: Only the Add Only Columns option is supported. If you add a column to an OSS object, a column is also added to the mapped AnalyticDB for MySQL table in the next metadata discovery task.
- Object Delete Change Rules: Only the Ignore Delete Updates option is supported. If you delete an OSS object, the mapped AnalyticDB for MySQL table still exists in the next metadata discovery task.
Important You can perform DDL and DML operations on the mapped AnalyticDB for MySQL tables.- If you add a column to an AnalyticDB for MySQL table, the column still exists for the next metadata discovery task.
- If you remove a column from an AnalyticDB for MySQL table, the column is still mapped to the AnalyticDB for MySQL table in the next metadata discovery task.
- Include: All objects that are stored in the specified storage path are mapped to AnalyticDB for MySQL.
- After you configure the preceding parameters, click Create to create a metadata discovery task. Note After the metadata discovery task is created, the system automatically executes the task based on the specified scheduling frequency. If you want to immediately synchronize data, find the task on the Tasks tab and click Execute in the Actions column.
- After the task starts, manage the task on the Tasks tab. You can view the status of the task and modify the configurations of the task. Note After the task is completed, you can choose AnalyticDB for MySQL.to view the databases, tables, and partitions that are mapped to
Examples of mapping data from OSS paths to AnalyticDB for MySQL
- The OSS object and its path.
- The OSS Directory parameter that is specified for the metadata discovery task.
OSS path | OSS directory | Mapped AnalyticDB for MySQL table |
---|---|---|
oss://adb/Table1/file1.csv oss://adb/Table1/file2.json | oss://adb/ | No data mapping is performed. Causes: The objects that are stored in the Table1 directory are in different formats: CSV and JSON. Important If the objects are in the same format but the fields use different data types, no data mapping is performed. |
oss://adb/Table2/year/month/day/file3.json oss://adb/Table2/year/month/day/file4.json | oss://adb/ | The mapped AnalyticDB for MySQL table is a partitioned table named Table2 . The partition fields are partition_0 , partition_1 , and partition_2 . Note No partition keys exist in these partitions. In this case, partition_num is used. |
oss://adb/Table2/ | The mapped AnalyticDB for MySQL table is a partitioned table named year . The partition fields are partition_0 and partition_1 . | |
oss://adb/Table2/year/month/ | The mapped AnalyticDB for MySQL table is a non-partitioned table named day . | |
oss://adb/Table2/year/month/day/ | No data mapping is performed. Causes: No directory levels exist after the specified OSS directory. | |
oss://adb/Table3/year=2022/month=03/day=01/file5.csv oss://adb/Table3/year=2022/month=03/day=01/file6.csv | oss://adb/ | The mapped AnalyticDB for MySQL table is a partitioned table named Table3 . The partition fields are year , month , and day . |
| No data mapping is performed. Causes: The year=2022 or month=03 directory does not conform to the naming conventions for AnalyticDB for MySQL tables. For more information, see Limits. | |
oss://adb/Table4/2020/03/30/file7.csv oss://adb/Table3/2020/03/30/file8.csv | oss://adb/ | The mapped AnalyticDB for MySQL table is a partitioned table named Table4 . The partition fields are partition_0 , partition_1 , and partition_2 . Note No partition keys exist in these partitions. In this case, partition_num is used. |
| No data mapping is performed. Causes: The 2020 or 03 directory does not conform to the naming conventions for AnalyticDB for MySQL tables. For more information, see Limits. |