All Products
Search
Document Center

AnalyticDB:Import data using metadata discovery

Last Updated:Aug 06, 2025

Enterprise services regularly upload data files, such as standard forms and log files, to OSS. However, this data often lacks metadata management, which makes it difficult to analyze and compute. The metadata discovery feature of AnalyticDB for MySQL supports automatic mapping of OSS paths, intelligent grouping of files, and dynamic detection of field changes to automatically create and update data lake metadata. You can create an OSS metadata discovery job to automatically discover OSS buckets and data files in the same region and import data from OSS to AnalyticDB for MySQL.

Prerequisites

  • An AnalyticDB for MySQL Enterprise Edition, Basic Edition, or Data Lakehouse Edition cluster is created.

  • A database account is created for the AnalyticDB for MySQL cluster.

  • An Object Storage Service (OSS) bucket is created in the same region as the AnalyticDB for MySQL cluster.

  • Data files are uploaded to the OSS bucket. The OSS path must meet the following conditions:

    • The OSS path is in the <BucketName>/directory/.../directory/table/file or <BucketName>/directory/.../directory/table/partition/.../partition/file format.

    • The OSS bucket has at least one level of child directories.

    • The files in the same table or partition in the OSS path are in the same format.

    • The files in the same table or partition in the OSS path contain the same types and number of fields.

Limits

You can create only one metadata discovery job for each cluster in the same OSS path.

Procedure

  1. Log on to the AnalyticDB for MySQL console. In the upper-left corner of the console, select a region. In the left-side navigation pane, click Clusters. Find the cluster that you want to manage and click the cluster ID.

  2. In the navigation pane on the left, choose Data Ingestion > Metadata Discovery.

  3. On the Metadata Discovery page, in the OSS Data Source area, click Start Wizard.

    Note

    If the Start Wizard button is dimmed, you must create a privileged account first.

    1. On the OSS Data Source tab, configure the parameters as described in the following table.

      Configuration item

      Parameter

      Description

      Datasource Config

      Data Warehouse Mode

      Automatic metadata discovery is built for OSS-based standard data warehouses and provides high identification precision.

      Note

      The free mode is not supported.

      OSS Directory Location

      • Common files: the OSS storage path of the files. The path must end with a forward slash (/).

        The mapping results vary based on the value of the OSS Directory Location parameter. For more information, see mapping results.

        Important

        When you configure the OSS Directory Location parameter, make sure that at least one level of child directories exists after the OSS Directory Location. You cannot select the parent directory of the files.

      • Data lake tables: the upper-level directory of the OSS storage path of data lake tables. The directory must end with a forward slash (/). For example, if the directory of Iceberg tables is oss://adb/testdb/iceberg_table/, you must set the OSS directory location to oss://adb/testdb/.

      Path Filter Rule (optional)

      Specify the storage path of the files that you want to map to AnalyticDB for MySQL.

      • Include: All files that are stored in the specified storage path are mapped to AnalyticDB for MySQL.

        Important

        If multiple files are stored in the selected storage path, the files must be of the same type and contain the same types and number of columns to be mapped.

      • Exclude: The files that are stored in the specified storage path are not mapped to AnalyticDB for MySQL.

      Format Resolver

      Select a format resolver that matches the file format. For common file formats, the format resolver also provides an automatic parsing feature, which calls the resolvers for common file formats in sequence to parse files.

      Important

      If the selected resolver type is inconsistent with the file format, the mapping fails.

      Supported resolver types:

      • Common Files: csv, json, parquet, avro, orc, and automatic.

      • Data Lake Tables: iceberg.

        Note

        The data lake table feature is in public preview. To use this feature, Submit a ticket to contact technical support.

      Configuration Item (optional)

      Advanced custom settings. When you select CSV for the format resolver, you can set the following parameters:

      • Column delimiter: specifies the column delimiter of CSV data files.

      • Reference identifier: specifies the reference identifier of the AnalyticDB for MySQL table after parsing.

      • Table header mode: identifies the table header of the CSV file to determine the column names of the table.

      • Allow single column field: specifies whether to allow a row of records in a CSV data file to have only one column.

        • false: No.

        • true: Yes.

      • Other configurations: You can configure parameters to specify whether to use the first row of a CSV file as column names. If you have this requirement, contact technical support.

      Scheduling Configuration

      Scheduling Frequency

      If you want to periodically obtain changes to columns or data in data files, you can set a scheduling frequency to periodically execute metadata discovery jobs.

      If the data of OSS files changes, the data of the table that is re-mapped to AnalyticDB for MySQL is updated based on the Destination Metadata Configuration rules.

      Destination Metadata Configuration

      Schema Name

      Set the schema name, which is mapped to the database name in AnalyticDB for MySQL. By default, a new schema is created for each discovery job.

      When you specify a database name, take note of the following items:

      • The database name cannot be the same as the name of an existing database in AnalyticDB for MySQL.

      • The database name cannot be the same as the database name of another metadata discovery job.

      Configuration Item (optional)

      Specify how to update the table that is mapped to AnalyticDB for MySQL when an OSS file field changes or an OSS file is deleted.

      • File Field Change Rule:

        • Add Only Columns: When a column is added to an OSS file, the column is also added to the table that is mapped to AnalyticDB for MySQL after the metadata discovery job is executed again.

        • Ignore Table Updates: When a column or partition of an OSS file is changed, only the partition is synchronized to the table that is mapped to AnalyticDB for MySQL after the metadata discovery job is executed again. Columns are not added or deleted.

      • Object Deletion Change Rule: Only Ignore Deletion Updates is supported. This means that if an OSS file is deleted, the table that is mapped to AnalyticDB for MySQL still exists after the metadata discovery job is executed again.

      Important

      You can perform DML operations on the table that is mapped to AnalyticDB for MySQL.

      • If you add a column to an AnalyticDB for MySQL table, the column is not overwritten and still exists after the metadata discovery job is executed again.

      • If you delete a column from the table that is mapped to AnalyticDB for MySQL, the deleted column is mapped to AnalyticDB for MySQL again after the metadata discovery job is executed again.

  4. After you configure the parameters, click Create.

    Note

    After the metadata discovery job is created, it automatically runs at the specified interval. If you want to run the job immediately, find the job in the job list and click Execute in the Actions column.

  5. After the job starts, you can manage it on the Job List page. You can view the running status of the job and modify its configuration.

    Note

    After the job runs successfully, you can go to the Job Development > SQL Development page to view the databases, tables, and partitions that are mapped to AnalyticDB for MySQL.

Examples of mapping OSS paths to AnalyticDB for MySQL

The results of mapping OSS paths to AnalyticDB for MySQL vary based on the following two factors.

  • The OSS file and its path.

  • The OSS Directory Location selected in the metadata discovery job.

Based on the OSS Directory Location that you select, the system automatically maps the first-level directory under the OSS Directory Location to a table name, and all subsequent directories under the OSS Directory Location to partitions. The following is an example.

OSS path

OSS directory location

Table mapped to AnalyticDB for MySQL

oss://adb/Table1/file1.csv

oss://adb/Table1/file2.json

oss://adb/

A table cannot be mapped.

Reason: The files in the Table1 directory are in the CSV and JSON formats. The file formats are inconsistent. Therefore, a table cannot be mapped.

Important

If the files are in the same format but the data types of the fields in the files are inconsistent, a table cannot be mapped either.

oss://adb/Table2/year/month/day/file3.json

oss://adb/Table2/year/month/day/file4.json

oss://adb/

The table that is mapped to AnalyticDB for MySQL is a partitioned table named Table2. The partition fields are partition_0, partition_1, and partition_2.

Note

Because no partition keys exist, partition_num is used.

oss://adb/Table2/

The table that is mapped to AnalyticDB for MySQL is a partitioned table named year. The partition fields are partition_0 and partition_1.

oss://adb/Table2/year/month/

The table that is mapped to AnalyticDB for MySQL is a non-partitioned table named day.

oss://adb/Table2/year/month/day/

A table cannot be mapped.

Reason: No other directory levels exist after the selected OSS Directory Location.

oss://adb/Table3/year=2022/month=03/day=01/file5.csv

oss://adb/Table3/year=2022/month=03/day=01/file6.csv

oss://adb/

The table that is mapped to AnalyticDB for MySQL is a partitioned table named Table3. The partition fields are year, month, and day.

  • oss://adb/Table3/

  • oss://adb/Table3/year=2022/

A table cannot be mapped.

Reason: year=2022 or month=03 does not conform to the naming conventions for tables in AnalyticDB for MySQL.

oss://adb/Table4/2020/03/30/file7.csv

oss://adb/Table3/2020/03/30/file8.csv

oss://adb/

The table that is mapped to AnalyticDB for MySQL is a partitioned table named Table4. The partition fields are partition_0, partition_1, and partition_2.

Note

Because no partition keys exist, partition_num is used.

  • oss://adb/Table3/

  • oss://adb/Table3/2020/

A table cannot be mapped.

Reason: 2020 or 03 does not conform to the naming conventions for tables in AnalyticDB for MySQL.