DataWorks API (2024-05-18) uses a structured identifier system for metadata entities across data sources. Each entity -- a database, table, or column -- has a unique ID based on its position in the hierarchy.
Entity hierarchy
Data Map discovers and manages metadata through crawlers. Each crawler type organizes metadata into a hierarchy of up to five levels:
Catalog > Database > Schema > Table > Column
Not all crawler types support every level. For example, MaxCompute supports Schema, Table, and Column but not Catalog or Database. MySQL supports Database, Table, and Column but not Catalog or Schema. The supported levels determine how you construct entity IDs.
Supported crawler types
Identifier | Display name | Catalog | Database | Schema | Table | Column | Remarks |
| MaxCompute | -- | -- | Yes | Yes | Yes | A default crawler identifies all metadata entities in your Alibaba Cloud account. In MaxCompute, projects correspond to the database level, but you cannot query projects through API operations. Schema support depends on whether the three-layer model is enabled for your MaxCompute project. |
| Data Lake Formation | Yes | Yes | -- | Yes | Yes | A default crawler identifies all metadata entities in your Alibaba Cloud account. |
| HMS | -- | Yes | -- | Yes | Yes | Uses Hive Metastore Service (HMS) to manage metadata. Collects metadata from E-MapReduce (EMR) and CDH_HIVE clusters. |
| Hologres | -- | Yes | Yes | Yes | Yes | - |
| MySQL | -- | Yes | -- | Yes | Yes | - |
| Oracle | -- | Yes | Yes | Yes | Yes | - |
| PostgreSQL | -- | Yes | Yes | Yes | Yes | - |
| SQL Server | -- | Yes | Yes | Yes | Yes | - |
| AnalyticDB MySQL | -- | Yes | -- | Yes | Yes | Also collects metadata from |
| AnalyticDB MySQL 2.0 | -- | Yes | -- | Yes | Yes | - |
| AnalyticDB PostgreSQL | -- | Yes | Yes | Yes | Yes | - |
| OTS | -- | Yes | -- | Yes | Yes | - |
| ClickHouse | -- | Yes | -- | Yes | Yes | - |
| StarRocks | Yes | Yes | -- | Yes | Yes | Supports catalogs. Queries metadata entities in internal catalogs only. |
| Lindorm | -- | Yes | -- | Yes | Yes | - |
Entity type
The EntityType parameter identifies the type and level of a metadata entity. Format:
${CrawlerType}-${SubType}CrawlerType: The crawler identifier, such asmysql,maxcompute,dlf, orholo.SubType: The entity level, such ascatalog,database,schema,table, orcolumn.
For example, a MaxCompute table has the entity type maxcompute-table.
Metadata entity ID
The MetaEntityId parameter uniquely identifies a metadata entity. It is designed for readability, uniqueness, and extensibility.
Separate each level with a colon (:). Use an empty string as a placeholder for unsupported levels. This keeps the number of colon-separated segments consistent across all entity types, making IDs predictable to parse and construct.
Crawler metadata instance ID
The crawler metadata instance is the root of the entity hierarchy. Its ID format depends on whether the crawler is default or manually created.
Default crawlers (MaxCompute and DLF): The ID is the crawler type alone.
${CrawlerType}Manually created crawlers: The ID includes the metadata source identifier.
${CrawlerType}:${MetaSourceId}The MetaSourceId value depends on the connection mode:
Mode | MetaSourceId value | Example |
Instance mode | An instance ID or cluster ID |
|
URL mode | The URL-encoded JDBC URL or endpoint |
|
Examples:
Crawler type | Scenario | Crawler metadata instance ID |
MaxCompute | Default crawler |
|
Hologres | Instance mode, instance ID |
|
MySQL | URL mode, JDBC URL |
|
Data table entity ID format
For catalog, database, schema, table, and column entities, the ID follows this format:
${EntityType}:${MetaSourceId}:${Catalog}:${Database}:${Schema}:${Table}:${Column}Each segment maps to a specific level:
Level | Segment | Description |
- |
| The entity type identifier (for example, |
- |
| The instance ID, cluster ID, or URL-encoded JDBC URL. For MaxCompute and DLF, use an empty string. |
Catalog |
| The catalog identifier. For StarRocks, this is the catalog name. For DLF, this is the catalog ID. For other types, use an empty string. |
Database |
| The database name. |
Schema |
| The schema name. For types that do not support schema, use an empty string. For MaxCompute, provide the schema name when the schema model is enabled; otherwise, use an empty string. |
Table |
| The table name. |
Column |
| The column name. |
Omit trailing segments for higher-level entities. For example, a table-level ID does not include the Column segment.
Entity ID examples
The following examples show how to construct entity IDs for different crawler types and levels.
Colons (:) separate each level. Empty strings between consecutive colons (for example, ::) represent unsupported levels.
MaxCompute
MaxCompute supports two modes depending on whether the three-layer model (schema model) is enabled.
Schema model enabled
For a project project_name, schema schema_name, table table_name, and column column_name:
Level | Entity ID |
Crawler metadata instance |
|
Project |
|
Schema |
|
Table |
|
Column |
|
The three leading colons (:::) before project_name represent empty placeholders for MetaSourceId (MaxCompute uses a default crawler) and Catalog (not supported).
Schema model disabled
For a project project_name, table table_name, and column column_name:
Level | Entity ID |
Crawler metadata instance |
|
Project |
|
Table |
|
Column |
|
When the schema model is disabled, the schema position is an empty placeholder (:: between project_name and table_name).
DLF
For a catalog catalog_id, database database_name, table table_name, and column column_name:
Level | Entity ID |
Crawler metadata instance |
|
Catalog |
|
Database |
|
Table |
|
Column |
|
DLF uses a default crawler, so MetaSourceId is an empty string (the :: before catalog_id). DLF does not support the schema level, so an empty placeholder appears between database_name and table_name.
HMS
For an EMR cluster c-a1b2c3xxx, database test_db, table test_tbl, and column test_col:
Level | Entity ID |
Crawler metadata instance |
|
Database |
|
Table |
|
Column |
|
HMS does not support catalog or schema, so empty placeholders appear in those positions.
Hologres
For a Hologres instance hgpostcn-cn-a1b2c3xxx, database test_db, schema test_schema, table test_tbl, and column test_col:
Level | Entity ID |
Crawler metadata instance |
|
Database |
|
Schema |
|
Table |
|
Column |
|
Hologres uses instance mode, so MetaSourceId is the instance ID. Hologres does not support catalogs, so an empty placeholder appears between MetaSourceId and Database.
MySQL
For a MySQL data source with the JDBC URL jdbc:mysql://47.0.X.X:3306/test_db, database test_db, table test_tbl, and column test_col:
Level | Entity ID |
Crawler metadata instance |
|
Database |
|
Table |
|
Column |
|
The MetaSourceId is the URL-encoded JDBC URL. MySQL does not support catalog or schema, so empty placeholders appear in those positions.
Quick reference
The following table summarizes the entity ID pattern for each crawler type. Empty segments are shown as (empty). The MetaSourceId format depends on how the data source is registered (instance mode or URL mode). The table shows the most common pattern for each type.
Crawler type | MetaSourceId | Catalog | Schema | Example table ID |
| (empty) | (empty) | schema_name or (empty) |
|
| (empty) | catalog_id | (empty) |
|
| cluster_id | (empty) | (empty) |
|
| instance_id | (empty) | schema_name |
|
| URL-encoded JDBC URL | (empty) | (empty) |
|
| instance_id or URL | (empty) | schema_name |
|
| instance_id or URL | (empty) | schema_name |
|
| instance_id or URL | (empty) | schema_name |
|
| instance_id | (empty) | (empty) |
|
| instance_id | (empty) | (empty) |
|
| instance_id | (empty) | schema_name |
|
| instance_id | (empty) | (empty) |
|
| instance_id or URL | (empty) | (empty) |
|
| instance_id | catalog_name | (empty) |
|
| instance_id | (empty) | (empty) |
|