DataWorks API operations (2024-05-18) support queries of various metadata entities. This topic describes the concepts related to the metadata entities.
Metadata entity objects
Data Map collects and manages metadata entity objects of different types and levels (subtypes) by using metadata crawlers. For more information about the supported crawler types, see Supported crawler types.
Data Map supports the following metadata entity levels based on the metadata level structure:
Catalog
Database
Schema
Table
Column
Entity levels vary based on the crawler types.
Supported crawler types
Identifier | Display name | Supported metadata entity levels | | | | | Remarks |
| | Catalog | Database | Schema | Table | Column | |
| MaxCompute | ❌ | ❌ | ✅ | ✅ | ✅ | |
| Data Lake Formation | ✅ | ✅ | ❌ | ✅ | ✅ | A default crawler is provided to identify all metadata entities within your Alibaba Cloud account. |
| HMS | ❌ | ✅ | ❌ | ✅ | ✅ | |
| Hologres | ❌ | ✅ | ✅ | ✅ | ✅ | - |
| MySQL | ❌ | ✅ | ❌ | ✅ | ✅ | - |
| Oracle | ❌ | ✅ | ✅ | ✅ | ✅ | - |
| PostgreSQL | ❌ | ✅ | ✅ | ✅ | ✅ | - |
| SQL Server | ❌ | ✅ | ✅ | ✅ | ✅ | - |
| AnalyticDB MySQL | ❌ | ✅ | ❌ | ✅ | ✅ | This type of crawler can be used to collect metadata from analyticdb_for_mysql and analyticdb_for_spark data sources. |
| AnalyticDB MySQL 2.0 | ❌ | ✅ | ❌ | ✅ | ✅ | - |
| AnalyticDB PostgreSQL | ❌ | ✅ | ✅ | ✅ | ✅ | - |
| OTS | ❌ | ✅ | ❌ | ✅ | ✅ | - |
| ClickHouse | ❌ | ✅ | ❌ | ✅ | ✅ | - |
| StarRocks | ✅ | ✅ | ❌ | ✅ | ✅ | Catalogs are supported. This type of crawler can be used to query metadata entities only in internal catalogs. |
| Lindorm | ❌ | ✅ | ❌ | ✅ | ✅ | - |
Entity type (EntityType)
EntityType is the identifier of a metadata entity type. The value of EntityType is in the ${CrawlerType}-${SubType} format.
CrawlerType is the identifier of a crawler type. For example, the value of CrawlerType can be mysql, maxcompute, dlf, or holo.
SubType is the identifier of a metadata entity subtype. For example, the value of SubType can be catalog, database, schema, table, or column.
For example, the EntityType of a MaxCompute table is maxcompute-table.
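The ${CrawlerType}-${SubType} format can be sketched as a small helper. This is an illustrative sketch only: the function name entity_type and the SUB_TYPES tuple are hypothetical and not part of the DataWorks API. The subtype values are the ones listed in this topic.

```python
# Subtype identifiers listed in this topic.
SUB_TYPES = ("catalog", "database", "schema", "table", "column")


def entity_type(crawler_type: str, sub_type: str) -> str:
    """Join a crawler type and a subtype as ${CrawlerType}-${SubType}.

    Hypothetical helper for illustration; not a DataWorks SDK function.
    """
    if sub_type not in SUB_TYPES:
        raise ValueError(f"unknown subtype: {sub_type!r}")
    return f"{crawler_type}-{sub_type}"


print(entity_type("maxcompute", "table"))  # maxcompute-table
```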
Metadata entity ID (MetaEntityId)
MetaEntityId is the identifier of a metadata entity object. The identifier is readable, unique, and extensible.
Crawler metadata instances and entity objects of catalogs, databases, schemas, tables, and columns are supported.
A metadata entity ID uniquely identifies the entity. Identifiers at each level are separated by colons (:), and empty strings are used as placeholders for unsupported levels.
Crawler metadata instances
Crawler metadata entity ID: the unique identifier of a crawler metadata instance.
For the MaxCompute and DLF crawler types, a default crawler is provided for all metadata entities within the tenant or Alibaba Cloud account. The crawler metadata entity ID is in the ${CrawlerType} format.
For other types of crawlers, which you must create manually, the crawler metadata entity ID is in the ${CrawlerType}:${MetaSourceId} format.
CrawlerType: the identifier of a crawler type. For example, the value of CrawlerType can be holo or mysql.
MetaSourceId: the identifier of a metadata source.
Instance mode: corresponds to an instance ID or a cluster ID.
URL mode: corresponds to the URL-encoded URL (JDBC URL or endpoint).
Examples:
For the MaxCompute type, the crawler metadata entity ID is maxcompute.
For the Hologres type in instance mode, if the instance ID is i-z6j3kxxx7, the crawler metadata entity ID is holo:i-z6j3kxxx7.
For the MySQL type in URL mode, if the URL is jdbc:mysql://47.0.X.X:3306/test_db, the crawler metadata entity ID is mysql:jdbc%3Amysql%3A%2F%2F47.0.X.X%3A3306%2Ftest_db.
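The three examples above can be reproduced with a short sketch. The function name crawler_entity_id and its parameters are hypothetical. The sketch assumes that URL mode percent-encodes every reserved character, including colons and slashes, which matches the MySQL example in this topic.

```python
from typing import Optional
from urllib.parse import quote


def crawler_entity_id(
    crawler_type: str,
    meta_source_id: Optional[str] = None,
    url_mode: bool = False,
) -> str:
    """Build a crawler metadata entity ID (hypothetical helper).

    - Default crawlers (MaxCompute, DLF): ${CrawlerType}
    - Manually created crawlers: ${CrawlerType}:${MetaSourceId}
    """
    if meta_source_id is None:
        return crawler_type
    if url_mode:
        # safe="" also encodes "/" so the result contains no ":" or "/".
        meta_source_id = quote(meta_source_id, safe="")
    return f"{crawler_type}:{meta_source_id}"


print(crawler_entity_id("maxcompute"))
print(crawler_entity_id("holo", "i-z6j3kxxx7"))
print(crawler_entity_id("mysql", "jdbc:mysql://47.0.X.X:3306/test_db", url_mode=True))
```

Because the encoded URL no longer contains colons, the colon remains an unambiguous separator between the crawler type and the metadata source.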
Data table related metadata entities
The metadata entity ID format is ${EntityType}:${MetaSourceId}:${Catalog}:${Database}:${Schema}:${Table}:${Column}. It includes the following elements:
Level | Property | Description |
- | | The identifier of the entity type. |
- | | For |
Catalog | | The catalog identifier. For StarRocks type, this is the catalog name. For DLF type, this is the catalog ID. For other types, an empty string is used as a placeholder. |
Database | | The database name. |
Schema | | The schema name. For types that do not support schema, an empty string is used as a placeholder. For |
Table | | The data table name. |
Column | | The field name. |
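As a rough illustration of the seven-field format, the following sketch joins the fields with colons and leaves unsupported levels as empty strings. The function name meta_entity_id and the value src in the usage line are hypothetical. The sketch follows the template literally for a column-level entity. This topic does not state whether IDs of higher-level entities omit the trailing fields, so the sketch makes no attempt to trim them.

```python
def meta_entity_id(
    entity_type: str,
    meta_source_id: str = "",
    catalog: str = "",
    database: str = "",
    schema: str = "",
    table: str = "",
    column: str = "",
) -> str:
    """Join the seven fields of the template with colons (hypothetical helper).

    Template:
    ${EntityType}:${MetaSourceId}:${Catalog}:${Database}:${Schema}:${Table}:${Column}
    Unsupported levels stay as empty strings and act as placeholders.
    """
    fields = [entity_type, meta_source_id, catalog, database, schema, table, column]
    for value in fields:
        if ":" in value:
            # A raw colon would be ambiguous; URL-mode sources must be URL-encoded first.
            raise ValueError(f"field contains a colon: {value!r}")
    return ":".join(fields)


# "src" stands in for a real MetaSourceId; MySQL has no catalog or schema level.
print(meta_entity_id("mysql-column", "src", database="test_db",
                     table="test_tbl", column="test_col"))
# mysql-column:src::test_db::test_tbl:test_col
```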
Metadata entity examples
The following examples show the metadata entity IDs at each level for the MaxCompute, DLF, HMS, Hologres, and MySQL crawler types.
In these IDs, the identifiers at each level are separated by colons (:), and empty strings are used as placeholders for unsupported levels.
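The colon-separated layout can also be read in the other direction. The following sketch splits an ID into named levels, and unsupported levels come back as empty strings. The function name parse_meta_entity_id is hypothetical. The sample ID is assembled from the format template and the Hologres example values in this topic for illustration only. It is not an ID returned by the API.

```python
from typing import Dict

LEVELS = ("EntityType", "MetaSourceId", "Catalog", "Database", "Schema", "Table", "Column")


def parse_meta_entity_id(meta_entity_id: str) -> Dict[str, str]:
    """Split a metadata entity ID on colons into named levels (hypothetical helper).

    Assumes no field contains a raw colon (URL-mode sources are URL-encoded).
    Missing trailing fields are padded with empty strings.
    """
    parts = meta_entity_id.split(":")
    if len(parts) > len(LEVELS):
        raise ValueError(f"too many fields in ID: {meta_entity_id!r}")
    parts += [""] * (len(LEVELS) - len(parts))
    return dict(zip(LEVELS, parts))


# Illustrative ID only, assembled from the template; Hologres has no catalog level.
sample = "holo-column:hgpostcn-cn-a1b2c3xxx::test_db:test_schema:test_tbl:test_col"
parsed = parse_meta_entity_id(sample)
print(parsed["Catalog"] == "", parsed["Schema"])  # True test_schema
```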
MaxCompute
MaxCompute projects support the schema level only if the schema model is enabled. For these projects, the schema name must be provided in the schema position of data table and field IDs.
For MaxCompute projects without the schema model enabled, an empty string is used as a placeholder in the schema position of data table and field IDs.
For a project project_name (with the schema model enabled), schema schema_name, table table_name, and field column_name, the entity IDs at each level are as follows:
Level | ID |
Crawler metadata instance | |
Project | |
Schema | |
Data table | |
Column | |
For a project project_name (without the schema model enabled), table table_name, and field column_name, the entity IDs at each level are as follows:
Level | ID |
Crawler metadata instance | |
Project | |
Data table | |
Column | |
DLF
For a catalog catalog_id, database database_name, table table_name, and field column_name, the entity IDs at each level are as follows:
Level | ID |
Crawler metadata instance | |
Catalog | |
Database | |
Data table | |
Column | |
HMS
For an EMR cluster instance c-a1b2c3xxx, database test_db, table test_tbl, and field test_col, the entity IDs at each level are as follows:
Level | ID |
Crawler metadata instance | |
Database | |
Data table | |
Column | |
Hologres
In this example, the Hologres instance hgpostcn-cn-a1b2c3xxx, database test_db, schema test_schema, data table test_tbl, and column test_col are used. The following table describes the entity IDs at each level.
Level | ID |
Crawler metadata instance | |
Database | |
Schema | |
Data table | |
Column | |
MySQL
For a MySQL data source connection string jdbc:mysql://47.0.X.X:3306/test_db, database test_db, table test_tbl, and field test_col, the entity IDs at each level are as follows (MetaSourceId is generated by URL-encoding the JDBC connection string):
Level | ID |
Crawler metadata instance | |
Database | |
Data table | |
Column | |