DataWorks DataMap provides metadata crawlers that you can use to collect the metadata of all or specific E-MapReduce (EMR) databases. If you need the metadata of only one table, you can use the manual table synchronization feature instead, which collects single-table metadata more efficiently. After the metadata is collected, you can view it in DataMap. This topic describes how to collect the metadata of EMR tables to DataWorks.
Prerequisites
An EMR cluster must be associated with your DataWorks workspace as a compute engine instance. For information about how to associate an EMR cluster with a DataWorks workspace as an EMR compute engine instance, see Create and manage workspaces.
Background information
After you create a metadata crawler to collect the full metadata of EMR tables, the system enables automatic incremental metadata collection. The crawler then automatically synchronizes incremental metadata of the EMR tables to DataWorks.
Limits
- Only one metadata crawler can be created for each EMR cluster. For each crawler, you can select one or more databases whose metadata you want to collect.
- The metadata collection capability varies based on the type of the EMR cluster and the metadata storage type. The following table provides the details.
EMR cluster type | Metadata storage type | Collect metadata of a single table (use the manual table synchronization feature on the All Data page) | Collect metadata of a database (create a metadata crawler on the Data Discovery page) |
---|---|---|---|
DataLake cluster in the new data lake scenario | DLF Unified Metadata | Supported | No configuration is required. The system automatically updates metadata. |
DataLake cluster in the new data lake scenario | Self-managed RDS or Built-in MySQL | Supported | Configuration is required. You must manually update metadata based on your business requirements. |
Hadoop cluster in the old data lake scenario | DLF Unified Metadata | Supported | No configuration is required. The system automatically updates metadata. |
Hadoop cluster in the old data lake scenario | Self-managed RDS or Built-in MySQL | Not supported | Configuration is required. You must manually update metadata based on your business requirements. |

Note: For information about the metadata storage types for EMR clusters, see Manage metadata.
- If you want to collect the metadata of only a single table, use the manual table synchronization feature on the All Data page. For more information, see the Use the manual table synchronization feature to collect metadata of a single table section in this topic.
- If you want to collect the metadata of all or specific databases, create a metadata crawler. For more information, see the Create a metadata crawler to collect metadata of a database section in this topic.
- Metadata can be collected only by an Alibaba Cloud account, a RAM user that has the AliyunDataWorksFullAccess policy attached, or a RAM user that is assigned the metadata collection administrator role.
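The support table above can be read as a lookup from (cluster type, metadata storage type) to the available collection methods. The following Python sketch is illustrative only: the constant names, the `CollectionSupport` type, and the `lookup` helper are hypothetical and are not part of any DataWorks or EMR API. They simply encode the four rows of the table so that you can check a combination before choosing between manual table synchronization and a metadata crawler.

```python
from dataclasses import dataclass

# Hypothetical labels for the rows of the support table; these are NOT DataWorks API values.
DATALAKE_NEW = "DataLake cluster (new data lake scenario)"
HADOOP_OLD = "Hadoop cluster (old data lake scenario)"
DLF = "DLF Unified Metadata"
RDS_OR_MYSQL = "Self-managed RDS or Built-in MySQL"


@dataclass(frozen=True)
class CollectionSupport:
    # Whether manual table synchronization (All Data page) is supported.
    single_table_sync: bool
    # True: a metadata crawler (Data Discovery page) needs configuration and manual updates.
    # False: no configuration is required and the system updates metadata automatically.
    crawler_config_required: bool


# One entry per table row: (cluster type, metadata storage type) -> collection support.
SUPPORT_MATRIX = {
    (DATALAKE_NEW, DLF): CollectionSupport(single_table_sync=True, crawler_config_required=False),
    (DATALAKE_NEW, RDS_OR_MYSQL): CollectionSupport(single_table_sync=True, crawler_config_required=True),
    (HADOOP_OLD, DLF): CollectionSupport(single_table_sync=True, crawler_config_required=False),
    (HADOOP_OLD, RDS_OR_MYSQL): CollectionSupport(single_table_sync=False, crawler_config_required=True),
}


def lookup(cluster_type: str, storage_type: str) -> CollectionSupport:
    """Return the support entry for a cluster/storage combination listed in the table."""
    try:
        return SUPPORT_MATRIX[(cluster_type, storage_type)]
    except KeyError:
        raise ValueError(
            f"Combination not listed in the table: {cluster_type!r}, {storage_type!r}"
        ) from None


if __name__ == "__main__":
    support = lookup(HADOOP_OLD, RDS_OR_MYSQL)
    print(support.single_table_sync)        # False
    print(support.crawler_config_required)  # True
```

A frozen dataclass keeps each row immutable, and raising `ValueError` for unknown combinations avoids silently assuming support for a cluster or storage type that the table does not cover.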
Use the manual table synchronization feature to collect metadata of a single table
- Log on to the DataWorks console and go to the DataMap page. For more information, see Go to the homepage of DataMap.
- In the top navigation bar of the DataMap page, click All Data.
- In the upper-right corner of the page that appears, click Manually Synchronize Table. In the Manually Synchronize Table dialog box, select E-MapReduce for Data Source Type and configure the following parameters for the desired EMR table: Cluster ID, Database, and Table Name.
- After the configuration is complete, click Start Synchronize to synchronize metadata of the desired EMR table.
Create a metadata crawler to collect metadata of a database
- Log on to the DataWorks console and go to the DataMap page. For more information, see Go to the homepage of DataMap.
- In the top navigation bar, click Data Discovery.
- Open the Create Crawler dialog box.
- Configure the metadata crawler.
Manage crawlers

Area | Description |
---|---|
1 | In this area, you can enter the name of a crawler to search for it. Note: Fuzzy match is supported. If you enter a keyword in the search box, all crawlers whose names contain the keyword are displayed. |
2 | In this area, you can view information about a created crawler, such as its status, the databases from which it collects metadata, and the time when it was last run. You can also perform the following operations on the crawler: |
3 | In this area, you can perform the following operations: |