This topic describes how to create a crawler to collect metadata from an E-MapReduce (EMR) data source. You can view the collected metadata on the Data Map page.
An EMR cluster is associated with your workspace as a compute engine instance. For more information, see Associate an EMR compute engine instance with a workspace.
- Only one crawler can be created for each cluster. You can select one or more databases from which metadata is to be collected for each crawler.
- Metadata can be collected by an Alibaba Cloud account, by a RAM user to which the AliyunDataWorksFullAccess policy is attached, or by a RAM user that is assigned the metadata collection administrator role.
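The two limits above can be read as simple checks. The following is a minimal sketch for illustration only, not DataWorks code: the `Identity` class, its fields, and the role label `metadata_collection_admin` are hypothetical names. Only the policy name `AliyunDataWorksFullAccess` comes from the text above.

```python
from dataclasses import dataclass, field

# Policy name taken from the limits above.
FULL_ACCESS_POLICY = "AliyunDataWorksFullAccess"
# Hypothetical label for the metadata collection administrator role.
ADMIN_ROLE = "metadata_collection_admin"


@dataclass
class Identity:
    """Hypothetical model of the account that runs the collection."""
    is_alibaba_cloud_account: bool = False
    policies: set = field(default_factory=set)
    roles: set = field(default_factory=set)


def can_collect_metadata(identity: Identity) -> bool:
    """Return True if any one of the three documented conditions holds."""
    return (
        identity.is_alibaba_cloud_account
        or FULL_ACCESS_POLICY in identity.policies
        or ADMIN_ROLE in identity.roles
    )


def can_create_crawler(cluster_id: str, clusters_with_crawler: set) -> bool:
    """Only one crawler can be created for each cluster."""
    return cluster_id not in clusters_with_crawler
```

For example, a RAM user with no attached policy and no role fails the first check, and a cluster that already has a crawler fails the second.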
Create a crawler
- Log on to the DataWorks console and go to the Data Map page. For more information, see Go to the homepage.
- In the top navigation bar, click Data Discovery.
- Open the Create Crawler dialog box.
- In the left-side navigation pane, choose .
- On the E-MapReduce Metadata Crawler page, click Create Crawler.
- Configure the crawler.
- In the Create Crawler dialog box, select the cluster whose metadata you want to collect from the Select a cluster drop-down list.
- Optional: From the Database drop-down list, select one or more databases from which you want to collect metadata. If you do not select a database, the crawler collects metadata from all databases in the cluster.
- Click Authorize. On the Metadata tab of the page that appears, click Enable. Note:
- By default, after an EMR cluster is associated with a workspace as a compute engine instance, the workspace is authorized to collect metadata from the EMR cluster.
- You must manually grant permissions for EMR clusters that are associated with DataWorks and from which metadata has not been collected.
- In the Confirm Operation message, click OK.
- Return to the Create Crawler dialog box on the Data Discovery page and click Refresh.
- After the value of the Authorization Status parameter changes to Authorized, click OK to create the crawler.
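The configuration rules in the steps above can be summarized in a short sketch. It assumes a hypothetical `CrawlerConfig` structure and is not a DataWorks API. The status value `Authorized` comes from the steps above. The default value `Unauthorized` is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CrawlerConfig:
    """Hypothetical model of the fields in the Create Crawler dialog box."""
    cluster: str
    # Optional: None or an empty list means no database was selected.
    databases: Optional[List[str]] = None
    # "Authorized" is the value named in the steps; the default is assumed.
    authorization_status: str = "Unauthorized"


def databases_to_crawl(config: CrawlerConfig, cluster_databases: List[str]) -> List[str]:
    """If no database is selected, metadata is collected from all databases."""
    if not config.databases:
        return list(cluster_databases)
    return list(config.databases)


def ready_to_create(config: CrawlerConfig) -> bool:
    """The crawler is created only after the status changes to Authorized."""
    return config.authorization_status == "Authorized"
```

In this sketch, a configuration with no selected database resolves to every database in the cluster, and `ready_to_create` stays false until authorization is complete.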
|1|In this section, you can enter the name of a crawler to search for the crawler. Note: Fuzzy match is supported. If you enter a keyword in the search box, crawlers whose names or data source names contain the keyword are displayed.|
|2|In this section, you can view detailed information about a created crawler, such as the status of the crawler, the databases from which the crawler collects metadata, and the last time the crawler was run. You can also perform the following operations on the crawler:|
|3|In this section, you can perform the following operations:|
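The fuzzy match described for section 1 amounts to a substring filter over two fields. The following is a minimal sketch under assumptions: the dictionary keys `name` and `data_source` are hypothetical, and the source does not state whether matching is case-sensitive, so this sketch uses a plain case-sensitive substring test.

```python
from typing import Dict, List


def search_crawlers(crawlers: List[Dict[str, str]], keyword: str) -> List[Dict[str, str]]:
    """Return crawlers whose name or data source name contains the keyword.

    Each crawler is a dict with the hypothetical keys "name" and "data_source".
    """
    return [
        crawler
        for crawler in crawlers
        if keyword in crawler["name"] or keyword in crawler["data_source"]
    ]
```

A keyword that appears in either field is enough for a crawler to be listed. An empty keyword matches every crawler in this sketch.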