This topic describes how to create a crawler to collect metadata from an E-MapReduce (EMR) data source to DataWorks. You can view the collected metadata on the Data Map page.

Prerequisites

An EMR cluster is associated with your workspace as a compute engine instance. For more information, see Associate an EMR compute engine instance with a workspace.

Limits

  • Cross-region metadata collection is not supported. Create the crawler in the region where the source metadata resides.
  • You must collect metadata over the Internet.
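
If you track crawler settings in a script, the region limit above can be checked before you create a crawler. The following is a minimal sketch, not a DataWorks API: the function name and the region IDs are illustrative assumptions.

```python
def check_same_region(crawler_region: str, source_region: str) -> None:
    """Raise ValueError if the crawler and the source metadata are in different regions.

    Hypothetical helper for illustration only; it does not call DataWorks.
    """
    # Compare region IDs case-insensitively after trimming whitespace.
    if crawler_region.strip().lower() != source_region.strip().lower():
        raise ValueError(
            "Cross-region collection is not supported: crawler in "
            f"{crawler_region!r}, source metadata in {source_region!r}"
        )


# Same region: the check passes silently.
check_same_region("cn-hangzhou", "cn-hangzhou")
```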

Procedure

  1. Go to the Data Discovery page.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces.
    3. In the top navigation bar, select the region where your workspace resides. Find the workspace in the list and click Data Analytics in the Actions column.
    4. On the DataStudio page, click the More icon in the upper-left corner and choose All Products > Data governance > DataMap.
    5. In the top navigation bar, click Data Discovery.
  2. On the E-MapReduce Metadata Crawler page, click Create Crawler.
  3. In the Create Crawler dialog box, select the associated EMR cluster from the Select a cluster drop-down list and click Authorize.
  4. On the page that appears, click the Metadata tab and click Enable.
  5. In the Confirm Operation message, click OK.
  6. Return to the Create Crawler dialog box on the E-MapReduce Metadata Crawler page and click Refresh.
  7. After the authorization status changes to Authorized, click Commit.
  8. On the E-MapReduce Metadata Crawler page, find the created crawler and click Obtain All in the Actions column.
    Click Refresh in the upper-right corner of the page and verify that the value in the Running Status column of the crawler changes to Collected.
    Note: After the full metadata from the EMR data source is collected, the system automatically synchronizes new metadata from the data source.

    If you want to delete the created crawler, click Delete in the Actions column. In the Delete Instance message, click OK.

  9. View the metadata collected from the EMR data source.
    1. In the top navigation bar, click All Data.
    2. Select E-MapReduce from the drop-down list in the upper part of the page.
    3. Click the name of a table that stores the collected metadata and view the table details.
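
If you script the verification in step 8 instead of clicking Refresh manually, the wait for the Collected status can be sketched as a polling loop. This is a minimal sketch under stated assumptions: `get_status` is a hypothetical callable that you supply to return the current value of the Running Status column. It is not a DataWorks API, and the timeout and interval values are arbitrary.

```python
import time
from typing import Callable


def wait_until_collected(
    get_status: Callable[[], str],
    timeout_s: float = 600.0,
    interval_s: float = 10.0,
) -> bool:
    """Poll get_status() until it returns "Collected" or the timeout expires.

    get_status is a hypothetical, caller-supplied function; this sketch
    does not call any DataWorks API itself.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if get_status() == "Collected":
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Usage with a stand-in status source. "pending" is a placeholder value,
# not a status name taken from the console.
statuses = iter(["pending", "pending", "Collected"])
done = wait_until_collected(lambda: next(statuses), timeout_s=5, interval_s=0)
print(done)
```

The function returns False on timeout rather than raising an exception, so the caller can decide whether to retry or to check the crawler on the E-MapReduce Metadata Crawler page.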