This topic describes how to create a crawler to collect metadata from an Object Storage Service (OSS) data store to DataWorks. You can view the collected metadata on the Data Map page.

Background information

DataWorks allows you to collect metadata from OSS data stores only in the China (Shanghai) region. This feature is in invitational preview.

Procedure

  1. Go to the Data Discovery page.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces. The Workspaces page appears.
    3. Find the target workspace and click Data Analytics in the Actions column.
    4. On the DataStudio page, click the icon in the upper-left corner and choose All Products > Data Map. The Data Map page appears.
    5. Click Data Discovery in the top navigation bar.
  2. In the left-side navigation pane, click OSS.
  3. On the OSS Metadata Crawler page, click Create Crawler.
  4. In the Create Crawler dialog box, perform the following steps:
    1. In the Basic Information step, set the basic parameters.
      Parameter | Description
      Crawler Name | Required. The name of the crawler. The name must be unique.
      Crawler Description | The description of the crawler.
      Connection Type | The type of the data store from which metadata is collected. The default value is OSS and cannot be changed.
    2. Click Next.
    3. In the Select Collection Object step, set the parameters of the object from which metadata will be collected.
      Parameter | Description
      Workspace | The workspace of the OSS data store from which metadata is collected.
      Connection | The connection to the OSS data store from which metadata is collected. If the required connection does not exist, go to the Data Source page in Workspace Management and create the connection. For more information, see Configure an OSS connection.
      Object Path | The path of the OSS object from which metadata is collected.
      Path Traversal | Specifies whether to traverse sub-paths in the specified path.
      Prefix | The prefix added to the names of tables that the crawler automatically generates. By default, a generated table is named after the corresponding OSS object.
    4. Click Next.
    5. In the Configure Execution Plan step, set the scheduling parameters.
      Parameter | Description
      Execution Plan | The execution plan. Valid values: On-demand Execution, Monthly, Weekly, Daily, Hourly, and Custom.
      Update Options | The policy for updating the table that stores the collected metadata.
      Delete Options | The policy for deleting the table that stores the collected metadata.
    6. Click Next.
    7. In the Confirm Information step, verify that the configuration of the crawler is correct and click OK.
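The values entered in the Create Crawler dialog box can be thought of as one configuration record. The following Python sketch is illustrative only and does not call any DataWorks API: the `CrawlerConfig` class, its field names, the `validate` helper, and the defaults for Path Traversal and Execution Plan are assumptions made for this example. Only the parameter names, the fixed Connection Type value, and the list of execution plans come from the steps above.

```python
from dataclasses import dataclass

# Valid values of the Execution Plan parameter, as listed in the procedure.
EXECUTION_PLANS = {
    "On-demand Execution", "Monthly", "Weekly", "Daily", "Hourly", "Custom",
}


@dataclass
class CrawlerConfig:
    """Hypothetical record of the values entered in the Create Crawler dialog box."""
    # Basic Information step
    crawler_name: str                  # required, must be unique
    # Select Collection Object step
    workspace: str
    connection: str
    object_path: str
    crawler_description: str = ""
    connection_type: str = "OSS"       # fixed value, cannot be changed
    path_traversal: bool = False       # default chosen for this sketch only
    prefix: str = ""
    # Configure Execution Plan step
    execution_plan: str = "On-demand Execution"  # default chosen for this sketch only
    update_options: str = ""           # valid values are not listed in this topic
    delete_options: str = ""           # valid values are not listed in this topic


def validate(config, existing_names):
    """Return a list of problems; an empty list means the sketch's checks pass."""
    problems = []
    if not config.crawler_name.strip():
        problems.append("Crawler Name is required.")
    elif config.crawler_name in existing_names:
        problems.append("Crawler Name must be unique.")
    if config.connection_type != "OSS":
        problems.append("Connection Type is fixed to OSS.")
    if config.execution_plan not in EXECUTION_PLANS:
        problems.append("Execution Plan is not a valid value.")
    return problems
```

For example, `validate(CrawlerConfig("sales_logs", "demo_ws", "oss_demo", "logs/"), {"sales_logs"})` reports that the name is already taken, which mirrors the uniqueness requirement on Crawler Name. The workspace, connection, and path values here are placeholders.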
  5. On the OSS Metadata Crawler page, find the created crawler and click Run in the Actions column.
    After the crawler finishes running, click the number in the Updated Tables in Last Run or Added Tables in Last Run column to view details about the updated or added tables.
    You can also perform the following operations on the OSS Metadata Crawler page:
    • Click Details in the Actions column of a crawler. In the Crawler Details dialog box, view the detailed information about the crawler.
    • Click Edit in the Actions column of a crawler. In the Edit Crawler dialog box, modify the configuration of the crawler.
    • Click Delete in the Actions column of a crawler. In the Confirm message, click OK to delete the crawler.
    • Click Stop in the Actions column of a running crawler to stop the crawler.
  6. View the metadata collected from the OSS data store.
    1. In the top navigation bar, click All Data.
    2. Click the OSS tab.
    3. On the OSS tab, click the name of the table that stores the collected metadata and view the table details.
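The Object Path, Path Traversal, and Prefix parameters in the Select Collection Object step together determine which objects are scanned and how the resulting tables are named. The following Python sketch illustrates one plausible reading of those parameters on an in-memory list of object keys. It is not the crawler's actual implementation: the functions `select_keys` and `table_name` are hypothetical, and dropping the file extension when deriving a table name is an assumption of this example.

```python
import posixpath


def select_keys(keys, object_path, traverse):
    """Pick object keys under object_path.

    With traverse=False, only objects directly under the path are kept;
    with traverse=True, objects in sub-paths are kept as well.
    """
    base = object_path.rstrip("/") + "/" if object_path else ""
    selected = []
    for key in keys:
        # Skip keys outside the path and directory placeholder keys.
        if not key.startswith(base) or key.endswith("/"):
            continue
        remainder = key[len(base):]
        if traverse or "/" not in remainder:
            selected.append(key)
    return selected


def table_name(key, prefix=""):
    """Name a table after the object (assumption: extension dropped), plus the prefix."""
    stem = posixpath.splitext(posixpath.basename(key))[0]
    return prefix + stem


keys = ["logs/a.csv", "logs/2020/b.csv", "other/c.csv", "logs/"]
direct_only = select_keys(keys, "logs", traverse=False)
with_subpaths = select_keys(keys, "logs", traverse=True)
names = [table_name(k, prefix="oss_") for k in with_subpaths]
```

In this sketch, turning on traversal adds `logs/2020/b.csv` to the selection, and the `oss_` prefix (a placeholder value) is prepended to each derived table name.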