This topic describes how to create a crawler that collects metadata from an Object Storage Service (OSS) data store into DataWorks. You can then view the collected metadata on the Data Map page.

Procedure

  1. Go to the Data Discovery page.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces. The Workspaces page appears.
    3. Find the target workspace and click Data Analytics in the Actions column.
    4. On the DataStudio page, click the icon in the upper-left corner and choose All Products > DataMap. The Data Map page appears.
    5. Click Data Discovery in the top navigation bar.
  2. In the left-side navigation pane, click OSS.
  3. On the OSS Metadata Crawler page that appears, click Create Crawler.
  4. In the Create Crawler dialog box that appears, follow these steps:
    1. In the Basic Information step, set basic parameters.
      Parameter | Description
      Crawler Name | Required. The name of the crawler. You must specify a unique name.
      Crawler Description | The description of the crawler.
      Connect To | The type of the data store from which metadata will be collected. The default value is OSS and cannot be changed.
    2. Click Next.
    3. In the Select object type step, set parameters of the object from which metadata will be collected.
      Parameter | Description
      Connection | Select an OSS connection from the Connection drop-down list. If the required connection does not exist, go to the Data Source page in Workspace Management and create the connection. For more information, see Configure an OSS connection.
      Object Path | Select the path of the OSS object from which metadata will be collected.
      Path Traversal | Specify whether to traverse sub-paths in the specified path.
      Prefix | Specify the prefix of the names of tables that the crawler automatically generates. By default, a generated table is named after the corresponding OSS object.
    4. Click Next.
    5. In the Configure Execution Plan step, set scheduling parameters.
      Parameter | Description
      Execution Plan | Specify an execution plan. Valid values: On-demand Execution, Monthly, Weekly, Daily, Hourly, and Custom.
      Option | Specify the policy for updating the target table.
      Option | Specify the policy for deleting the target table.
    6. Click Next.
    7. In the Confirm Information step, verify that the configuration of the crawler is correct and click Confirm.
  5. On the OSS Metadata Crawler page, find the created crawler and click Run in the Actions column.
    After the crawler is run, click the number in the Last run update table or Last run Add table column to view details about the updated or added tables.
    You can also perform the following operations on the page:
    • Click Details in the Actions column of a crawler. In the Crawler Details dialog box that appears, view the detailed information about the crawler.
    • Click Edit in the Actions column of a crawler. In the Edit Crawler dialog box that appears, modify the configuration of the crawler.
    • Click Delete in the Actions column of a crawler. In the Confirm dialog box that appears, click OK to delete the crawler.
    • Click Stop in the Actions column of a running crawler to stop the crawler.
  6. View metadata collected from the OSS data store.
    1. Click All Data in the top navigation bar.
    2. Click the OSS tab.
    3. On the OSS tab, click the name of the target table to view the table details.
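
The settings collected in the Create Crawler dialog box can be summarized as a small data structure. The following Python sketch is for illustration only and does not call any DataWorks API: the CrawlerConfig class, its field names, and the validate and table_name_for helpers are hypothetical. The sketch assumes that the prefix is prepended unchanged to the object name, which the console may handle differently.

```python
from dataclasses import dataclass
from enum import Enum


class ExecutionPlan(Enum):
    """Valid values of the Execution Plan parameter, as listed in the procedure."""
    ON_DEMAND = "On-demand Execution"
    MONTHLY = "Monthly"
    WEEKLY = "Weekly"
    DAILY = "Daily"
    HOURLY = "Hourly"
    CUSTOM = "Custom"


@dataclass
class CrawlerConfig:
    """Hypothetical record of the dialog box settings; not a DataWorks type."""
    crawler_name: str                 # Crawler Name: required and unique
    connection: str                   # Connection: an existing OSS connection
    object_path: str                  # Object Path: where metadata is collected
    execution_plan: ExecutionPlan     # Execution Plan: one of the listed values
    path_traversal: bool = False      # Path Traversal: whether to traverse sub-paths
    prefix: str = ""                  # Prefix: prefix of generated table names
    description: str = ""             # Crawler Description: optional
    connect_to: str = "OSS"           # Connect To: fixed to OSS

    def validate(self, existing_names):
        """Return a list of problems; an empty list means the settings look valid."""
        problems = []
        if not self.crawler_name.strip():
            problems.append("Crawler Name is required")
        elif self.crawler_name in existing_names:
            problems.append("Crawler Name must be unique")
        if self.connect_to != "OSS":
            problems.append("Connect To must be OSS")
        return problems

    def table_name_for(self, object_name):
        """Assumption: the prefix is prepended as-is to the OSS object name."""
        return f"{self.prefix}{object_name}"


if __name__ == "__main__":
    config = CrawlerConfig(
        crawler_name="sales_oss",          # hypothetical name
        connection="my_oss_connection",    # hypothetical connection
        object_path="my-bucket/logs/",     # hypothetical path
        execution_plan=ExecutionPlan.DAILY,
        prefix="oss_",
    )
    print(config.validate(existing_names={"other_crawler"}))
    print(config.table_name_for("orders"))
```

Keeping the settings in one record like this can make it easier to review them before you click Confirm in the Confirm Information step.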