This topic describes how to create a crawler to collect metadata from an Object Storage Service (OSS) data source. You can view the collected metadata on the Data Map page.

Procedure

  1. Go to the Data Discovery page.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces.
    3. Select the region in which the workspace that you want to manage resides, find the workspace, and click Data Analytics in the Actions column.
    4. On the DataStudio page, click the More icon in the upper-left corner and choose All Products > Data governance > DataMap.
    5. In the top navigation bar, click Data Discovery.
  2. In the left-side navigation pane, click OSS.
  3. On the OSS Metadata Crawler page, click Create Crawler.
  4. In the Create Crawler dialog box, set the parameters in each step.
    1. In the Basic Information step, set the parameters as required.
      Parameter | Description
      Crawler Name | Required. The name of the crawler. You must set a unique name.
      Crawler Description | The description of the crawler.
      Data Source Type | The type of the data source from which you want to collect metadata. The default value is OSS and cannot be changed.
    2. Click Next.
    3. In the Select Collection Object step, set the parameters to specify the data source.
      Parameter | Description
      Workspace | The workspace of the OSS data source from which you want to collect metadata.
      Data Source | The OSS data source from which you want to collect metadata. If no data source is available, go to the Data Source page and add an OSS data source. For more information, see Add an OSS data source.
      Object Path | The path of the OSS object from which you want to collect metadata.
      Path Traversal | Specifies whether to traverse sub-paths in the specified path.
      Prefix | The prefix of the names of tables that the crawler automatically generates. By default, a generated table is named after the corresponding OSS object.
    4. Click Next.
    5. In the Configure Execution Plan step, configure an execution plan.
      Parameter | Description
      Execution Plan | The execution plan of the crawler. Valid values: On-demand Execution, Monthly, Weekly, Daily, Hourly, and Customize.
      Update Options | The policy that is used to update the tables that store the collected metadata.
      Delete Options | The policy that is used to delete the tables that store the collected metadata.
    6. Click Next.
    7. In the Confirm Information step, check the information that you specified and click Confirm.
  5. On the OSS Metadata Crawler page, find the created crawler and click Run in the Actions column.
    After the crawler finishes running, click the number in the Updated Tables in Last Run or Added Tables in Last Run column to view the details of the updated or created tables.
    You can also perform the following operations on the OSS Metadata Crawler page:
    • Click Details in the Actions column of a crawler. In the Crawler Details dialog box, view the detailed information about the crawler.
    • Click Edit in the Actions column of a crawler. In the Edit Crawler dialog box, modify the configurations of the crawler.
    • Click Delete in the Actions column of a crawler. In the Confirm message, click OK to delete the crawler.
    • Click Stop in the Actions column of a crawler that is running to stop the crawler.
  6. View the metadata collected from the OSS data source.
    1. In the top navigation bar, click All Data.
    2. Select OSS from the drop-down list in the upper part of the page.
    3. Click the name of a table that stores the collected metadata and view the table details.
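
The settings collected in the Create Crawler dialog box can be pictured as a small data structure. The following Python sketch is illustrative only and does not call any DataWorks or OSS API. The `CrawlerSettings` class and the `objects_in_scope` and `table_name` helpers are hypothetical names. The naming rule (prefix plus the object file name without its extension) and the way Path Traversal limits the scope are assumptions made for illustration, not documented crawler behavior. Only the list of Execution Plan values and the fixed OSS data source type come from this topic.

```python
from dataclasses import dataclass
from posixpath import basename, splitext

# Valid values listed for the Execution Plan parameter in this topic.
EXECUTION_PLANS = {
    "On-demand Execution", "Monthly", "Weekly", "Daily", "Hourly", "Customize",
}


@dataclass
class CrawlerSettings:
    """Hypothetical container that mirrors the Create Crawler dialog box."""
    crawler_name: str                 # Required and unique.
    workspace: str
    data_source: str
    object_path: str
    path_traversal: bool = False      # Whether sub-paths are traversed.
    prefix: str = ""                  # Prefix of generated table names.
    execution_plan: str = "On-demand Execution"
    description: str = ""
    data_source_type: str = "OSS"     # Fixed value; cannot be changed.

    def validate(self) -> None:
        """Check the constraints that this topic states explicitly."""
        if not self.crawler_name.strip():
            raise ValueError("Crawler Name is required")
        if self.data_source_type != "OSS":
            raise ValueError("Data Source Type must be OSS")
        if self.execution_plan not in EXECUTION_PLANS:
            raise ValueError(f"Unknown execution plan: {self.execution_plan}")


def objects_in_scope(settings: CrawlerSettings, keys: list[str]) -> list[str]:
    """Assumed scoping rule: keep object keys under object_path.

    Keys in sub-paths are kept only when path_traversal is enabled.
    """
    root = settings.object_path.strip("/")
    root = root + "/" if root else ""
    selected = []
    for key in keys:
        if not key.startswith(root) or key.endswith("/"):
            continue  # Outside the path, or a directory placeholder.
        relative = key[len(root):]
        if "/" in relative and not settings.path_traversal:
            continue  # In a sub-path, but traversal is disabled.
        selected.append(key)
    return selected


def table_name(settings: CrawlerSettings, key: str) -> str:
    """Assumed naming rule: prefix + object file name without extension."""
    stem, _extension = splitext(basename(key))
    return settings.prefix + stem
```

For example, with `object_path="logs/"`, `prefix="ods_"`, and Path Traversal disabled, the sketch keeps `logs/orders.csv` and names its table `ods_orders`, and skips `logs/2024/users.csv` until Path Traversal is enabled. To see how the crawler actually named the tables, check the table list on the All Data page.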