This topic describes how to use the wizard to create a metadata discovery task. The task automatically discovers log data that is shipped from Logstores of Log Service to Object Storage Service (OSS). The Logstores must be deployed in the region where Data Lake Analytics (DLA) resides. The task also automatically creates databases and tables in DLA that map to the log data shipped to OSS. Discovery covers newly shipped log data as well as newly added partitions and their data.

Prerequisites

Log data has been shipped to OSS from Log Service in the region where DLA resides. For more information, see Ship log data from Log Service to OSS.

Scenarios

Enterprises store logs such as service logs and behavioral logs in Log Service. If the data volume is large, you can use Log Service to ship all of the data to OSS. Until the metadata discovery feature of DLA is enabled, the shipped data cannot be analyzed or computed. After the feature is enabled, you can generate DLA metadata with one click and analyze the data on the unified data analytics platform of DLA. The serverless Spark and Presto engines of DLA can compute and analyze the global metadata managed by DLA. In addition, DLA supports business scenarios such as data delivery after extract, transform, load (ETL) operations, low-frequency analysis of full log data, and association analysis of log data and database data.

Procedure

  1. Log on to the DLA console.
  2. In the left-side navigation pane, choose Data Lake Management > Meta information discovery.
  3. On the Meta information discovery page, click Go To The Wizard in the SLS data source for SLS section.
    Figure: SLS data source for SLS
  4. On the page that appears, click the SLS data source for SLS tab and configure the parameters, as shown in the following figure.
    Figure: Create a log shipping task
    Parameter: Description
    Data source configuration: Valid values:
    • Automatic discovery: DLA automatically discovers all log data that is shipped to OSS. No manual data configuration is required. If new log data is shipped, DLA discovers it the next time the metadata discovery task is triggered.
    • Manual selection: You must manually select the Logstores that are used to ship data to OSS.
    Scheduling frequency: The frequency at which the metadata discovery task is scheduled to discover the log data that Logstores of Log Service ship to OSS.
    Specific time: The time at which the metadata discovery task is scheduled to discover the log data that is shipped to OSS.
    schema prefix: The prefix of the schema name, that is, the name prefix of the DLA database that is mapped to the log data in OSS. A schema is named in the format Prefix__Name, where Name indicates the name of the OSS bucket in which the data shipped from a Logstore is saved.
    Configuration options (optional): The advanced custom options, such as File field change rules.
  5. After you configure the preceding parameters, click Create to create a task for discovering log data that is shipped from Log Service to OSS.
  6. After the task is created, click Immediately discovered to start the task.
    Figure: Task created
    You can also view the created task on the Task List tab. The task is manually or periodically scheduled based on the value of Scheduling frequency that you specified.
    After the task succeeds, click the Task List tab on the Meta information discovery page. Then, find your task in the task list and click the link, such as muyuantestonline, in the schema name/prefix column. On the Execute page, you can view the databases, tables, and columns that are automatically discovered by DLA.
  7. On the Execute page, edit SQL statements in the code editor and click Sync Execute(F8) or Async Execute(F9) to execute them.
    For example, you can execute select * from `muyuantestonline__dla_crawler_hangzhou`.`sls_crawler_test__csv_full_types` limit 20; under muyuantestonline__dla_crawler_hangzhou.
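The Prefix__Name schema format and the sample query above can be sketched in Python as follows. This is only an illustration: the helper names schema_name, quote_ident, and preview_query are hypothetical and not part of DLA. The Name part is taken literally from the example (dla_crawler_hangzhou); how DLA derives it from the actual OSS bucket name is not covered here.

```python
def schema_name(prefix: str, name: str) -> str:
    """Join the schema prefix and the Name part with a double underscore."""
    return f"{prefix}__{name}"


def quote_ident(identifier: str) -> str:
    """Wrap an identifier in backticks; reject names that contain a backtick."""
    if "`" in identifier:
        raise ValueError(f"unsupported identifier: {identifier!r}")
    return f"`{identifier}`"


def preview_query(schema: str, table: str, limit: int = 20) -> str:
    """Build a SELECT statement that previews the first rows of a table."""
    if limit <= 0:
        raise ValueError("limit must be a positive integer")
    return f"select * from {quote_ident(schema)}.{quote_ident(table)} limit {limit};"


# Values copied from the example in this topic.
schema = schema_name("muyuantestonline", "dla_crawler_hangzhou")
query = preview_query(schema, "sls_crawler_test__csv_full_types")
print(query)
```

The printed statement can be pasted into the code editor on the Execute page. Building the statement from its parts makes it easy to preview other discovered tables by changing only the table name or the limit.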