This topic describes how to use the metadata discovery feature of Data Lake Analytics (DLA) to query and analyze data stored in Object Storage Service (OSS). This helps you understand the basic procedure of using DLA.

Prerequisites

An Alibaba Cloud account is created and has passed real-name verification.
Note If you do not have an Alibaba Cloud account, the system prompts you to create one when you activate DLA.

Procedure

  1. Activate DLA.
  2. Log on to the OSS console and upload a file to OSS. For more information, see Upload objects.
    For example, you can upload the supplier_with_header.csv file to the oss://alibaba-crawler/supply-ceshi/ directory in OSS.
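    If you prefer the command line, you can also upload the file with the ossutil tool. The following is a minimal sketch; the sample rows and column names are illustrative placeholders, and the bucket path follows the example in this topic — replace both with your own values:

```shell
# Create a small sample CSV with a header row (illustrative columns).
cat > supplier_with_header.csv <<'EOF'
s_suppkey,s_name,s_nationkey
1,Supplier#000000001,17
2,Supplier#000000002,5
EOF

# Upload it with ossutil if the tool is installed and configured.
# The destination directory matches the example used in this topic.
if command -v ossutil >/dev/null 2>&1; then
  ossutil cp supplier_with_header.csv oss://alibaba-crawler/supply-ceshi/
fi
```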
  3. Log on to the DLA console. In the left-side navigation pane, choose Data Lake Management > Meta information discovery.
  4. On the Meta information discovery page, click Go To The Wizard in the OSS data source section.
  5. On the OSS data source tab, select a bucket from the Bucket drop-down list and select the destination object and file in the bucket.
    Note The system automatically obtains the OSS buckets that are in the same region as DLA. You can select a bucket from the Bucket drop-down list based on your business requirements. After you select a bucket, the system automatically lists all objects and files in this bucket. After you select the destination object and file, the system automatically adds them to the OSS directory on the right.
  6. In the Data source configuration, Scheduling configuration, and Target metadata configuration sections, specify the parameters.
    The following table describes these parameters.
    OSS path: The OSS directory in which the destination file is stored. The path ends with a forward slash (/). The system automatically specifies this parameter based on the directory of the folder that you selected.
    Format parser: The default value is Automatic parsing, which indicates that all built-in parsers are called in sequence. To use the parser for a specific file format, you can also set this parameter to json, parquet, avro, orc, or csv.
    Scheduling frequency: The frequency at which the metadata discovery task is scheduled.
    Schema Name: The name of the schema, which is the name of the database that is mapped in DLA. After a metadata discovery task is created, a schema is automatically created for the task.
    Configuration options (optional): The advanced custom options, such as File field change rules and Object deletion change rules.
  7. After you specify the parameters, click Create.
    After the metadata discovery task is created, you can view the task on the Task List tab. The task runs manually or on a periodic schedule, based on the Scheduling frequency value that you specified.
    After the metadata discovery task is completed, find your task on the Task List tab and click the database name link, such as alibaba, in the schema name/prefix column to go to the Execute page. You can then view the created databases, tables, and columns that are automatically discovered by DLA.
  8. On the Execute page, edit SQL statements in the code editor and click Sync Execute(F8) or Async Execute(F9) to execute SQL statements.
    For example, you can execute select * from `alibaba`.`supply_ceshi` limit 20; for the alibaba database.
    In the lower part of the Execute page, you can click Result Set to view the metadata that DLA automatically discovers from the supplier_with_header.csv file in the oss://alibaba-crawler/supply-ceshi/ directory.
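When the csv parser processes a file that contains a header row, the column names of the discovered table correspond to the fields in that header. The following Python sketch illustrates this idea locally with the standard csv module; the sample contents and column names are illustrative placeholders, not DLA's actual implementation:

```python
import csv
import io

# Illustrative contents of a file like supplier_with_header.csv.
sample = """s_suppkey,s_name,s_nationkey
1,Supplier#000000001,17
2,Supplier#000000002,5
"""

# Read the header row to derive column names, similar in spirit to
# what a header-aware CSV parser does during metadata discovery.
reader = csv.reader(io.StringIO(sample))
columns = next(reader)   # column names come from the header row
rows = list(reader)      # remaining lines are data rows

print(columns)
print(len(rows))
```

In the discovered table, each header field becomes a column, and each subsequent line becomes a row that you can query with SQL on the Execute page.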