
MaxCompute:Read CSV data from a data lake using DLF 1.0 and OSS

Last Updated: Mar 25, 2026

Query CSV files stored in Object Storage Service (OSS) directly from MaxCompute — without loading data first. Data Lake Formation (DLF) extracts metadata from your OSS directories and maps them to a MaxCompute external schema, letting you run standard SQL queries against the data lake.

Prerequisites

Before you begin, make sure your MaxCompute project resides in one of the following supported regions:

| Region | Region ID |
| --- | --- |
| China (Hangzhou) | cn-hangzhou |
| China (Shanghai) | cn-shanghai |
| China (Beijing) | cn-beijing |
| China (Zhangjiakou) | cn-zhangjiakou |
| China (Shenzhen) | cn-shenzhen |
| China (Hong Kong) | cn-hongkong |
| Singapore | ap-southeast-1 |
| Germany (Frankfurt) | eu-central-1 |

Overview

The setup follows four stages:

  1. Authorize — grant MaxCompute permissions to access DLF and OSS.

  2. Prepare data — upload CSV files to OSS in a partitioned directory structure.

  3. Extract metadata — use DLF to scan OSS directories and create table definitions automatically.

  4. Connect and query — create an external data source and external schema in MaxCompute, then run SQL against the OSS data.

Step 1: Grant MaxCompute access to DLF and OSS

The account associated with your MaxCompute project needs permissions to access both DLF and OSS. Two authorization methods are available:

  • One-click authorization — if you use the same Alibaba Cloud account for MaxCompute and DLF, click Authorize DLF to complete authorization in one step.

  • Custom authorization — if you use different accounts, follow the custom authorization guide.

Step 2: Prepare the OSS data

This example uses a partitioned table where the partition key is direction. Each partition contains a CSV file with vehicle tracking data.

  1. Log on to the OSS console and create a bucket. This example uses a bucket named mc-lakehouse-dlf-oss.

  2. Create a root folder named mc_dlf_oss_pt/ in the bucket, then create four subdirectories and upload the corresponding files. DLF uses the root folder name (mc_dlf_oss_pt) as the table name and each direction=<value> subdirectory as a partition.

    Directory and table names can contain only letters, digits, and underscores, and must start with a letter.

    | Subdirectory | File |
    | --- | --- |
    | direction=N/ | vehicle1.csv |
    | direction=NE/ | vehicle2.csv |
    | direction=S/ | vehicle3.csv |
    | direction=SW/ | vehicle4.csv |

Step 3: Extract OSS metadata using DLF 1.0

Create a DLF database

  1. Log on to the Data Lake Formation (DLF) console and select your region.

  2. In the navigation pane, choose Metadata > Metadata.

  3. On the Metadata page, click the Database tab. In the default catalog, click Create Database and configure the following parameters:

    | Parameter | Required | Description |
    | --- | --- | --- |
    | Catalog | Yes | Select default. |
    | Database Name | Yes | Enter a name that starts with a letter, is 1–128 characters long, and contains only letters, digits, and underscores. Example: dlf_oss_csv. |
    | Database Description | No | Enter an optional description. |
    | Select Path | Yes | The OSS path where database metadata is stored. Example: oss://mc-lakehouse-dlf-oss/metadb_save. |

Create a metadata extraction task

  1. In the navigation pane, choose Metadata > Metadata Discovery.

  2. Click Create Extraction Task and configure the task in three sections:

    Set Extraction Source:

    | Parameter | Required | Description |
    | --- | --- | --- |
    | Extraction Task Name | Yes | Enter a custom name for the task. |
    | Select OSS Path | Yes | The OSS path to scan. DLF creates tables and partitions based on the directory structure: oss://<path>/<table-name>/<partition-name>/<file-name>. For this example, select oss://mc-lakehouse-dlf-oss/mc_dlf_oss_pt/. |
    | Exclusion Mode | No | A regular expression that matches OSS directories to exclude from scanning. |
    | Parse Format | Yes | The file format. Options: Automatic Recognition, json, csv, parquet, orc, hudi, delta, avro. Select csv for this example. |

    Set Extraction Target:

    | Parameter | Required | Description |
    | --- | --- | --- |
    | Catalog | Yes | Select the target data catalog. |
    | Destination Database | Yes | Select the database created in the previous step. Example: dlf_oss_csv. |
    | Destination Table Prefix | No | A prefix for extracted table names. Must be 1–128 characters long and start with a letter. |
    | Method to Handle Table Field Update | Yes | How to handle schema changes when the task is re-run: Add Columns and Retain Existing Columns (a safe default), Update Table Schema and Generate Table Results Based on the last detected table schema (overwrites existing tables with the same name, which can make the original tables unavailable), or Ignore Updates and Not Modify Table. |
    | Method to Process Deleted OSS Objects | Yes | Delete Metadata removes table definitions when the corresponding OSS objects are deleted. Ignore Updates and Not Delete Tables keeps the existing metadata. |

    Set Extraction Task:

    | Parameter | Required | Description |
    | --- | --- | --- |
    | RAM Role | Yes | The RAM role that DLF assumes to access your OSS data. Select AliyunDLFWorkFlowDefaultRole to use the default system role. |
    | Execution Policy | No | Manual (run on demand) or Scheduling (run on a schedule). |
    | Extraction Policy | Yes | Partial Data Extraction scans only part of each file; it is faster but may produce a less accurate schema. Extract All scans every file for maximum accuracy. |
  3. Click Save and Execute Immediately.

View extracted tables

  1. In the navigation pane, choose Metadata > Metadata.

  2. Click the Database tab, then click the name of your database (dlf_oss_csv).

  3. On the Table List tab, confirm that the extracted table mc_dlf_oss_pt appears.

If you add data to a new partition later, re-run the extraction task to update the metadata.

Step 4: Create a DLF 1.0 and OSS external data source

  1. Log on to the MaxCompute console and select your region.

  2. In the navigation pane, choose Manage Configurations > External Data Source.

  3. Click Create External Data Source and configure the following parameters:

    | Parameter | Required | Description |
    | --- | --- | --- |
    | External Data Source Type | Yes | Select DLF+OSS. |
    | External Data Source Name | Yes | A name that starts with a letter, contains only lowercase letters, digits, and underscores, and is no more than 128 characters long. Example: dlf_oss_csv_new. |
    | Description | No | An optional description. |
    | Region | Yes | Defaults to the current region. |
    | DLF Endpoint | Yes | Defaults to the DLF endpoint for the current region. |
    | OSS Endpoint | Yes | Defaults to the OSS endpoint for the current region. |
    | RoleARN | Yes | The Alibaba Cloud Resource Name (ARN) of the RAM role that has permissions to access both DLF and OSS. To find the ARN, go to the RAM console, choose Identities > Roles, and copy the ARN from the Basic Information section. Example: acs:ram::124****:role/aliyunodpsdefaultrole. |
    | Foreign Server Supplemental Properties | No | Additional attributes for the external data source. Tasks that use this data source access the source system based on these attributes. Supported parameters will be documented as they become available. |
  4. Click OK.

  5. On the External Data Source page, find the new data source and click Details in the Actions column to verify the configuration.

Step 5: Create an external schema

Connect to MaxCompute and run the following SQL:

SET odps.namespace.schema=true;

CREATE EXTERNAL SCHEMA IF NOT EXISTS <external_schema>
WITH <external_data_source>
ON '<dlf_data_catalogue>.dlf_database';

Replace the placeholders with your values:

| Placeholder | Description | Example |
| --- | --- | --- |
| <external_schema> | A name for the external schema. | es_dlf_oss_csv |
| <external_data_source> | The external data source created in Step 4. Must be in the same region as the MaxCompute project. | dlf_oss_csv_new |
| <dlf_data_catalogue> | The DLF data catalog ID. See Create a data catalog. | 122**** |
| dlf_database | The database name in the DLF data catalog. See Databases, tables, and functions. | dlf_oss_csv |
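With the example values used throughout this walkthrough substituted in (replace the masked catalog ID 122**** with your own), the statement would read:

```sql
SET odps.namespace.schema=true;

CREATE EXTERNAL SCHEMA IF NOT EXISTS es_dlf_oss_csv
WITH dlf_oss_csv_new
ON '122****.dlf_oss_csv';
```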

Step 6: Query OSS data with SQL

  1. List the tables available in the external schema:

    SET odps.namespace.schema=true;
    USE SCHEMA es_dlf_oss_csv;
    SHOW TABLES IN es_dlf_oss_csv;

    Expected output:

    ALIYUN$xxx:mc_dlf_oss_pt
    
    OK
  2. Query data from a specific partition:

    SET odps.namespace.schema=true;
    SELECT * FROM <project_name>.es_dlf_oss_csv.mc_dlf_oss_pt WHERE direction='NE';

    Expected output:

    +------------+------------+------------+------------+------------+------------+----------------+------------+
    | _c0        | _c1        | _c2        | _c3        | _c4        | _c5        | _c6            | direction  |
    +------------+------------+------------+------------+------------+------------+----------------+------------+
    | 1          | 2          | 13         | 1          | 46.81006   | -92.08174  | 9/14/2014 0:00 | NE         |
    | 1          | 3          | 48         | 1          | 46.81006   | -92.08174  | 9/14/2014 0:00 | NE         |
    | 1          | 9          | 4          | 1          | 46.81006   | -92.08174  | 9/15/2014 0:00 | NE         |
    +------------+------------+------------+------------+------------+------------+----------------+------------+
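Queries against the external table are not limited to a single partition; any standard aggregate works across the data lake. For example, a sketch that counts rows per direction (using the same <project_name> placeholder as above):

```sql
SET odps.namespace.schema=true;
SELECT direction, COUNT(*) AS row_cnt
FROM <project_name>.es_dlf_oss_csv.mc_dlf_oss_pt
GROUP BY direction;
```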

Step 7: Copy data to a MaxCompute table

For better query performance on frequently accessed data, copy it to a MaxCompute internal table using CREATE TABLE AS SELECT (CTAS). Internal tables do not scan OSS files at query time.

  1. Copy a partition to an internal table:

    CREATE TABLE vehicle_copy AS
    SELECT * FROM <project_name>.es_dlf_oss_csv.mc_dlf_oss_pt
    WHERE direction='NE';
  2. Query the internal table to verify the data:

    SELECT * FROM vehicle_copy;

    Expected output:

    +------------+------------+------------+------------+------------+------------+----------------+------------+
    | _c0        | _c1        | _c2        | _c3        | _c4        | _c5        | _c6            | direction  |
    +------------+------------+------------+------------+------------+------------+----------------+------------+
    | 1          | 2          | 13         | 1          | 46.81006   | -92.08174  | 9/14/2014 0:00 | NE         |
    | 1          | 3          | 48         | 1          | 46.81006   | -92.08174  | 9/14/2014 0:00 | NE         |
    | 1          | 9          | 4          | 1          | 46.81006   | -92.08174  | 9/15/2014 0:00 | NE         |
    +------------+------------+------------+------------+------------+------------+----------------+------------+
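Note that CTAS produces a non-partitioned internal table. If you want the internal copy to keep the direction partitions, one option is to create a partitioned table explicitly and load it with a dynamic-partition insert. The column names and types below are assumptions inferred from the sample output above, so adjust them to match your actual schema:

```sql
-- Hypothetical sketch: column names/types are inferred from the sample output.
CREATE TABLE IF NOT EXISTS vehicle_all (
  _c0 BIGINT, _c1 BIGINT, _c2 BIGINT, _c3 BIGINT,
  _c4 DOUBLE, _c5 DOUBLE, _c6 STRING
) PARTITIONED BY (direction STRING);

-- The partition column must come last in the SELECT list.
INSERT OVERWRITE TABLE vehicle_all PARTITION (direction)
SELECT _c0, _c1, _c2, _c3, _c4, _c5, _c6, direction
FROM <project_name>.es_dlf_oss_csv.mc_dlf_oss_pt;
```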

Next steps