All Products
Search
Document Center

E-MapReduce:DuckDB SQL development

Last Updated:Jun 22, 2026

EMR Serverless Spark integrates DuckDB for lightweight, efficient data analysis. You can use DuckDB SQL tasks to explore data, develop prototypes, and run ad-hoc queries.

Prerequisites

  • You have created an EMR Serverless Spark workspace. For more information, see Create a workspace.

  • You have created and started a DuckDB session. For more information, see DuckDB sessions.

Limitations

  • You can only read data from OSS files and a DLF Catalog.

  • DuckDB access to a DLF Catalog has the following limitations:

    • Operation type: You can only read Paimon append-only tables. Reading primary key tables and performing write operations are not supported.

    • Table format: Only Paimon tables with file.format = 'parquet' are supported.

    • Catalog type: You cannot access a DLF Catalog created with data sharing.

Create a DuckDB SQL task

  1. Navigate to the data development page.

    1. Log on to the EMR console.

    2. In the left-side navigation pane, choose EMR Serverless > Spark.

    3. On the Spark page, click the target workspace.

    4. In the left-side navigation pane, click Data Development.

  2. Create a task.

    1. On the Data Development page, click Create Job.

    2. In the Create Job dialog box, select Interactive Development > DuckDB SQL.

    3. Enter a name for the task and click OK.

    4. In the upper-right corner, select a DuckDB session.

    5. Enter SQL statements in the task editor.

Read data from OSS files

Note

Before you read from or write to OSS files, ensure that you have activated OSS, created a bucket, and granted the execution role of your EMR Serverless Spark workspace permission to access the bucket.

  • View the content of Parquet, CSV, and JSON files.

    -- View the content of a Parquet file
    SELECT * FROM read_parquet('oss://bucket/path/to/test.parquet');
    -- View the content of a CSV file
    SELECT * FROM read_csv('oss://bucket/path/to/test.csv');
    -- View the content of a JSON file
    SELECT * FROM read_json('oss://bucket/path/to/test.json');
  • View the schema of a Parquet file.

    SELECT * FROM parquet_schema('oss://your-bucket/path/to/test.parquet');
  • Export query results to a CSV file.

    COPY (SELECT * FROM read_parquet('oss://your-bucket/path/to/test.parquet'))
    TO 'oss://your-bucket/path/to/output.csv'
    (HEADER, DELIMITER ',');

Access a DLF Catalog

Note

DuckDB access to a DLF Catalog has the following limitations:

  • Operation type: You can only read Paimon append-only tables. Reading primary key tables and performing write operations are not supported.

  • Table format: Only Paimon tables with file.format = 'parquet' are supported.

  • Catalog type: You cannot access a DLF Catalog created with data sharing.

  • View the DLF Catalogs bound to the current workspace. For more information, see Manage data catalogs.

    SHOW DATABASES;
  • Query table data in a DLF Catalog.

    USE <catalogName>.<databaseName>;
    SELECT * FROM <tableName> LIMIT 10;