MaxCompute can query data stored in Object Storage Service (OSS) and managed by Data Lake Formation (DLF) without moving or copying data. You do this by creating a DLF+OSS external schema, which maps a DLF catalog database to a MaxCompute schema. Once mapped, you can use standard SQL to read tables from OSS through the DLF metadata layer.
Use cases
- Query existing data lake data: Your organization already stores data in OSS and manages metadata in DLF. Add MaxCompute as a query engine without migrating data or duplicating metadata.
- Build a data lakehouse: Combine MaxCompute's compute capabilities with OSS storage and DLF metadata management to create a unified lakehouse architecture. Selectively copy high-priority data into MaxCompute for performance-critical workloads while keeping the rest in OSS.
How it works
A DLF+OSS external schema maps to your DLF data at the catalog.database level. When you query a table through the external schema, MaxCompute reads metadata from DLF in real time and fetches the corresponding data directly from OSS. You must keep DLF metadata synchronized with the actual data in OSS.
MaxCompute, DLF, and OSS must all be deployed in the same region. Access to DLF and OSS is controlled through a single RAM role that you specify when creating the external data source.
Prerequisites
Before you begin, make sure you have:
- MaxCompute, DLF, and OSS deployed in the same region
- A MaxCompute project with schema support enabled, or the ability to enable it
- An Alibaba Cloud account with permissions to create RAM roles and grant policies
- The MaxCompute client (odpscmd) version 0.46.8 or later, if you plan to run SHOW TABLES IN <external_schema_name> from the client. Run SHOW VERSION; to check. If you use DataWorks DataStudio, the MaxCompute CMD version in your DataWorks resource group must also be 0.46.8 or later.
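The 0.46.8 minimum is a numeric, per-component comparison, not a simple string comparison ("0.46.8" is newer than "0.45.12" even though "12" sorts after "8" as text). A minimal sketch of that check; the function name and the dot-separated version format are assumptions, not part of the product:

```python
# Hypothetical helper: check whether an odpscmd version string meets the
# 0.46.8 minimum. Versions are assumed to be dot-separated integers.

def meets_minimum(version: str, minimum: str = "0.46.8") -> bool:
    """Compare dot-separated version strings numerically, part by part."""
    parts = [int(p) for p in version.split(".")]
    required = [int(p) for p in minimum.split(".")]
    # Pad the shorter list with zeros so "0.46" compares as "0.46.0".
    width = max(len(parts), len(required))
    parts += [0] * (width - len(parts))
    required += [0] * (width - len(required))
    return parts >= required

print(meets_minimum("0.46.8"))   # True
print(meets_minimum("0.45.12"))  # False: 45 < 46, even though 12 > 8
```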
Limitations
- The DLF+OSS external schema feature (data lakehouse 2.0) is available only in the following regions: China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Shenzhen), China (Hong Kong), Singapore, and Germany (Frankfurt).
- CREATE TABLE AS SELECT cannot run directly on external schema objects. Switch to the default schema first, then run CREATE TABLE ... AS SELECT * FROM <project>.<external_schema>.<table>.
Set up a DLF+OSS external schema
Step 1: Enable services and grant permissions
- Grant MaxCompute access to DLF and OSS. By default, the MaxCompute project account has no permissions to access DLF or OSS. Grant permissions using one of the following methods:

  | Method | When to use |
  |---|---|
  | One-click authorization | The account that created the MaxCompute project is the same account used to deploy DLF and OSS. Click Authorize DLF and OSS to complete authorization in one step. |
  | Custom authorization | The accounts differ, or you need fine-grained control. See Custom authorization for DLF. |

- After completing authorization, record the ARN of the RAM role. You will enter it as the RoleARN in Step 2. To find the ARN:
  - Log on to the Resource Access Management (RAM) console.
  - In the left navigation pane, choose Identities > Roles.
  - Click the role name. The ARN is shown in the Basic Information section.

  Example ARN: acs:ram::124****:role/aliyunodpsdefaultrole
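Before pasting the ARN into the console, it can help to sanity-check its shape. A minimal sketch, assuming the documented `acs:ram::<numeric-account-id>:role/<role-name>` layout; the helper name is illustrative, and note that a masked example like `acs:ram::124****:role/...` will not pass a digits-only check:

```python
import re

# Hypothetical shape check for a RAM role ARN. Real account IDs are numeric;
# this validates shape only, not whether the role exists or carries the
# right DLF/OSS policies.
ARN_RE = re.compile(r"^acs:ram::\d+:role/[A-Za-z0-9._-]+$")

def parse_role_arn(arn: str) -> tuple:
    """Return (account_id, role_name), raising ValueError on bad shape."""
    if not ARN_RE.match(arn):
        raise ValueError(f"not a RAM role ARN: {arn!r}")
    prefix, role_name = arn.rsplit(":role/", 1)
    account_id = prefix.rsplit("::", 1)[1]
    return account_id, role_name

print(parse_role_arn("acs:ram::1234567890123456:role/aliyunodpsdefaultrole"))
# ('1234567890123456', 'aliyunodpsdefaultrole')
```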
Step 2: Create a DLF+OSS external data source
- Log on to the MaxCompute console and select a region in the upper-left corner.
- In the left navigation pane, choose Manage Configurations > External Data Source.
- On the External Data Source page, click Create External Data Source.
- In the Create External Data Source dialog box, configure the following parameters:
  | Parameter | Required | Description |
  |---|---|---|
  | External Data Source Type | Required | Select DLF+OSS. |
  | External Data Source Name | Required | Enter a custom name. Must start with a letter and contain only lowercase letters, underscores (_), and digits. Maximum 128 characters. Example: dlf_oss_schema. |
  | Description | Optional | Enter a description. |
  | Region | Required | Defaults to the current region. |
  | DLF Endpoint | Required | Defaults to the DLF endpoint for the current region. |
  | OSS Endpoint | Required | Defaults to the OSS endpoint for the current region. |
  | RoleARN | Required | The ARN of the RAM role from Step 1 that has permissions to access both DLF and OSS. Example: acs:ram::124****:role/aliyunodpsdefaultrole. |
  | Foreign Server Supplemental Properties | Optional | Additional properties for the external data source. After you specify these properties, tasks that use this data source can access the source system based on the defined behavior. Note: For information about supported parameters, see the official documentation. More parameters will be supported as the product evolves. |
- Click OK. Record the External Data Source Name you entered (for example, dlf_oss_schema). You will use it as the dlfoss_foreign_server_name parameter in Step 4.
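The naming rule above (starts with a letter; only lowercase letters, underscores, and digits; at most 128 characters) is easy to check up front. A minimal sketch; the function name is an assumption, as is the reading that the leading letter must itself be lowercase:

```python
import re

# Hypothetical validator for the External Data Source Name rules:
# first character a lowercase letter, then lowercase letters, digits,
# or underscores, 128 characters total at most.
NAME_RE = re.compile(r"^[a-z][a-z0-9_]{0,127}$")

def is_valid_source_name(name: str) -> bool:
    return bool(NAME_RE.match(name))

print(is_valid_source_name("dlf_oss_schema"))  # True
print(is_valid_source_name("1schema"))         # False: must start with a letter
print(is_valid_source_name("Dlf_Schema"))      # False: uppercase not allowed
```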
Step 3: Create a DLF catalog and database
- Create a DLF data catalog. Record the Catalog ID (for example, external_schema_dlf_catalog). You will use it as the dlf_catalog_id parameter in Step 4.
  - Log on to the Data Lake Formation (DLF) console and select a region in the upper-left corner.
  - In the left navigation pane, choose Metadata > Metadata.
  - Click the Catalog List tab, then click New Catalog.
  - On the New Catalog page, set the following:
    - Catalog ID: A unique identifier for this catalog. Example: external_schema_dlf_catalog.
    - Location: The OSS bucket path where catalog data is stored. Example: oss://external-schema/dlf-oss.
- Create a database in the catalog. Record the Database Name (for example, dlf_oss_db). You will use it as the database_name parameter in Step 4.
  - On the Metadata page, click the Database tab.
  - In the Catalog List section, click Create Database and configure the following parameters:

    | Parameter | Required | Description |
    |---|---|---|
    | Catalog | Required | Select the catalog you created in the previous step. Example: external_schema_dlf_catalog. |
    | Database Name | Required | 1–128 characters, starting with a letter. Can contain letters, digits, and underscores. Example: dlf_oss_db. |
    | Database Description | Optional | Enter a description. |
    | Select Path | Required | The OSS path for this database. Example: oss://external-schema/dlf-oss/database. |
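Note that the DLF database naming rule differs slightly from the external data source naming rule in Step 2: here uppercase letters are allowed. A small sketch of the check as stated (the function name is an assumption):

```python
import re

# Hypothetical validator for the DLF Database Name rules: 1-128 characters,
# starting with a letter; letters (either case), digits, and underscores.
DB_NAME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9_]{0,127}$")

def is_valid_db_name(name: str) -> bool:
    return bool(DB_NAME_RE.match(name))

print(is_valid_db_name("dlf_oss_db"))  # True
print(is_valid_db_name("_private"))    # False: must start with a letter
```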
Step 4: Create an external schema
- Enable schema support for your MaxCompute project. External schema operations require a MaxCompute project with schema support enabled.
  - In the MaxCompute console, choose Manage Configurations > Projects.
  - Find your target project and click Enable Schema in the Actions column.
- Before running SQL operations on an external schema, add the following statement before your SQL:
  SET odps.namespace.schema=true;
- In the MaxCompute client, run the following command to create the external schema. Use the values you recorded in Steps 2 and 3:
  - dlfoss_foreign_server_name: the external data source name from Step 2 (for example, dlf_oss_schema)
  - dlf_catalog_id: the DLF catalog ID from Step 3 (for example, external_schema_dlf_catalog)
  - database_name: the DLF database name from Step 3 (for example, dlf_oss_db)

  CREATE EXTERNAL SCHEMA IF NOT EXISTS <external_schema_name> WITH <dlfoss_foreign_server_name> ON '<dlf_catalog_id>.<database_name>';

  Example:

  CREATE EXTERNAL SCHEMA IF NOT EXISTS test_schema WITH dlf_oss_schema ON 'external_schema_dlf_catalog.dlf_oss_db';
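If you script environment setup, a small helper can assemble the statement from the three values you recorded in Steps 2 and 3. This is an illustrative sketch, not part of any MaxCompute SDK:

```python
# Hypothetical helper that builds the CREATE EXTERNAL SCHEMA statement
# from the recorded values. Parameter names mirror the placeholders above.

def build_create_external_schema(
    schema_name: str,
    foreign_server: str,   # external data source name from Step 2
    catalog_id: str,       # DLF catalog ID from Step 3
    database: str,         # DLF database name from Step 3
) -> str:
    return (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {schema_name} "
        f"WITH {foreign_server} "
        f"ON '{catalog_id}.{database}';"
    )

stmt = build_create_external_schema(
    "test_schema", "dlf_oss_schema", "external_schema_dlf_catalog", "dlf_oss_db"
)
print(stmt)
# CREATE EXTERNAL SCHEMA IF NOT EXISTS test_schema WITH dlf_oss_schema ON 'external_schema_dlf_catalog.dlf_oss_db';
```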
Step 5: Query data from the external schema
List tables and query data using standard SQL. Schema operations such as listing and deleting follow the same commands as internal schemas. For a full list of schema operations, see Schema operations.
List tables in the external schema
-- Method 1: List tables directly
SHOW TABLES IN test_schema;
-- Method 2: Switch to the schema first, then list tables
USE SCHEMA test_schema;
SHOW TABLES;
Sample output:
ALIYUN$xxx@test.aliyunid.com:hellocsv
ALIYUN$xxx@test.aliyunid.com:t1
ALIYUN$xxx@test.aliyunid.com:t2
ALIYUN$xxx@test.aliyunid.com:t3
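Each output line pairs an account prefix with a table name. If you post-process this output in a script, the bare table names can be extracted with a minimal sketch, assuming the table name always follows the last colon on the line (inferred from the sample above):

```python
# Hypothetical parser for SHOW TABLES output lines of the form
# ALIYUN$<account>:<table_name>, extracting just the table names.

def table_names(show_tables_output: str) -> list:
    names = []
    for line in show_tables_output.strip().splitlines():
        # Take the part after the last colon as the table name.
        names.append(line.rsplit(":", 1)[1])
    return names

sample = """ALIYUN$xxx@test.aliyunid.com:hellocsv
ALIYUN$xxx@test.aliyunid.com:t1
ALIYUN$xxx@test.aliyunid.com:t2"""
print(table_names(sample))  # ['hellocsv', 't1', 't2']
```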
Query a table
SELECT * FROM test_project.test_schema.hellocsv;
Copy external data into MaxCompute
CREATE TABLE AS SELECT cannot run directly on external schema objects. Switch to the default schema first:
-- Copy data from the external schema into a MaxCompute table
CREATE TABLE hellocsv_copy AS SELECT * FROM test_project.test_schema.hellocsv;
-- Query the copied data
SELECT * FROM hellocsv_copy;
Manage external data sources
View all external data sources in the MaxCompute console under Manage Configurations > External Data Source.
| Column | Description |
|---|---|
| Data source name | The name of the external data source. |
| Type | The type of the external data source. Supported types: DLF+OSS, Paimon_DLF, Hive+HDFS, Filesystem Catalog, and Hologres. |
| Network connection | The network connection name, if configured. Only Hive+HDFS supports network connections. |
| Owner account | The account that created the external data source. All access to the external system uses the RAM role specified by this account. The creator controls which projects can use the data source through mount relationships, and project administrators grant users access through policies. If project-level tenant resource access control is disabled, any user who creates an external project or schema can use this data source. |
| Mounted projects | The number of projects to which the data source is mounted. If a project has project-level tenant resource access control enabled, the tenant administrator must mount the data source to the project before users can access it. |
| Creation time | When the external data source was created. |
| Update time | When the external data source's editable properties were last modified. |
| Actions > Details | View the data source properties. Click Edit to modify its basic information. Changing permissions may invalidate existing authorizations by altering the scope of visible objects. |
| Actions > Delete | Delete the external data source. Important: Deleting a data source removes all mount relationships and causes dependent tasks to lose access to the external system. |
What's next
- Schema operations: list, view, and delete schemas using the same commands as internal schemas.