After creating an external project in the data lakehouse setup, use SQL statements to query and modify data stored in Data Lake Formation (DLF) and Object Storage Service (OSS), or in an external Hadoop cluster — without leaving the MaxCompute SQL environment.
Prerequisites
Before you begin, ensure that you have:
An external project created on the Lake and Warehouse Integration (Data Lakehouse) page in the DataWorks console
Access to the MaxCompute client
Run all SQL statements in the MaxCompute project that the external project is mapped to — not in the external project itself. You cannot run jobs directly in an external project.
When referencing a table in an external project, use the external_project_name.table_name format.
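For example, using the project names that appear in the examples below (`doc_test_prod` as the mapped MaxCompute project and `ext_mcdlf_ddl` as the external project):

```sql
-- Run from the mapped MaxCompute project, not the external project.
use doc_test_prod;
-- Reference the external table in the external_project_name.table_name format.
select count(*) from ext_mcdlf_ddl.ddl_test;
```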
Persist session parameters
Many operations require SET commands. To avoid repeating them every session, add them to the odps_config.ini file in your MaxCompute client installation directory. Parameters in that file are applied automatically at startup.
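As a sketch, the fragment below assumes the client's `set.<property>=<value>` convention for declaring session flags in odps_config.ini; which flags you need depends on your scenario, and the exact key format should be confirmed against your client version's documentation:

```
# odps_config.ini fragment (assumption: entries with the set. prefix
# are applied as session flags each time the client starts)
set.odps.sql.hive.compatible=true
set.odps.ext.oss.orc.native=true
```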
Manage an external project based on DLF and OSS
You cannot create a table in an external project. All table creation must be done in the mapped MaxCompute project.
Insert data
Non-partitioned table:
-- Switch to the MaxCompute project mapped to the external project.
use doc_test_prod;
-- Insert a row into a non-partitioned table.
insert into table ext_mcdlf_ddl.ddl_test values(1,"ddl_test_1");
Partitioned table:
-- Switch to the MaxCompute project mapped to the external project.
use doc_test_prod;
-- Insert rows into a specific partition.
insert overwrite table ext_mcdlf_ddl.range_ptstring_ptint partition (pt1 = 'ds1', pt2=2) values (4, 'val4'), (5, 'val5');
View table information
-- Switch to the MaxCompute project mapped to the external project.
use doc_test_prod;
-- List all tables in the external project.
show tables in doc_test_prod;
-- View the schema of a specific table.
desc extended ext_mcdlf_ddl.ddl_test;
Query table data
-- Switch to the MaxCompute project mapped to the external project.
use doc_test_prod;
-- Query a non-partitioned table.
select * from ext_mcdlf_ddl.ddl_test limit 10;
-- Query a partitioned table with a partition filter.
select * from ext_mcdlf_ddl.range_ptstring_ptint where pt1='ds1';
Special parameters for DLF + OSS
Some scenarios require additional session parameters before running SQL statements:
-- Required if the Optimized Row Columnar (ORC) version is later than ORC-135.
set odps.ext.oss.orc.native=true;
-- Required to enable Hive-compatible data types.
-- MaxCompute supports Hive syntax only after you enable this setting.
set odps.sql.hive.compatible=true;
-- Enable multipart upload for the current session.
set odps.sql.unstructured.oss.commit.mode=true;
-- Enable multipart upload for the entire project.
setproject odps.sql.unstructured.oss.commit.mode=true;
For more information, see Table operations, INSERT INTO and INSERT OVERWRITE, and Multipart upload.
Manage an external project based on an external Hadoop data source
Use the following statements to query Hive table data from an external Hadoop data source.
-- Switch to the MaxCompute project mapped to the external project.
use <main_project_name>;
-- Enable Hive-compatible mode.
-- Required for E-MapReduce (EMR) tables; not required for MaxCompute tables.
set odps.sql.hive.compatible=true;
-- Optional: specify the username used to access the Hadoop data source.
set odps.external.hadoop.user=<hadoop_name>;
-- Query data from the Hive table.
select * from <my_hive_table> limit 10;
DDL statements on external Hadoop tables are not supported, with two exceptions: adding partitions and dropping partitions. For the full list of supported DDL syntax, see Overview of MaxCompute SQL.
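The two supported DDL operations can be sketched as follows; the table and partition names are illustrative placeholders:

```sql
-- Switch to the MaxCompute project mapped to the external project.
use <main_project_name>;
set odps.sql.hive.compatible=true;
-- Add a partition to a Hive table in the external project.
alter table <external_project_name>.<my_hive_table> add if not exists partition (pt='20240101');
-- Drop the same partition.
alter table <external_project_name>.<my_hive_table> drop if exists partition (pt='20240101');
```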
Create a table with the same schema as an external table
Use CREATE TABLE...LIKE to create a MaxCompute table with the same schema as a table in the external data source. The new table copies the column names, column comments, and table comments from the source table, but not the lifecycle setting or the data.
-- Create a MaxCompute table with the same schema as external_project.testtbl.
create table from_exe_tbl like external_project.testtbl;
-- Verify the schema was copied correctly.
desc from_exe_tbl;
-- Query the new table (it starts empty; data is not copied from the source).
select * from from_exe_tbl;
For more information about CREATE TABLE...LIKE syntax and options, see Table operations.
What's next
To grant other users access to an external project or its tables, see Grant other users the permissions on an external project.