This topic describes how to manage an external project by using SQL statements when you use the lakehouse solution of MaxCompute.

Background information

After you create an external project on the Lake and Warehouse Integration (Data Lakehouse) page in the DataWorks console, you can manage the external project by using SQL statements in the following scenarios: managing an external project based on Data Lake Formation (DLF) and Object Storage Service (OSS), managing an external project based on an external Hadoop data source, and granting other users permissions on the external project.

Manage an external project based on DLF and OSS

After you create an external project, use the MaxCompute client to connect to MaxCompute and switch to the MaxCompute project to which the external project is mapped. You can then manage the data and tables of the external project from that MaxCompute project.

When you reference a table in an external project, you must specify the table in the external_project_name.table_name format in your SQL statements. You cannot create tables in an external project.
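For example, to reference the ddl_test table in an external project named ext_mcdlf_ddl (the sample names used throughout this topic), qualify the table name with the external project name:
-- The external project name qualifies the table name.
select * from ext_mcdlf_ddl.ddl_test;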
  • Insert data into a table.
    • Insert data into a non-partitioned table.
      -- Go to the MaxCompute project to which the external project is mapped. 
      use doc_test_prod;
      
      -- Before you run jobs, configure the related properties. 
      -- Enable the Hive-compatible data type edition. MaxCompute supports Hive syntax only after this edition is enabled. 
      set odps.sql.hive.compatible=true;
      
      -- Enable the multipart upload feature of Object Storage Service (OSS) when data is written to external tables. 
      set odps.sql.unstructured.oss.commit.mode=true;
      
      -- Use regular expressions to exclude files in the directory in which the external table data is stored. Files that match these blacklist patterns, such as SUCCESS flag files and Hive staging files, are not read. 
      set odps.sql.unstructured.file.pattern.black.list=.*SUCCESS_$,.*\.hive_staging.*;
      
      -- Use a native Optimized Row Columnar (ORC) reader to improve data processing performance. The ORC reader is compatible with ORC versions later than ORC-135. 
      set odps.ext.oss.orc.native=true;
      -- Use a native Parquet reader to improve data processing performance. 
      set odps.ext.parquet.native=true;
      -- Add this parameter if the OSS file is in the Apache Hudi or Delta Lake format. 
      set odps.sql.split.hive.bridge=true;
      
      -- Insert data into a non-partitioned table. 
      insert into table ext_mcdlf_ddl.ddl_test values (1, 'ddl_test_1');
    • Insert data into a partitioned table.
      -- Go to the MaxCompute project to which the external project is mapped. 
      use doc_test_prod;
      
      -- Before you run jobs, configure the related properties. 
      -- Enable the Hive-compatible data type edition. MaxCompute supports Hive syntax only after this edition is enabled. 
      set odps.sql.hive.compatible=true;
      
      -- Enable the multipart upload feature of OSS when data is written to external tables. 
      set odps.sql.unstructured.oss.commit.mode=true;
      
      -- Use regular expressions to exclude files in the directory in which the external table data is stored. Files that match these blacklist patterns, such as SUCCESS flag files and Hive staging files, are not read. 
      set odps.sql.unstructured.file.pattern.black.list=.*SUCCESS_$,.*\.hive_staging.*;
      
      -- Use a native ORC reader to improve data processing performance. The ORC reader is compatible with ORC versions later than ORC-135. 
      set odps.ext.oss.orc.native=true;
      -- Use a native Parquet reader to improve data processing performance. 
      set odps.ext.parquet.native=true;
      -- Add this parameter if the OSS file is in the Apache Hudi or Delta Lake format. 
      set odps.sql.split.hive.bridge=true;
      
      -- Insert data into a partitioned table. 
      insert overwrite table ext_mcdlf_ddl.range_ptstring_ptint partition (pt1 = 'ds1', pt2=2) values (4, 'val4'), (5, 'val5');
  • View the information about the tables in the external project.
    -- Go to the MaxCompute project to which the external project is mapped. 
    use doc_test_prod;
    
    -- View the tables in the external project. 
    show tables;
    
    -- View the schema of the ext_mcdlf_ddl.ddl_test table. 
    desc extended ext_mcdlf_ddl.ddl_test;
  • Query the table data in the external project.
    -- Go to the MaxCompute project to which the external project is mapped. 
    use doc_test_prod;
    
    -- Before you run jobs, configure the related properties. 
    -- Enable the Hive-compatible data type edition. MaxCompute supports Hive syntax only after this edition is enabled. 
    set odps.sql.hive.compatible=true;
    
    -- Enable the multipart upload feature of OSS when data is written to external tables. 
    set odps.sql.unstructured.oss.commit.mode=true;
    
    -- Use regular expressions to exclude files in the directory in which the external table data is stored. Files that match these blacklist patterns, such as SUCCESS flag files and Hive staging files, are not read. 
    set odps.sql.unstructured.file.pattern.black.list=.*SUCCESS_$,.*\.hive_staging.*;
    
    -- Use a native ORC reader to improve data processing performance. The ORC reader is compatible with ORC versions later than ORC-135. 
    set odps.ext.oss.orc.native=true;
    
    -- Use a native Parquet reader to improve data processing performance. 
    set odps.ext.parquet.native=true;
    
    -- Add this parameter if the OSS file is in the Apache Hudi or Delta Lake format. 
    set odps.sql.split.hive.bridge=true;
    
    -- Query the data in the non-partitioned table named ext_mcdlf_ddl.ddl_test. 
    select * from ext_mcdlf_ddl.ddl_test limit 10;
    
    -- Query the data in the partitioned table named ext_mcdlf_ddl.range_ptstring_ptint. 
    select * from ext_mcdlf_ddl.range_ptstring_ptint where pt1='ds1';
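Although you cannot create tables in an external project, adding and dropping partitions is the one class of DDL statements that is allowed (see the note later in this topic). The following is a minimal sketch against the partitioned example table above, assuming that your external project permits partition DDL; the partition values are illustrative:
-- Go to the MaxCompute project to which the external project is mapped.
use doc_test_prod;

-- Add a partition to the partitioned example table.
alter table ext_mcdlf_ddl.range_ptstring_ptint add if not exists partition (pt1='ds2', pt2=3);

-- Drop the partition.
alter table ext_mcdlf_ddl.range_ptstring_ptint drop if exists partition (pt1='ds2', pt2=3);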

Manage an external project based on an external Hadoop data source

Execute the following statements to query the data of a Hive table from an external Hadoop data source.
-- Go to the MaxCompute project to which the external project is mapped. You cannot run jobs in an external project. 
use <main_project_name>;

-- Set odps.sql.hive.compatible to true. If you want to access Alibaba Cloud E-MapReduce (EMR) tables, you must configure this property. If you want to access MaxCompute tables, you do not need to configure this property. 
set odps.sql.hive.compatible=true;

-- Optional. Specify the username that is used to access the Hadoop data source. 
set odps.external.hadoop.user=<hadoop_name>;

-- Read data from a Hive table. 
select * from <my_hive_table> limit 10;
Note
  • You cannot execute DDL statements on the tables or data of an external project from the MaxCompute project to which the external project is mapped. The DDL statements that add partitions to or drop partitions from a table are the only exceptions, as shown in the partition DDL sketch in the previous section. For more information about SQL DDL statements, see Overview of MaxCompute SQL.
  • You can add the entries set.odps.sql.hive.compatible=true and set.odps.external.hadoop.user=<hadoop_name> to the odps_config.ini file. This way, you do not need to run the corresponding SET commands each time you manage the external project (see the sketch after this note).
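For reference, the corresponding odps_config.ini entries might look like the following. This is a minimal sketch based on the set.-prefixed form described in the note above; <hadoop_name> remains a placeholder for your Hadoop username.
# Appended to odps_config.ini so that the settings apply to every session.
set.odps.sql.hive.compatible=true
set.odps.external.hadoop.user=<hadoop_name>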

Grant other users the permissions on an external project

After you create an external project, the tables in the external project are owned by the Alibaba Cloud account that is used to create the external project. If you want to grant other users the permissions on the external project, execute the following statements:
-- Go to the external project.
use <my_external_project>;

-- Before you can use the external project as a RAM user, you must enable RAM support for the external project.
add accountprovider ram;

-- Add a user to the external project. 
add user <Alibaba Cloud account>;

-- Grant the user the permissions to view tables in the external project. 
grant List on project <my_external_project> to user <Alibaba Cloud account>;

-- Grant the user all permissions on a table in the external project. 
grant All on table <table_name> to user <Alibaba Cloud account>;

-- Switch to the project in which jobs are run. 
use <main_project_name>;

-- Add the user to the project. 
add user <Alibaba Cloud account>;

-- Grant the user the permissions to run jobs. 
grant CreateInstance on project <main_project_name> to user <Alibaba Cloud account>;
Note If you want to grant other Alibaba Cloud accounts or RAM users the access permissions on tables in the external project, complete authorization based on the standard security specifications of MaxCompute. For more information, see Permission overview.
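After authorization, you can verify the permissions that a user holds. The following is a quick check, assuming the same account placeholder as in the statements above:
-- Switch to the external project and view the grants of the user.
use <my_external_project>;
show grants for <Alibaba Cloud account>;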