MaxCompute provides the lakehouse solution, which enables you to build a data management platform that combines data lakes and data warehouses. This solution brings together the flexibility and diverse ecosystems of data lakes and the enterprise-class capabilities of data warehouses. This topic describes how to integrate MaxCompute with data lakes to implement the lakehouse solution.

Background information

MaxCompute can be integrated with Object Storage Service (OSS) or Hadoop Distributed File System (HDFS) to implement the lakehouse solution. The following content describes the implementation in the two scenarios.
  • The lakehouse solution is implemented by integrating MaxCompute with OSS. In this scenario, you must use Data Lake Formation (DLF). All the metadata (schemas) of the data lake is managed in DLF. MaxCompute can use the metadata management capability of DLF to process OSS semi-structured data efficiently. The OSS semi-structured data includes data in the AVRO, CSV, JSON, Parquet, and ORC formats.

    This topic describes how to implement the lakehouse solution in this scenario. To access OSS data, you must create an external project based on a MaxCompute project, DLF metadatabase, DLF metadata table, and OSS directory. You can also perform operations on tables in the external project based on the existing MaxCompute project.

  • The lakehouse solution is implemented by integrating MaxCompute with HDFS. In this scenario, the Hadoop cluster can be deployed in a self-managed data center, on cloud-based virtual machines (VMs), or by using E-MapReduce (EMR). You must complete the MaxCompute Lakehouse Application Form. After you submit the application, MaxCompute technical support contacts you and assists you with the subsequent operations. The remaining steps in this topic do not apply to this scenario.
The lakehouse solution of MaxCompute provides the following benefits:
  • MaxCompute is an enterprise-class high-performance data warehouse. It can be used with cloud-native big data services to meet the requirements of high business scalability and agility more easily.
  • MaxCompute can be associated with data lakes by using DLF. This way, you can seamlessly process and analyze data in data lakes without the need to migrate data.
  • After the lakehouse solution is implemented, you can perform data development and data management in the DataWorks console.
This topic describes the procedure to implement the lakehouse solution by integrating MaxCompute with OSS. To implement the lakehouse solution in this scenario, perform the following steps:
  1. Step 1: Obtain the MaxCompute client that supports the lakehouse solution
  2. Step 2: Create a metadatabase and a metadata table in DLF
  3. Step 3: Authorize MaxCompute to access DLF
  4. Step 4: Create an external project
  5. Step 5: Perform operations in the DataWorks console

Prerequisites

Make sure that the following requirements are met:
  • An OSS bucket that uses the Standard storage class is created, and the OSS directory and required data files are obtained.

    For more information about how to create an OSS bucket, how to create an OSS directory, and how to upload data files, see Create buckets, Create directories, and Upload objects.

    In this example, the OSS region is China (Shanghai), the bucket name is mc-dlf-oss, and the OSS directory name is mc-dlf. A sample data file for this example is shown after this list.

  • A MaxCompute project is created. This project is not an external project.
    For more information about how to create a MaxCompute project, see Create a MaxCompute project. In this example, the name of the created MaxCompute project is doc_test_prod.
  • DLF is activated.
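
The following snippet shows what a sample data file for this example might look like. This is a hypothetical CSV file with the five columns that are used by the sh_rt metadata table and the query results later in this topic; replace it with your own data file.

    2019/1/1,0,12TH,12TH,3
    2019/1/1,0,12TH,16TH,4
    2019/1/1,0,12TH,ANTC,1
    2019/1/1,0,12TH,BAYF,1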

Limits

The lakehouse solution of MaxCompute has the following limits:
  • Only the China (Hangzhou), China (Shanghai), and China (Beijing) regions support the lakehouse solution.
  • MaxCompute can be deployed in a different region from OSS or DLF, but OSS and DLF must be deployed in the same region.

Step 1: Obtain the MaxCompute client that supports the lakehouse solution

Download the MaxCompute client that supports the lakehouse solution. Then, install and configure the MaxCompute client.

For more information about how to install and configure the MaxCompute client, see Install and configure the MaxCompute client.
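
The connection information of the client is typically configured in the conf/odps_config.ini file in the client installation directory. The following minimal sketch assumes the China (Shanghai) endpoint and the doc_test_prod project that are used in this example; replace the placeholders with your own AccessKey pair.

    project_name=doc_test_prod
    access_id=<your AccessKey ID>
    access_key=<your AccessKey secret>
    end_point=http://service.cn-shanghai.maxcompute.aliyun.com/api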

Step 2: Create a metadatabase and a metadata table in DLF

Create a metadatabase and a metadata table in DLF to store the metadata of the OSS data file.

  1. Log on to the DLF console. In the top navigation bar, select the region where OSS is deployed, such as China (Shanghai).
  2. Create a metadatabase, for example, a metadatabase named shanghai.
    1. In the left-side navigation pane, choose Metabase.
    2. On the Metabase page, click Add MetaBase.
    3. In the Create Metabase dialog box, set the parameters for the metadatabase.
      • Metabase name (required): The name of the metadatabase that you want to create. The name must be 1 to 128 characters in length, can contain only letters, digits, and underscores (_), and must start with a letter. Example: shanghai.
      • Metabase description (optional): The description of the metadatabase that you want to create. Example: Associate DLF with OSS to implement the lakehouse solution.
      • Select path (required): The OSS directory in which the data file is saved. The directory must be in the oss://<Bucket name>/<OSS directory name>/ format. Example: oss://mc-dlf-oss/mc-dlf/
    4. Click OK.
  3. Create a metadata table, for example, sh_rt.
    1. On the Metabase page, find the metadatabase and click Table in the Operation column.
    2. On the Meta data table page, click Create table.
    3. On the Create metadata table page, configure the basic information of the metadata table.
      • Metadata table name (required): The name of the metadata table. The name must be 1 to 128 characters in length, can contain only letters, digits, and underscores (_), and must start with a letter. Example: sh_rt.
      • Metabase (required): The name of the metadatabase. Select the metadatabase that you created from the drop-down list. Example: shanghai.
      • Data storage location (required): The OSS directory in which the data file is saved. The directory must be in the oss://<Bucket name>/<OSS directory name>/ format. Example: oss://mc-dlf-oss/mc-dlf/
      • Data format (required): The format of the OSS data file. Select a format from the drop-down list. Valid values: AVRO, CSV, JSON, PARQUET, and ORC. Example: CSV.
      • Meta data table description (optional): The description of the metadata table. Example: Associate DLF with OSS to implement the lakehouse solution.
      • Input format (optional): Manual configuration is not required. After the Data format parameter is specified, the value of this parameter is automatically assigned.
        • AVRO: org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat
        • CSV: org.apache.hadoop.mapred.TextInputFormat
        • JSON: org.apache.hadoop.mapred.TextInputFormat
        • PARQUET: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
        • ORC: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
      • Output format (optional): Manual configuration is not required. After the Data format parameter is specified, the value of this parameter is automatically assigned.
        • AVRO: org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat
        • CSV: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
        • JSON: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
        • PARQUET: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
        • ORC: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
      • Serialization method (optional): Manual configuration is not required. After the Data format parameter is specified, the value of this parameter is automatically assigned.
        • AVRO: org.apache.hadoop.hive.serde2.avro.AvroSerDe
        • CSV: org.apache.hadoop.hive.serde2.OpenCSVSerde
        • JSON: org.openx.data.jsonserde.JsonSerDe
        • PARQUET: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
        • ORC: org.apache.hadoop.hive.ql.io.orc.OrcSerde
      • Splitter (optional): If CSV is selected from the Data format drop-down list, set this parameter based on the delimiter used in the OSS data file. Valid values: Comma (,), Vertical line (|), Semicolon (;), Tab (\u0009), and Ctrl-A (\u0001). Example: Comma (,).
    4. On the Create metadata table page, click Add column in the Column section, configure the column information of the metadata table, and then click OK.
      Notice The number, data types, and sequence of the added columns must be consistent with those of the columns in the OSS data file. Only one column can be added at a time. If the OSS data file contains multiple columns, repeat this operation for each column.
      • Column number (required): The ID of the column that you want to add. Example: 1.
      • Column name (required): The name of the column that you want to add. The name must be 1 to 128 characters in length, can contain only letters, digits, and underscores (_), and must start with a letter. Example: col1.
      • Data type (optional): The data type of the column. Select a data type from the drop-down list. Make sure that the selected data type is the same as that of the column in the OSS data file. Example: string.
      • Partition field (optional): Specifies whether the column is a partition field. Valid values: Yes and No. Example: No.
      • Description (optional): The description of the column that you want to add. Example: First column.
  4. Click OK.

Step 3: Authorize MaxCompute to access DLF

The account that has operation permissions on the MaxCompute project cannot access DLF unless it is authorized. You can use one of the following methods to perform authorization:
  • Method 1: one-click authorization. We recommend this method if the same account is used to create the MaxCompute project and deploy DLF. You can perform one-click authorization on the Cloud Resource Access Authorization page in the Resource Access Management (RAM) console.
  • Method 2: custom authorization. Use this method if you need to customize the authorization information. This method can be used regardless of whether the same account is used to create the MaxCompute project and deploy DLF.

Procedure for custom authorization:

  1. Log on to the RAM console to create a RAM role whose Trusted entity type is Alibaba Cloud Account, for example, AliyunODPSRoleForDLF.
    For more information about RAM roles, see Create a RAM role for a trusted Alibaba Cloud account.
  2. Modify the trust policy of the created RAM role in the RAM console.
    Notice If the account that is used to create the MaxCompute project is different from the account that is used to deploy DLF, use the account that is used to deploy DLF to perform this step.
    For more information about how to modify the trust policy of a RAM role, see Edit the trust policy of a RAM role.
    The following trust policy is used after modification:
    -- If the same account is used to create the MaxCompute project and deploy DLF, the following trust policy is used:
    {
        "Statement": [
            {
                "Action": "sts:AssumeRole",
                "Effect": "Allow",
                "Principal": {
                    "Service": [
                        "odps.aliyuncs.com"
                    ]
                }
            }
        ],
        "Version": "1"
    }
    -- If the account that is used to create the MaxCompute project is different from the account that is used to deploy DLF, the following trust policy is used:
    {
        "Statement": [
            {
                "Action": "sts:AssumeRole",
                "Effect": "Allow",
                "Principal": {
                    "Service": [
                        "<ID of the Alibaba Cloud account used by the owner of the MaxCompute project>@odps.aliyuncs.com"
                    ]
                }
            }
        ],
        "Version": "1"
    }
    ID of the Alibaba Cloud account used by the owner of the MaxCompute project: You can obtain the ID from the Personal Account page.
  3. Create a custom policy for the created RAM role in the RAM console. When you create a custom policy, make sure that Configuration Mode is set to Script.
    For more information about custom policies, see Create a custom policy.
    The custom policy contains the following content:
    {
        "Version": "1",
        "Statement": [
            {
                "Action": [
                    "oss:ListBuckets",
                    "oss:GetObject",
                    "oss:ListObjects",
                    "oss:PutObject",
                    "oss:DeleteObject",
                    "oss:AbortMultipartUpload",
                    "oss:ListParts"
                ],
                "Resource": "*",
                "Effect": "Allow"
            },
            {
                "Action": [
                    "dlf:CreateFunction",
                    "dlf:BatchGetPartitions",
                    "dlf:ListDatabases",
                    "dlf:CreateLock",
                    "dlf:UpdateFunction",
                    "dlf:BatchUpdateTables",
                    "dlf:DeleteTableVersion",
                    "dlf:UpdatePartitionColumnStatistics",
                    "dlf:ListPartitions",
                    "dlf:DeletePartitionColumnStatistics",
                    "dlf:BatchUpdatePartitions",
                    "dlf:GetPartition",
                    "dlf:BatchDeleteTableVersions",
                    "dlf:ListFunctions",
                    "dlf:DeleteTable",
                    "dlf:GetTableVersion",
                    "dlf:AbortLock",
                    "dlf:GetTable",
                    "dlf:BatchDeleteTables",
                    "dlf:RenameTable",
                    "dlf:RefreshLock",
                    "dlf:DeletePartition",
                    "dlf:UnLock",
                    "dlf:GetLock",
                    "dlf:GetDatabase",
                    "dlf:GetFunction",
                    "dlf:BatchCreatePartitions",
                    "dlf:ListPartitionNames",
                    "dlf:RenamePartition",
                    "dlf:CreateTable",
                    "dlf:BatchCreateTables",
                    "dlf:UpdateTableColumnStatistics",
                    "dlf:ListTableNames",
                    "dlf:UpdateDatabase",
                    "dlf:GetTableColumnStatistics",
                    "dlf:ListFunctionNames",
                    "dlf:ListPartitionsByFilter",
                    "dlf:GetPartitionColumnStatistics",
                    "dlf:CreatePartition",
                    "dlf:CreateDatabase",
                    "dlf:DeleteTableColumnStatistics",
                    "dlf:ListTableVersions",
                    "dlf:BatchDeletePartitions",
                    "dlf:ListCatalogs",
                    "dlf:UpdateTable",
                    "dlf:ListTables",
                    "dlf:DeleteDatabase",
                    "dlf:BatchGetTables",
                    "dlf:DeleteFunction"
                ],
                "Resource": "*",
                "Effect": "Allow"
            }
        ]
    }
  4. Attach the custom policy to the created RAM role.
    For more information about authorization of RAM roles, see Grant permissions to a RAM role.
  5. On the RAM Roles page, click the name of the RAM role that you created to obtain the Alibaba Cloud Resource Name (ARN).
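    The ARN of a RAM role is in the acs:ram::<Alibaba Cloud account ID>:role/<Role name> format. For example, the ARN of the AliyunODPSRoleForDLF role that is created in this example might look like the following, where the account ID is a placeholder:
    acs:ram::<Alibaba Cloud account ID>:role/aliyunodpsrolefordlf
    You need this ARN when you set the -ramRoleArn parameter in Step 4 if you use the custom authorization method.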

Step 4: Create an external project

You can create an external project based on the created MaxCompute project and DLF metadatabase. This external project is mapped to the MaxCompute project and associates MaxCompute with OSS and DLF. Then, you can use the MaxCompute project to perform operations on the external project. Only the owner of the MaxCompute project or users who have been assigned the Admin or Super_Administrator role can create an external project.

  1. Log on to the MaxCompute client, enter the created MaxCompute project, and then run the create externalproject command to create an external project.
    Syntax of create externalproject:
    create externalproject -source dlf -name <external_project_name> 
        -ref <project_name> 
       [-comment <comment>]
        -region <dlf_region> 
        -db <dlf_database_name>
        -endpoint "<dlf_endpoint>" 
       [-ramRoleArn "<ram_role_arn>"]
        -ossEndpoint "<oss_endpoint>" 
       [-T <table_properties>];
    • external_project_name: required. The name of the external project that you want to create.
    • project_name: required. The name of the MaxCompute project that you created.
    • comment: optional. The comment for the external project.
    • dlf_region: required. The ID of the region where DLF resides. You can view the ID of each region in Obtain the region ID and VPC ID of a VPC.
    • dlf_database_name: required. The name of the DLF metadatabase.
    • dlf_endpoint: required. The endpoint of DLF. Mappings between the regions where lakehouse is supported and DLF endpoints:
      • China (Hangzhou): dlf-share.cn-hangzhou.aliyuncs.com
      • China (Shanghai): dlf-share.cn-shanghai.aliyuncs.com
      • China (Beijing): dlf-share.cn-beijing.aliyuncs.com
    • ram_role_arn: optional. The Alibaba Cloud Resource Name (ARN) of the RAM role. If you have used the custom authorization method to authorize access to DLF, you must specify this parameter. You can also use -D odps.properties.rolearn=<ram_role_arn> to specify this parameter.
      Note If the same account is used to create the MaxCompute project and deploy DLF and the one-click authorization method is used to authorize access to DLF, you do not need to specify this parameter.
    • oss_endpoint: required. The endpoint of OSS. For more information about OSS endpoints, see Regions and endpoints.
    • table_properties: optional. The related properties of tables in the external project. You can configure the following properties based on the format of an OSS data file:
      • file_format: Specifies the file format in which you want to store data. If you set this property, the settings of input_format and output_format are ignored. Valid values: SEQUENCEFILE, TEXTFILE, RCFILE, ORC, ORCFILE, PARQUET, and AVRO.
      • input_format: Specifies the name of the InputFormat class that is used to read data files. Values by file format:
        • SEQUENCEFILE: org.apache.hadoop.mapred.SequenceFileInputFormat
        • TEXTFILE: org.apache.hadoop.mapred.TextInputFormat
        • RCFILE: org.apache.hadoop.hive.ql.io.RCFileInputFormat
        • ORC: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
        • ORCFILE: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
        • PARQUET: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
        • AVRO: org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat
        • JSON: org.apache.hadoop.mapred.TextInputFormat
        • CSV: org.apache.hadoop.mapred.TextInputFormat
      • output_format: Specifies the name of the OutputFormat class that is used to write data files. Values by file format:
        • SEQUENCEFILE: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
        • TEXTFILE: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
        • RCFILE: org.apache.hadoop.hive.ql.io.RCFileOutputFormat
        • ORC: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
        • ORCFILE: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
        • PARQUET: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
        • AVRO: org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat
        • JSON: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
        • CSV: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
      • serialization_lib: Specifies the name of the class that is used to serialize and deserialize data. Values by file format:
        • SEQUENCEFILE: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
        • TEXTFILE: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
        • RCFILE: org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe
        • ORC: org.apache.hadoop.hive.ql.io.orc.OrcSerde
        • ORCFILE: org.apache.hadoop.hive.ql.io.orc.OrcSerde
        • PARQUET: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
        • AVRO: org.apache.hadoop.hive.serde2.avro.AvroSerDe
        • JSON: org.apache.hive.hcatalog.data.JsonSerDe
        • CSV: org.apache.hadoop.hive.serde2.OpenCSVSerde
      • serde_properties: Specifies properties for serialization_lib. No fixed valid values.
    Example:
    create externalproject -source dlf -name ext_mcdlf_ddl
         -ref doc_test_prod
         -comment "DLF"
         -region "cn-shanghai"
         -db shanghai
         -endpoint "dlf-share.cn-shanghai.aliyuncs.com"
         -ossEndpoint "oss-cn-shanghai-internal.aliyuncs.com";
    Note You can run the help external command in the MaxCompute client to view more information about operations on an external project, such as the command syntax and examples of updating and deleting an external project.
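    If you used the custom authorization method in Step 3, also specify the -ramRoleArn parameter in the command. The following sketch extends the preceding example with a placeholder ARN; replace it with the ARN of the RAM role that you obtained in Step 3.
     create externalproject -source dlf -name ext_mcdlf_ddl
          -ref doc_test_prod
          -comment "DLF"
          -region "cn-shanghai"
          -db shanghai
          -endpoint "dlf-share.cn-shanghai.aliyuncs.com"
          -ossEndpoint "oss-cn-shanghai-internal.aliyuncs.com"
          -ramRoleArn "acs:ram::<Alibaba Cloud account ID>:role/aliyunodpsrolefordlf";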
  2. Enter the created external project and view tables in the external project.
    Example:
    -- Enter the created external project. 
    odps@ doc_test_prod>use ext_mcdlf_ddl;
    Project timezone: Asia/Shanghai
    -- View tables in the external project. 
    odps@ ext_mcdlf_ddl>show tables;
    
    ALIYUN$xxxxx@test.aliyunid.com:sh_rt
    
    OK
  3. Enter the MaxCompute project to which the external project is mapped and query data in the external project.
    Example:
    -- Enter the MaxCompute project to which the external project is mapped. 
    odps@ ext_mcdlf_ddl>use doc_test_prod;
    Project timezone: Asia/Shanghai
    -- Before you query data, configure the following properties. The values are fixed. 
    odps@ doc_test_prod>
    set odps.sql.hive.compatible=true;
    set odps.sql.unstructured.oss.commit.mode=true;
    set odps.sql.unstructured.file.pattern.black.list=r".*/_SUCCESS$,.*/[.]hive-staging/.*";
    set odps.ext.oss.orc.native=true;
    set odps.ext.parquet.native=true;
    -- Query data in a table. 
    odps@ doc_test_prod> select * from ext_mcdlf_ddl.sh_rt limit 10;
    
    +------------+------------+------------+------------+------------+
    | col1       | col2       | col3       | col4       | col5       |
    +------------+------------+------------+------------+------------+
    | 2019/1/1   | 0          | 12TH       | 12TH       | 3          |
    | 2019/1/1   | 0          | 12TH       | 16TH       | 4          |
    | 2019/1/1   | 0          | 12TH       | ANTC       | 1          |
    | 2019/1/1   | 0          | 12TH       | BAYF       | 1          |
    | 2019/1/1   | 0          | 12TH       | CIVC       | 2          |
    | 2019/1/1   | 0          | 12TH       | COLM       | 1          |
    | 2019/1/1   | 0          | 12TH       | COLS       | 1          |
    | 2019/1/1   | 0          | 12TH       | CONC       | 1          |
    | 2019/1/1   | 0          | 12TH       | DALY       | 1          |
    | 2019/1/1   | 0          | 12TH       | DELN       | 2          |
    +------------+------------+------------+------------+------------+
    10 records (at most 10000 supported) fetched by instance tunnel.
  4. In the MaxCompute project, create a table in the external project and insert data into the table.
    Only the parameters in the following syntax are supported. For more information about the parameters, see Table operations.
    create table [if not exists] <table_name>
    [(<col_name> <data_type> [comment <col_comment>], ...)]
    [comment <table_comment>]
    [partitioned by (<col_name> <data_type> [comment <col_comment>], ...)]

    You can insert data into a table in an external project in the same way as you insert data into a table in a common MaxCompute project. For more information, see Insert or update data into a table or static partitions (INSERT INTO and INSERT OVERWRITE).

    Examples:

    • Create a non-partitioned table and insert data into the table.
      -- Before you create a table, configure the following properties. The values are fixed. 
      odps@ doc_test_prod> 
          set odps.sql.hive.compatible=true; --You can use setproject for global setting. 
          set odps.sql.unstructured.oss.commit.mode=true; --You can use setproject for global setting. 
          set odps.sql.unstructured.file.pattern.black.list=r".*/_SUCCESS$,.*/[.]hive-staging/.*"; --You can use setproject for global setting. 
          set odps.ext.oss.orc.native=true;  --You can use setproject for global setting. 
          set odps.ext.parquet.native=true; --You can use setproject for global setting. 
      
      -- Create a table. 
      odps@ doc_test_prod>create table ext_mcdlf_ddl.ddl_test (id bigint, value string);
      
      -- View the schema of the created table. 
      odps@ doc_test_prod>desc extended ext_mcdlf_ddl.ddl_test;
      +------------------------------------------------------------------------------------+
      | Owner: ALIYUN$xxxxx@test.aliyunid.com | Project: ext_mcdlf_ddl2                    |
      | TableComment:                                                                      |
      +------------------------------------------------------------------------------------+
      | CreateTime:               2021-03-01 10:29:58                                      |
      | LastDDLTime:              2021-03-01 10:29:58                                      |
      | LastModifiedTime:         2021-03-01 10:29:58                                      |
      +------------------------------------------------------------------------------------+
      | ExternalTable: YES                                                                 |
      +------------------------------------------------------------------------------------+
      | Native Columns:                                                                    |
      +------------------------------------------------------------------------------------+
      | Field    | Type   | Label | ExtendedLabel | Nullable | DefaultValue | Comment      |
      +------------------------------------------------------------------------------------+
      | id       | bigint |       |               | true     | NULL         |              |
      | value    | string |       |               | true     | NULL         |              |
      +------------------------------------------------------------------------------------+
      | Extended Info:                                                                     |
      +------------------------------------------------------------------------------------+
      | TableID             : ddl_test                                                     |
      | Location            : oss://mc-dlf-oss/mc-dlf/ddl_test                             |
      | serialization.format: 1                                                            |
      | StoredAs:                 Unknown                                                  |
      | CompressionStrategy:      normal                                                   |
      | odps.timemachine.retention.days: 1                                                        |
      +------------------------------------------------------------------------------------+
      
      --Insert data into the table. 
      odps@ doc_test_prod>insert into table ext_mcdlf_ddl.ddl_test values(1,"ddl_test_1");
      
      -- View data in the table. 
      odps@ doc_test_prod>select * from ext_mcdlf_ddl.ddl_test;
      +------------+------------+
      | id         | value      |
      +------------+------------+
      | 1          | ddl_test_1 |
      +------------+------------+
    • Create a partitioned table and insert data into the table.
      -- Before you create a table, configure the following properties. The values are fixed. 
      odps@ doc_test_prod> 
          set odps.sql.hive.compatible=true; --You can use setproject for global setting. 
          set odps.sql.unstructured.oss.commit.mode=true; --You can use setproject for global setting. 
          set odps.sql.unstructured.file.pattern.black.list=r".*/_SUCCESS$,.*/[.]hive-staging/.*"; --You can use setproject for global setting. 
          set odps.ext.oss.orc.native=true;  --You can use setproject for global setting. 
          set odps.ext.parquet.native=true; --You can use setproject for global setting. 
      
      -- Create a table. 
      odps@ doc_test_prod>create table ext_mcdlf_ddl.range_ptstring_ptint (id int, name string) partitioned by (pt1 string, pt2 int);
      
      -- View the schema of the created table. 
      odps@ doc_test_prod> desc extended ext_mcdlf_ddl.range_ptstring_ptint;
      +------------------------------------------------------------------------------------+
      | Owner: ALIYUN$xxxxx@test.aliyunid.com | Project: ext_mcdlf_ddl2                    |
      | TableComment:                                                                      |
      +------------------------------------------------------------------------------------+
      | CreateTime:               2021-03-01 10:42:09                                      |
      | LastDDLTime:              2021-03-01 10:42:09                                      |
      | LastModifiedTime:         2021-03-01 10:42:09                                      |
      +------------------------------------------------------------------------------------+
      | ExternalTable: YES                                                                 |
      +------------------------------------------------------------------------------------+
      | Native Columns:                                                                    |
      +------------------------------------------------------------------------------------+
      | Field    | Type   | Label | ExtendedLabel | Nullable | DefaultValue | Comment      |
      +------------------------------------------------------------------------------------+
      | id       | int    |       |               | true     | NULL         |              |
      | name     | string |       |               | true     | NULL         |              |
      +------------------------------------------------------------------------------------+
      | Partition Columns:                                                                 |
      +------------------------------------------------------------------------------------+
      | pt1             | string     |                                                     |
      | pt2             | int        |                                                     |
      +------------------------------------------------------------------------------------+
      | Extended Info:                                                                     |
      +------------------------------------------------------------------------------------+
      | TableID             : range_ptstring_ptint                                         |
      | Location            : oss://mc-dlf-oss/mc-dlf/range_ptstring_ptint                 |
      | serialization.format: 1                                                            |
      | StoredAs:                 Unknown                                                  |
      | CompressionStrategy:      normal                                                   |
      | odps.timemachine.retention.days: 1                                                        |
      +------------------------------------------------------------------------------------+
      
      --Insert data into the table. 
      odps@ doc_test_prod>insert overwrite table ext_mcdlf_ddl.range_ptstring_ptint partition (pt1 = 'ds1', pt2=2) values (4, 'val4'), (5, 'val5');
      
      -- View data in the table. 
      odps@ doc_test_prod>select * from ext_mcdlf_ddl.range_ptstring_ptint where pt1='ds1';
      +------------+------------+------------+------------+
      | id         | name       | pt1        | pt2        |
      +------------+------------+------------+------------+
      | 4          | val4       | ds1        | 2          |
      | 5          | val5       | ds1        | 2          |
      +------------+------------+------------+------------+

Step 5: Perform operations in the DataWorks console

Perform data development, management, or governance in the DataWorks console based on your business requirements.

For more information about data development and data management, see Data development and Data management.