MaxCompute: CREATE TABLE

Last Updated: Jan 23, 2025

This topic describes how to create a non-partitioned table, a partitioned table, an external table, or a clustered table.

Limits

  • A partitioned table can have a maximum of six levels of partitions. For example, if a table uses date columns as partition key columns, the six levels of the partitions are year/month/week/day/hour/minute.

  • By default, a table can have a maximum of 60,000 partitions. You can adjust the maximum number of partitions in a table based on your business requirements.

For more information about the limits on tables, see MaxCompute SQL limits.

Syntax

Create an internal table

CREATE [OR REPLACE] TABLE [IF NOT EXISTS] <table_name> (
<col_name> <data_type>, ... )
[comment <table_comment>]
[PARTITIONED BY (<col_name> <data_type> [comment <col_comment>], ...)];

Create a clustered table

CREATE TABLE [IF NOT EXISTS] <table_name> (
<col_name> <data_type>, ... )
[CLUSTERED BY | RANGE CLUSTERED BY (<col_name> [, <col_name>, ...]) 
[SORTED BY (<col_name> [ASC | DESC] [, <col_name> [ASC | DESC] ...])] 
INTO <number_of_buckets> buckets];

Create an external table

For example, create an OSS external table using the built-in text data resolver.

CREATE EXTERNAL TABLE [IF NOT EXISTS] <mc_oss_extable_name> ( 
<col_name> <data_type>, ... ) 
STORED AS '<file_format>' 
[WITH SERDEPROPERTIES (options)]  
LOCATION '<oss_location>';

Create a table and specify the table type

  • Designate the table as transactional to enable updates or deletions post-creation, subject to certain restrictions. Consider your business requirements when creating a transactional table.

    CREATE [EXTERNAL] TABLE [IF NOT EXISTS] <table_name> (
    <col_name> <data_type> [NOT NULL] [DEFAULT <default_value>] [comment <col_comment>], ...
    [PRIMARY KEY (<pk_col_name>[, <pk_col_name2>, ...] )])
    [comment <table_comment>]
    [TBLPROPERTIES ("transactional"="true")];
  • Designate the table as a Delta table to utilize primary key operations, such as upserts, incremental queries, and time travel queries.

    CREATE [EXTERNAL] TABLE [IF NOT EXISTS] <table_name> (
    <col_name> <data_type> [NOT NULL] [DEFAULT <default_value>] [comment <col_comment>], ...
    [PRIMARY KEY (<pk_col_name>[, <pk_col_name2>, ...] )]) 
    [comment <table_comment>]
    [TBLPROPERTIES ("transactional"="true" 
    [, "write.bucket.num" = "N", "acid.data.retain.hours"="hours"...])] [lifecycle <days>];

Create a new table based on an existing table

  • Replicate data from an existing table to a new one, excluding partition properties. The source table can be a foreign table or a table in an external project that implements the data lakehouse solution.

    CREATE TABLE [IF NOT EXISTS] <table_name> [LIFECYCLE <days>] AS <select_statement>;
  • Create a table with the same schema as an existing table without replicating its data. The existing table can be a foreign table or a table in an external project that implements the data lakehouse solution.

    CREATE TABLE [IF NOT EXISTS] <table_name> LIKE <existing_table_name> [LIFECYCLE <days>];

Parameters

General Parameters

Parameter

Required

Description

Remarks

OR REPLACE

No

If the table specified by <table_name> already exists, the statement drops the existing table and then creates a table with the same name to replace it.

You can use this parameter instead of the following statements:

DROP TABLE IF EXISTS <table_name>;  -- If the destination table exists, drop it first
CREATE TABLE <table_name> ...;      -- Create a table
Note

Limits: You cannot use CREATE OR REPLACE TABLE together with the following syntaxes:

  • CREATE TABLE ... IF NOT EXISTS.

  • CREATE TABLE ... AS SELECT.

  • CREATE TABLE ... LIKE.

EXTERNAL

No

Specifies that the table to be created is an external table.

None

IF NOT EXISTS

No

Creates the table only if no table with the same name already exists.

If you do not specify the IF NOT EXISTS option and a table with the same name exists, an error is returned. If you specify the IF NOT EXISTS option, a success message is returned regardless of whether a table with the same name exists. The metadata of the existing table remains unchanged.

table_name

Yes

The name of the table.

The table name is not case-sensitive and cannot contain special characters. The name can contain letters, digits, and underscores (_). The name must start with a letter and cannot exceed 128 bytes in length. If the value of this parameter does not meet the requirements, an error is returned.

PRIMARY KEY(pk)

No

The primary key of the table.

You can specify one or more columns as the primary key. This indicates that the combination of these columns must be unique in the table. You must comply with the standard SQL syntax for primary keys. The columns that are defined as the primary key must be set to not null and cannot be modified.

Important

This parameter is only used for Delta Tables.

col_name

Yes

The name of a table column.

The column name is not case-sensitive and cannot contain special characters. The name can contain letters, digits, and underscores (_). The name must start with a letter and cannot exceed 128 bytes in length. If the value of this parameter does not meet the requirements, an error is returned.

col_comment

No

The comment of a column.

The comment must be a valid string that does not exceed 1,024 bytes in length. If the value of this parameter does not meet the requirements, an error is returned.

data_type

Yes

The data type of the column.

Data types include BIGINT, DOUBLE, BOOLEAN, DATETIME, DECIMAL, and STRING. For more information, see Data Type Version Description.

NOT NULL

No

The NOT NULL attribute can be configured in the CREATE TABLE syntax to specify that the values in a specific column cannot be NULL.

For more information about how to modify the NOT NULL attribute, see Partition Operations.

default_value

No

The default value for the column.

When the insert operation does not specify this column, the default value is written to the column.

Note

Currently, a default value cannot be a function call such as GETDATE() or NOW().

table_comment

No

The comment of the table.

The comment must be a valid string that does not exceed 1,024 bytes in length. If the value of this parameter does not meet the requirements, an error is returned.

LIFECYCLE

No

The lifecycle of the table.

The value must be a positive integer. Unit: days.

  • Non-partitioned tables: If the table data is not modified within the number of days specified by days after the last modification, MaxCompute automatically reclaims the table. This operation is similar to the DROP TABLE operation.

  • Partitioned tables: MaxCompute determines whether to reclaim a partition based on the value of LastModifiedTime. Unlike non-partitioned tables, a partitioned table is not deleted even if all of its partitions have been reclaimed. You can configure lifecycles for tables, but not for partitions.
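
The naming and comment rules in the table above can be checked before a statement is submitted. The following Python sketch is illustrative only and is not part of MaxCompute: the function names are hypothetical, and it assumes that byte lengths are measured in UTF-8 and that "letters" means ASCII letters, neither of which this topic states explicitly.

```python
import re

# Limits taken from the table_name, col_name, and comment rows above.
MAX_NAME_BYTES = 128
MAX_COMMENT_BYTES = 1024

# Must start with a letter; may contain letters, digits, and underscores.
_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_valid_name(name: str) -> bool:
    """Check a table or column name against the documented rules."""
    if _NAME_RE.fullmatch(name) is None:
        return False
    # Assumption: the 128-byte limit is measured on the UTF-8 encoding.
    return len(name.encode("utf-8")) <= MAX_NAME_BYTES


def is_valid_comment(comment: str) -> bool:
    """Check a table or column comment against the 1,024-byte limit."""
    return len(comment.encode("utf-8")) <= MAX_COMMENT_BYTES


def same_name(a: str, b: str) -> bool:
    """Names are not case-sensitive, so compare them case-insensitively."""
    return a.lower() == b.lower()
```

For example, is_valid_name("sale_detail") passes, while a name that starts with a digit or contains a hyphen does not.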

Parameters for partitioned tables

Parameter

Required

Description

Remarks

PARTITIONED BY

Yes

The partition fields of a partitioned table.

None

col_name

Yes

The name of a partition key column

The column name is not case-sensitive and cannot contain special characters. The name can contain letters, digits, and underscores (_). The name must start with a letter and cannot exceed 128 bytes in length. If the value of this parameter does not meet the requirements, an error is returned.

data_type

Yes

The data type of a partition key column

In the MaxCompute V1.0 data type edition, partition key columns must be of the STRING type. In the MaxCompute V2.0 data type edition, partition key columns can be of the TINYINT, SMALLINT, INT, BIGINT, VARCHAR, or STRING type. For more information, see Data Type Version Description. If you use a partition field to partition a table, a full table scan is not required when you add partitions, update partition data, or read partition data. This improves the efficiency of data processing.

col_comment

No

The comment of a partition key column

The comment must be a valid string that does not exceed 1,024 bytes in length. If the value of this parameter does not meet the requirements, an error is returned.

Note

Partition key column values must not contain double-byte characters, such as Chinese characters. They should start with a letter and can include letters, digits, and certain special characters, ranging from 1 to 255 bytes in length. Supported special characters include spaces, colons (:), underscores (_), dollar signs ($), number signs (#), periods (.), exclamation points (!), and at signs (@). The behavior of other characters, like escape characters \t, \n, and /, is undefined.

Clustered table parameters

Clustered tables are categorized into hash-clustered and range-clustered tables.

Hash-clustered Tables

Parameter

Required

Description

Remarks

CLUSTERED BY

Yes

The hash key.

MaxCompute performs a hash operation on the specified columns and distributes data to each bucket based on the hash values. To avoid data skew and hot spots and ensure optimal parallel execution, the CLUSTERED BY columns must have many distinct values and a small number of repeat keys. In addition, to optimize join operations, you must also consider selecting common join keys or aggregation keys, which are similar to primary keys in traditional databases.

SORTED BY

Yes

The sort order of the fields in a bucket.

We recommend that you specify the same columns for SORTED BY and CLUSTERED BY to ensure optimal performance. In addition, after you specify the SORTED BY clause, MaxCompute automatically generates an index and uses the index to accelerate query execution.

number_of_buckets

Yes

The number of hash buckets.

This parameter is required and the value of this parameter varies based on the amount of data. By default, MaxCompute supports a maximum of 1,111 reducers. This means that MaxCompute supports a maximum of 1,111 hash buckets. You can use set odps.stage.reducer.num =<concurrency>; to increase this limit. However, the maximum number of hash buckets cannot exceed 4,000. Otherwise, performance is affected.

Note

For optimal performance, consider the following when specifying the number of hash buckets:

  • Aim for each hash bucket to be approximately 500 MB. For example, if the partition size is around 500 GB, we recommend setting 1,000 buckets. For larger data volumes, bucket sizes can range from 2 GB to 3 GB. To increase the maximum number of hash buckets beyond 1,111, use set odps.stage.reducer.num=<concurrency>;.

  • In scenarios where join operations are optimized, removing the shuffle and sort steps can significantly improve performance. Therefore, the number of hash buckets of a table must be a multiple of the number of hash buckets of the other table. For example, one table has 256 hash buckets and the other table has 512 hash buckets. We recommend that you specify the number of hash buckets as 2^n, such as 512, 1,024, 2,048, or 4,096. This way, the system can automatically split and merge hash buckets and remove the shuffle and sort steps to improve execution efficiency.
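
The sizing guidance in the preceding note can be worked through as arithmetic. The following Python sketch is a rough illustration, not a MaxCompute API: the function names are hypothetical, sizes are given in MB with 1 GB counted as 1,000 MB, and rounding up to a power of two is one way to follow the recommendation on bucket counts.

```python
import math

TARGET_BUCKET_MB = 500        # recommended size of one hash bucket
DEFAULT_MAX_BUCKETS = 1111    # default reducer limit described above


def suggest_bucket_count(partition_size_mb: float,
                         target_bucket_mb: float = TARGET_BUCKET_MB) -> int:
    """Return the smallest power of two that keeps the average bucket
    size at or below target_bucket_mb."""
    if partition_size_mb <= 0 or target_bucket_mb <= 0:
        raise ValueError("sizes must be positive")
    needed = math.ceil(partition_size_mb / target_bucket_mb)
    # Round up to the next power of two (1 stays 1, 1000 becomes 1024).
    return 1 << (needed - 1).bit_length()


def needs_reducer_override(bucket_count: int) -> bool:
    """True if the count exceeds the default limit, in which case
    odps.stage.reducer.num has to be raised first."""
    return bucket_count > DEFAULT_MAX_BUCKETS


def join_friendly(buckets_a: int, buckets_b: int) -> bool:
    """True if one bucket count is a multiple of the other."""
    small, large = sorted((buckets_a, buckets_b))
    return large % small == 0
```

For a 500 GB partition, suggest_bucket_count(500_000) gives 1024, which stays within the default limit of 1,111, so no reducer setting is needed in that case.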

Range-clustered Tables

Parameter

Required

Description

Remarks

RANGE CLUSTERED BY

Yes

The range-clustered columns.

MaxCompute performs the bucket operation on the specified columns and distributes data to each bucket based on the bucket ID.

SORTED BY

Yes

The sort order of the fields in a bucket.

You can use this parameter in the same way as you use it for a hash-clustered table.

number_of_buckets

No

The number of buckets.

The 2^n best practice for hash-clustered tables does not apply to range-clustered tables. If data is evenly distributed, you can specify any number of buckets. If you do not specify the number of buckets in a range-clustered table, MaxCompute automatically determines the optimal number based on the amount of data.

For range-clustered tables, JOIN and AGGREGATE operations can be optimized when the join or group key is the range-clustered key or its prefix. To remove the shuffle step and improve efficiency, enable this optimization by running set odps.optimizer.enable.range.partial.repartitioning=true;. The flag is disabled (false) by default.
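
The prefix condition in the preceding paragraph can be expressed as a simple check. The following Python sketch is only an illustration with a hypothetical function name. It assumes that "prefix" means the leading columns of the RANGE CLUSTERED BY list in declared order and that column names are compared case-insensitively. It does not describe how the MaxCompute optimizer makes this decision internally.

```python
from typing import Sequence


def is_cluster_key_prefix(query_keys: Sequence[str],
                          cluster_keys: Sequence[str]) -> bool:
    """Return True if query_keys equals the range-clustered key
    or a non-empty leading prefix of it."""
    q = [k.lower() for k in query_keys]
    c = [k.lower() for k in cluster_keys]
    return 0 < len(q) <= len(c) and c[:len(q)] == q
```

With a table that is range-clustered by (a, b), a join on a or on (a, b) satisfies the condition, while a join on b alone does not.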

Note
  • Clustered tables help optimize the following aspects:

    • Bucket pruning.

    • Aggregation.

    • Storage.

  • Restrictions on clustered tables include the following:

    • The INSERT INTO statement is not supported; use INSERT OVERWRITE instead.

    • Tunnel commands cannot be used to upload data directly to range-clustered tables because data uploaded through Tunnel is not sorted.

    • Data backup and restoration features are not supported.

Parameters for foreign tables

The following table describes the parameters for creating an OSS foreign table as an example. For guidance on creating foreign tables, see External Data Sources.

Parameter

Required

Description

STORED AS '<file_format>'

Yes

Specifies the file_format based on the data format of the foreign table.

WITH SERDEPROPERTIES(options)

No

The parameters related to the authorization, compression, and character parsing of the foreign table.

oss_location

Yes

The OSS storage location of the data in the foreign table. For more information, see Create an OSS Foreign Table.

Transactional table and Delta table parameters

Parameters for Delta Tables

Delta tables support features like near real-time reads and writes, incremental reads and writes, incremental storage, and real-time updates. Only Delta tables with a primary key are supported.

Parameter

Required

Description

Remarks

PRIMARY KEY(PK)

Yes

This parameter is required when you create a Delta table. You can specify multiple columns as the primary key.

You must comply with the standard SQL syntax for primary keys. The columns that are defined as the primary key must be set to not null and cannot be modified. After you specify a primary key for a Delta table, duplicate data is removed from the table based on the primary key. The uniqueness constraint for the primary key column is valid in a single partition or in a non-partitioned table.

transactional

Yes

This parameter is required when you create a Delta table. You must set this parameter to true.

A value of true indicates that the table provides the atomicity, consistency, isolation, and durability (ACID) transaction characteristics of MaxCompute ACID tables and uses the Multi-Version Concurrency Control (MVCC) model to support snapshot isolation.

write.bucket.num

No

The default value is 16. Valid values: (0, 4096].

This parameter indicates the number of buckets in a partition of a partitioned table or in a non-partitioned table. This parameter also specifies the number of concurrent nodes that are used to write data. You can change the value of this parameter for a partitioned table. If partitions are added to a partitioned table, the configuration of this parameter automatically takes effect on the new partitions. You cannot change the value of this parameter for a non-partitioned table. Take note of the following points:

  • If data is written by using a MaxCompute tunnel, the value of this parameter specifies the number of concurrent nodes that are used to write data. The setting of the parameter affects the import traffic and is also subject to the maximum number of concurrent nodes in the tunnel.

  • If data is written by using an SQL statement, the value of this parameter specifies the concurrency of the reducers that are used to write data. The setting is subject to the maximum number of concurrent reducer nodes.

  • We recommend that you write approximately 500 MB of data to each bucket. For example, if the partition size is about 500 GB, we recommend that you specify 1,000 buckets. This way, the size of each bucket is 500 MB on average. If a table contains a large amount of data, you can increase the size of each bucket from 500 MB to a size in the range of 2 GB to 3 GB.

acid.data.retain.hours

No

The default value is 24. Valid values: [24, 168].

The time range during which the historical data status can be queried by using the time travel feature. Unit: hours. If you need to query the historical data status for a period more than 168 hours (7 days), contact MaxCompute technical support.

  • If you set this parameter to 0, the historical data status is not retained, and time travel queries are not supported.

  • Historical data that is older than the period specified by this parameter can be deleted. You can use the compact method to reclaim the space that is occupied by the data.

  • If you perform an SQL time travel query on data that is generated earlier than the time range specified by this parameter, an error is returned. For example, if the value of this parameter is 72 and a time travel query requests the historical data status as of 72 hours ago or earlier, an error is returned.

acid.incremental.query.out.of.time.range.enabled

No

The default value is false.

If you set this parameter to true, the value of the endTimestamp property specified by an incremental query can be a point in time that is later than the maximum commit time of data in a table. If the value of the endTimestamp property is greater than the current time, new data may be inserted into a Delta table, and you may obtain different results for multiple queries. You can change the value of this parameter for a table.

acid.write.precombine.field

No

You can use this parameter to specify the name of only one column.

If you specify a column name, the system performs data deduplication based on the primary key (PK) column in the file that is committed together with this parameter. This ensures data uniqueness and consistency.

Note

If the size of data that is committed at a time exceeds 128 MB, multiple files are generated. This parameter cannot be used for multiple files.

  • General parameter requirements for Delta tables:

    • LIFECYCLE: The table's lifecycle must be at least as long as the time travel query period, meaning lifecycle >= acid.data.retain.hours / 24. When creating a table, MaxCompute verifies the specified lifecycle and returns an error if it doesn't meet the requirements.

    • Delta tables do not support CLUSTER BY or CREATE TABLE AS statements, and cannot be created as external tables.

  • Additional restrictions:

    • Only MaxCompute SQL can directly interact with Delta tables.

    • Existing common tables cannot be converted to Delta tables.

    • The schema of the primary key column in a Delta table cannot be altered.
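
The LIFECYCLE requirement in the list above is a simple inequality between days and hours. The following Python sketch shows the arithmetic. The function names are hypothetical and the code only mirrors the stated rule lifecycle >= acid.data.retain.hours / 24. It is not the validation code that MaxCompute runs.

```python
import math


def min_lifecycle_days(retain_hours: int) -> int:
    """Smallest whole-day lifecycle that satisfies
    lifecycle >= retain_hours / 24."""
    return math.ceil(retain_hours / 24)


def lifecycle_is_valid(lifecycle_days: int, retain_hours: int = 24) -> bool:
    """Apply the rule lifecycle >= acid.data.retain.hours / 24.
    retain_hours defaults to 24, the documented default value."""
    return lifecycle_days >= retain_hours / 24
```

With acid.data.retain.hours set to 120, the minimum lifecycle is 5 days, so the lifecycle of 7 used for the mf_tt2 table in the Examples section satisfies the rule.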

Parameters for transactional tables

Parameter

Required

Description

TBLPROPERTIES("transactional"="true")

Yes

Sets the table as a transactional table. You can subsequently perform update or delete operations on the transactional table to achieve row-level updates or data deletion. For more information, see Update or delete data (UPDATE | DELETE).

Transactional tables have specific limitations:

  • MaxCompute only allows setting a table as transactional during its creation. Subsequent changes to make a table transactional using the ALTER TABLE statement will result in an error:

    ALTER TABLE not_txn_tbl SET tblproperties("transactional"="true");
    -- An error is returned.
    FAILED: Catalog Service Failed, ErrorCode: 151, Error Message: Set transactional is not supported
  • You cannot set clustered or external tables as transactional tables when creating them.

  • Conversion between transactional tables and MaxCompute internal, external, or clustered tables is not supported.

  • Transactional table files must be merged manually. For details, see Merge Transactional Table Files.

  • The merge partition operation is not supported.

  • Certain limitations apply when accessing transactional tables from other systems' jobs. For instance, Graph jobs cannot read from or write to transactional tables, while Spark or PAI jobs can only read from them.

  • Before you run UPDATE, DELETE, or INSERT OVERWRITE operations on important data in a transactional table, manually back up the data to another table by using SELECT and INSERT operations.

Table creation parameters based on existing data tables

  • Use the CREATE TABLE [IF NOT EXISTS] <table_name> [LIFECYCLE <days>] AS <select_statement>; statement to create a new table and replicate data from an existing table.

    • Partition properties and the lifecycle of the source table are not replicated. Partition key columns of the source table become regular columns in the new table.

    • The lifecycle parameter can be used to set the table's retention period. This statement can also be used to create an internal table with data replicated from an external table.

  • Execute the CREATE TABLE [IF NOT EXISTS] <table_name> LIKE <existing_table_name> [LIFECYCLE <days>]; statement to create a table with the same schema as an existing table.

    • Data and the lifecycle property of the source table are not replicated.

    • The lifecycle parameter can be used to set the table's retention period. This statement can also be used to create an internal table with the same schema as an existing external table.
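
The difference between the two statements above can be summarized with a toy model. The following Python sketch is a simplified illustration, not MaxCompute code: the Table class and both functions are hypothetical, and it assumes that partition key columns follow the regular columns when a table is copied with SELECT *.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Table:
    columns: List[str]
    partition_columns: List[str] = field(default_factory=list)
    lifecycle: Optional[int] = None
    rows: List[Tuple] = field(default_factory=list)


def create_table_as(source: Table, lifecycle: Optional[int] = None) -> Table:
    """Model of CREATE TABLE ... AS SELECT * FROM source: data is copied,
    partition key columns become regular columns, and the lifecycle of
    the source is not inherited."""
    return Table(
        columns=source.columns + source.partition_columns,
        partition_columns=[],
        lifecycle=lifecycle,
        rows=list(source.rows),
    )


def create_table_like(source: Table, lifecycle: Optional[int] = None) -> Table:
    """Model of CREATE TABLE ... LIKE source: the schema is copied,
    but neither the data nor the lifecycle is."""
    return Table(
        columns=list(source.columns),
        partition_columns=list(source.partition_columns),
        lifecycle=lifecycle,
        rows=[],
    )
```

Applied to a model of the sale_detail table (three regular columns and two partition key columns), create_table_as returns a non-partitioned table with five columns, whereas create_table_like returns an empty table that keeps both partition key columns.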

Examples

Create a non-partitioned table

CREATE TABLE test1 (key STRING);

Create a partitioned table

Create a partitioned table named sale_detail.

CREATE TABLE IF NOT EXISTS sale_detail(
 shop_name     STRING,
 customer_id   STRING,
 total_price   DOUBLE)
PARTITIONED BY (sale_date STRING, region STRING);

Create a new table to replace the original table

  1. Create the original table my_table and write data to it.

    CREATE OR REPLACE TABLE my_table(a bigint);
    
    INSERT INTO my_table(a) VALUES (1),(2),(3);
  2. Use OR REPLACE to create a table with the same name and modify its fields.

    CREATE OR REPLACE TABLE my_table(b string);
  3. Query the my_table table to see the changes.

    +------------+
    | b          | 
    +------------+
    +------------+

The following SQL statements are invalid:

CREATE OR REPLACE TABLE IF NOT EXISTS my_table(b STRING);
CREATE OR REPLACE TABLE my_table AS SELECT;
CREATE OR REPLACE TABLE my_table LIKE newtable;
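
The behavior of OR REPLACE and IF NOT EXISTS described in the Parameters section can be summarized with a toy in-memory catalog. The following Python sketch is a simplified model under stated assumptions, not how MaxCompute is implemented: the create_table function, the TableExistsError class, and the dictionary-based catalog are all hypothetical.

```python
from typing import Dict, List


class TableExistsError(Exception):
    """Raised when the table exists and neither option is specified."""


def create_table(catalog: Dict[str, List[str]], name: str,
                 columns: List[str], *, or_replace: bool = False,
                 if_not_exists: bool = False) -> None:
    """Model CREATE [OR REPLACE] TABLE [IF NOT EXISTS] on a dict catalog."""
    if or_replace and if_not_exists:
        # The two options cannot be combined.
        raise ValueError("OR REPLACE cannot be used with IF NOT EXISTS")
    key = name.lower()  # table names are not case-sensitive
    if key in catalog:
        if if_not_exists:
            return  # success is reported; the existing table is unchanged
        if not or_replace:
            raise TableExistsError(name)
        del catalog[key]  # same effect as DROP TABLE IF EXISTS
    catalog[key] = list(columns)
```

Replaying the steps of this example against the model, the second create_table call with or_replace=True leaves my_table with the single column b and no data from the original table.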

Create a new table and replicate data from an existing table to the new table

Create a table, replicate data from an existing table to the new table, and then configure the lifecycle for the new table.

-- Create a new table sale_detail_ctas1, replicate data from sale_detail to sale_detail_ctas1, and configure the lifecycle for the new table.
SET odps.sql.allow.fullscan=true;
CREATE TABLE sale_detail_ctas1 LIFECYCLE 10 AS SELECT * FROM sale_detail;

You can retrieve detailed information about the schema and lifecycle of the table by executing the DESC EXTENDED sale_detail_ctas1; command.

Note

The sale_detail table is a partitioned table. However, the sale_detail_ctas1 table created by the CREATE TABLE ... AS select_statement ... statement does not inherit partition properties. The partition key columns of the source table become regular columns in the new table. Thus, sale_detail_ctas1 is a non-partitioned table with five columns.

Create a new table and use constants in the SELECT statement as column values

Note

When you use constants as column values in the SELECT clause, we recommend that you specify column names. If you do not, as with the sale_detail_ctas3 table in this section, the fourth and fifth columns of the created table are named _c4 and _c5.

  • Specify column names.

    SET odps.sql.allow.fullscan=true;
    CREATE TABLE sale_detail_ctas2
    AS
    SELECT shop_name, customer_id, total_price, '2013' AS sale_date, 'China' AS region
    FROM sale_detail;
  • Do not specify column names.

    SET odps.sql.allow.fullscan=true;
    
    CREATE TABLE sale_detail_ctas3
    AS
    SELECT shop_name, customer_id, total_price, '2013', 'China' 
    FROM sale_detail;

Create a new table with the same schema as an existing table

Create a table with the same schema as an existing table and configure the lifecycle for the new table.

CREATE TABLE sale_detail_like LIKE sale_detail LIFECYCLE 10;

You can retrieve detailed information about the schema and lifecycle of the table by running the DESC EXTENDED sale_detail_like; command.

Note

The schema of the sale_detail_like table is identical to that of the sale_detail table, including properties such as column names, column comments, and table comments, except for the lifecycle. However, data from the sale_detail table is not replicated to the sale_detail_like table.

Create a new table with the same schema as an external table

-- Create a new table mc_oss_extable_orc_like that uses the same schema as the mc_oss_extable_orc foreign table.
CREATE TABLE mc_oss_extable_orc_like LIKE mc_oss_extable_orc;

You can retrieve detailed information about the schema of the table by executing the DESC mc_oss_extable_orc_like; command.

+------------------------------------------------------------------------------------+
| Owner: ALIYUN$****@***.aliyunid.com | Project: max_compute_7u************yoq              |
| TableComment:                                                                      |
+------------------------------------------------------------------------------------+
| CreateTime:               2022-08-11 11:10:47                                      |
| LastDDLTime:              2022-08-11 11:10:47                                      |
| LastModifiedTime:         2022-08-11 11:10:47                                      |
+------------------------------------------------------------------------------------+
| InternalTable: YES      | Size: 0                                                  |
+------------------------------------------------------------------------------------+
| Native Columns:                                                                    |
+------------------------------------------------------------------------------------+
| Field           | Type       | Label | Comment                                     |
+------------------------------------------------------------------------------------+
| id              | string     |       |                                             |
| name            | string     |       |                                             |
+------------------------------------------------------------------------------------+

Create a table with a new data type

SET odps.sql.type.system.odps2=true;
CREATE TABLE test_newtype (
    c1 TINYINT,
    c2 SMALLINT,
    c3 INT,
    c4 BIGINT,
    c5 FLOAT,
    c6 DOUBLE,
    c7 DECIMAL,
    c8 BINARY,
    c9 TIMESTAMP,
    c10 ARRAY<MAP<BIGINT,BIGINT>>,
    c11 MAP<STRING,ARRAY<BIGINT>>,
    c12 STRUCT<s1:STRING,s2:BIGINT>,
    c13 VARCHAR(20))
LIFECYCLE 1;

Create a hash clustered table

  • Non-partitioned table

    CREATE TABLE t1 (a STRING, b STRING, c BIGINT) CLUSTERED BY (c) SORTED BY (c) INTO 1024 buckets;
  • Partitioned table

    CREATE TABLE t2 (a STRING, b STRING, c BIGINT) 
    PARTITIONED BY (dt STRING) CLUSTERED BY (c) SORTED BY (c) into 1024 buckets;

Create a range clustered table

  • Non-partitioned table

    CREATE TABLE t3 (a STRING, b STRING, c BIGINT) RANGE CLUSTERED BY (c) SORTED BY (c) INTO 1024 buckets;
  • Partitioned table

    CREATE TABLE t4 (a STRING, b STRING, c BIGINT) 
    PARTITIONED BY (dt STRING) RANGE CLUSTERED BY (c) SORTED BY (c); 

Create a transactional table

  • Non-partitioned table

    CREATE TABLE t5(id bigint) tblproperties("transactional"="true");
  • Partitioned table

    CREATE TABLE IF NOT EXISTS t6(id bigint) 
    PARTITIONED BY (ds string) tblproperties ("transactional"="true");

Create a Delta table

  • Create a Delta table

    CREATE TABLE mf_tt (pk bigint NOT NULL PRIMARY KEY, val bigint) 
      tblproperties ("transactional"="true");
  • Create a Delta table and configure the main table properties.

    CREATE TABLE mf_tt2 ( 
      pk bigint NOT NULL, 
      pk2 bigint NOT NULL, 
      val bigint, 
      val2 bigint, 
      PRIMARY KEY (pk, pk2)
    ) 
    tblproperties (
      "transactional"="true", 
      "write.bucket.num" = "64", 
      "acid.data.retain.hours"="120"
    ) lifecycle 7;

Create a non-partitioned table and specify default values for fields

CREATE TABLE test_default( 
tinyint_name tinyint NOT NULL default 1Y,
smallint_name SMALLINT NOT NULL DEFAULT 1S,
int_name INT NOT NULL DEFAULT 1,
bigint_name BIGINT NOT NULL DEFAULT 1,
binary_name BINARY ,
float_name FLOAT ,
double_name DOUBLE NOT NULL DEFAULT 0.1,
decimal_name DECIMAL(2, 1) NOT NULL DEFAULT 0.0BD,
varchar_name VARCHAR(10) ,
char_name CHAR(2) ,
string_name STRING NOT NULL DEFAULT 'N',
boolean_name BOOLEAN NOT NULL DEFAULT TRUE
);

Create an internal table and replicate data from an external partitioned table

  • The internal table does not contain partition properties.

    1. Create an OSS external table and a MaxCompute internal table.

      -- Create an OSS foreign table and insert data into the table.
      CREATE EXTERNAL table max_oss_test(a int, b int, c int) 
      stored AS TEXTFILE
      location "oss://oss-cn-hangzhou-internal.aliyuncs.com/<bucket_name>";
      
      INSERT INTO max_oss_test VALUES 
      (101, 1, 20241108),
      (102, 2, 20241109),
      (103, 3, 20241110);
      
      SELECT * FROM max_oss_test;
      
      -- Result
      a    b    c
      101    1    20241108
      102    2    20241109
      103    3    20241110
      
      
      -- Execute the CREATE TABLE AS statement to create an internal table
      CREATE TABLE from_exetbl_oss AS SELECT * FROM max_oss_test;
      
      -- Query data of the internal table
      SELECT * FROM from_exetbl_oss;
      
      -- All data in the internal table is returned
      a    b    c
      101    1    20241108
      102    2    20241109
      103    3    20241110
    2. Execute the DESC from_exetbl_oss; command to retrieve the schema of the internal table.

      +------------------------------------------------------------------------------------+
      | Owner:                    ALIYUN$***********                                       |
      | Project:                  ***_*****_***                                            |
      | TableComment:                                                                      |
      +------------------------------------------------------------------------------------+
      | CreateTime:               2023-01-10 15:16:33                                      |
      | LastDDLTime:              2023-01-10 15:16:33                                      |
      | LastModifiedTime:         2023-01-10 15:16:33                                      |
      +------------------------------------------------------------------------------------+
      | InternalTable: YES      | Size: 919                                                |
      +------------------------------------------------------------------------------------+
      | Native Columns:                                                                    |
      +------------------------------------------------------------------------------------+
      | Field           | Type       | Label | Comment                                     |
      +------------------------------------------------------------------------------------+
      | a               | string     |       |                                             |
      | b               | string     |       |                                             |
      | c               | string     |       |                                             |
      +------------------------------------------------------------------------------------+
  • The internal table contains partition properties.

    1. Create an internal table from_exetbl_like.

      -- Query the external table of the data lakehouse solution from the MaxCompute side
      SELECT * FROM max_oss_test;
      -- Result
      a    b    c
      101    1    20241108
      102    2    20241109
      103    3    20241110
      
      -- Execute the CREATE TABLE LIKE statement to create an internal table
      CREATE TABLE from_exetbl_like LIKE max_oss_test;
      
      -- Query data of the internal table
      SELECT * FROM from_exetbl_like;
      -- Only the schema of the internal table is returned
      a    b    c
    2. Execute the DESC from_exetbl_like; command to retrieve the schema of the internal table.

      +------------------------------------------------------------------------------------+
      | Owner:                    ALIYUN$************                                      |
      | Project:                  ***_*****_***                                            |
      | TableComment:                                                                      |
      +------------------------------------------------------------------------------------+
      | CreateTime:               2023-01-10 15:09:47                                      |
      | LastDDLTime:              2023-01-10 15:09:47                                      |
      | LastModifiedTime:         2023-01-10 15:09:47                                      |
      +------------------------------------------------------------------------------------+
      | InternalTable: YES      | Size: 0                                                  |
      +------------------------------------------------------------------------------------+
      | Native Columns:                                                                    |
      +------------------------------------------------------------------------------------+
      | Field           | Type       | Label | Comment                                     |
      +------------------------------------------------------------------------------------+
      | a               | string     |       |                                             |
      | b               | string     |       |                                             |
      +------------------------------------------------------------------------------------+
      | Partition Columns:                                                                 |
      +------------------------------------------------------------------------------------+
      | c               | string     |                                                     |
      +------------------------------------------------------------------------------------+

Related commands

  • ALTER TABLE: Modify an existing table.

  • TRUNCATE: Clear data from the specified table.

  • DROP TABLE: Delete a partitioned or non-partitioned table.

  • DESC TABLE/VIEW: Retrieve information about MaxCompute internal tables, views, materialized views, external tables, clustered tables, and transactional tables.

  • SHOW: Retrieve SQL DDL statements of a table, list all tables and views in a project, or list all partitions in a table.