This topic describes how to run statements to create, view, delete, rename, and modify tables on the MaxCompute client.

Create a table

Use the following syntax to create a table:
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
[(col_name data_type [DEFAULT value] [COMMENT col_comment], ...)]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name [, col_name, ...]) [SORTED BY (col_name [ASC | DESC] [, col_name [ASC | DESC] ...])] INTO number_of_buckets BUCKETS] -- Used to set the shuffle and sort attributes when you create a hash cluster table.
[STORED BY StorageHandler] -- Used only for external tables.
[WITH SERDEPROPERTIES (Options)] -- Used only for external tables.
[LOCATION OSSLocation]; -- Used only for external tables.
[LIFECYCLE days]
[AS select_statement]
 CREATE TABLE [IF NOT EXISTS] table_name
 LIKE existing_table_name
  • If you do not specify the IF NOT EXISTS option and a table with the same name already exists, an error is returned while creating a table. If you specify this option, a success message is returned, regardless of whether a table with the same name already exists. This is true even if the schema of the existing table is different from that of the table you want to create. In addition, the metadata of the existing table does not change after the new table is created.
  • Both the table name and column name are case-insensitive and cannot contain special characters. A table name or column name can contain only lowercase and uppercase letters, digits, and underscores (_). We recommend that you start the name with a letter. The name can be up to 128 bytes in length.
  • You can define a maximum of 1,200 columns in a single table.
  • MaxCompute 2.0 supports many data types, including bigint, double, Boolean, datetime, decimal, and string.
    Note If you need to use a new data type such as tinyint, smallint, int, float, varchar, timestamp, or binary in a MaxCompute SQL statement, run the SET statement to enable new data types as follows:
    • Session-level: To use a new data type, add a SET statement set odps.sql.type.system.odps2=true; before the SQL statement and submit the SET statement together with the SQL statement.
    • Project-level: MaxCompute allows you to enable new data types for a project. The project owner can run the following command to enable new data types at the project level:
      setproject odps.sql.type.system.odps2=true;
      For more information about the setproject command, see Other operations. For more information about the notes for enabling new data types for a project, see Data types.
  • Use DEFAULT value to specify the default value of a column. If the INSERT statement does not specify the value of the column, the default value is used for the column.
  • Use PARTITIONED BY to specify the column on which a table is partitioned. For more information, see Partition. Currently, MaxCompute 2.0 supports the following data types for partition columns: tinyint, smallint, int, bigint, varchar, and string.

    The partition column name cannot contain double-byte characters. It must start with a letter, either in lowercase or uppercase, followed by letters or digits. The name can be up to 128 bytes in length. You can use special characters, including spaces, colons (:), underscores (_), dollar signs ($), number signs (#), periods (.), exclamation points (!), and at signs (@). Other characters, such as the tab (\t), new line (\n), and forward slashes (/), are considered as undefined characters. If you use partition columns to partition a table, a full table scan is not required when you add partitions, update partition data, or read partition data. This improves table processing efficiency.

  • A single table can contain up to 60,000 partitions that are organized in a hierarchy of up to six levels.
  • A comment is a valid string that can be up to 1,024 bytes in length.
  • Use LIFECYCLE to specify the lifecycle of a table in units of days. The CREATE TABLE LIKE statement does not duplicate the lifecycle attribute from the source table.
  • Use CLUSTERED BY to specify hash keys. MaxCompute performs the hash operation on the specified columns and distributes data to each bucket based on the hash values.
    • To prevent data skew and hot spots and to better run concurrent statements, we recommend that you specify columns with a large value range and a small number of duplicate key values in CLUSTERED BY. Also, to optimize the JOIN operation, we recommend that you select commonly used join or aggregation keys. The join and aggregation keys are similar to primary keys in conventional databases.
    • Use SORTED BY to specify how to sort fields in a bucket. We recommend that you specify columns in SORTED BY the same as those in CLUSTERED BY for better performance. After you specify columns in the SORTED BY clause, MaxCompute automatically generates indexes. Based on these indexes, MaxCompute can run SQL statements faster when you query data.
    • Use INTO number_of_buckets BUCKETS to specify the number of hash buckets. The number of buckets is required and varies with the data size. By default, MaxCompute supports up to 1,111 reducers, and therefore, up to 1,111 hash buckets. You can run the set odps.sql.reducer.instances=xxx; command to increase the maximum number of hash buckets. However, the maximum number cannot exceed 4,000. Otherwise, performance may be affected.

      We recommend that you abide by the following principles when you specify the number of hash buckets:

      • Keep the size of each bucket around 500 MB. For example, if you want to add 1,000 buckets to a partition whose size is 500 GB, the size of each bucket is 500 MB on average. If a table contains a large amount of data, you can increase the size of each bucket from 500 MB to a size in the range of 2 GB to 3 GB. You can also run the set odps.sql.reducer.instances=xxx command to increase the maximum number of hash buckets.
      • If two tables are frequently joined, specify the same number of hash buckets for them to optimize the JOIN operation and to skip the shuffle and sort operations. If the number of hash buckets for one table is different from that for the other table, we recommend that you use the greater number for both tables. This guarantees that SQL statements can be run concurrently in an efficient manner.
    • Hash cluster tables help optimize the following aspects:
      • Bucket pruning
      • Aggregation
      • Storage
    • Hash cluster tables are subject to the following limits:
      • The INSERT INTO statement is not supported. You can only run the INSERT OVERWRITE statement to add data.
      • You cannot run the tunnel upload command to import data to a range cluster table. The data uploaded by using this command is not arranged in order.
  • For more information about external tables, see Access OSS unstructured data.
For example, to create a partitioned table named sale_detail to store sales records and use sale_date and region as partition columns, you can use the following statement:
create table if not exists sale_detail
(
shop_name     string,
customer_id   string,
total_price   double
)
partitioned by (sale_date string,region string);
-- This statement creates a partitioned table named sale_detail.
Use the following create table...as select... statement to create a table and duplicate data from the sale_detail table to the new table:
create table sale_detail_ctas1 as
select * from sale_detail;
If the sale_detail table has data, all data in the sale_detail table is duplicated to the sale_detail_ctas1 table after you run the preceding statement.
Note The sale_detail table is a partitioned table. However, the table created by the create table...as select... statement does not inherit the partition attribute from the source table. The partition columns of the source table become general columns of the destination table. That is, the sale_detail_ctas1 table is a non-partitioned table that contains five columns.
In the create table...as select... statement, if you use constants as column values in the SELECT clause, we recommend that you set column aliases as follows:
CREATE TABLE sale_detail_ctas2
AS
SELECT shop_name, customer_id, total_price, '2013' AS sale_date, 'China' AS region
FROM sale_detail;
If you do not set column aliases, the statement is as follows:
CREATE TABLE sale_detail_ctas3
AS
SELECT shop_name, customer_id, total_price, '2013', 'China'
FROM sale_detail;

In this case, the names of the fourth column and fifth column become names like _c5 and _c6 in the created sale_detail_ctas3 table.

To allow the destination table to have the same schema as the source table, you can use the following create table...like statement:
create table sale_detail_like like sale_detail;

In this case, the schema of the sale_detail_like table is the same as that of the sale_detail table. Except for the lifecycle, both tables have the same column names, column comments, and table comment. However, data in the sale_detail table is not duplicated to the sale_detail_like table.

To create a hash cluster table, you can use the following statements:

CREATE TABLE T1 (a string, b string, c bigint) CLUSTERED BY (c) SORTED by (c) INTO 1024 BUCKETS; -- This statement creates a non-partitioned hash cluster table.
CREATE TABLE T1 (a string, b string, c bigint) PARTITIONED BY (dt string) CLUSTERED BY (c) SORTED by (c) INTO 1024 BUCKETS; -- This statement creates a partitioned hash cluster table.

View table information

Use the following syntax to view table information:
desc <table_name>;
desc extended <table_name>; -- View the information about an external table.
Examples
  • If you want to view the information about the preceding sale_detail table, run the following statement:
    desc sale_detail;
    The returned result is as follows:
    odps@ $odps_project>desc sale_detail;
    +--------------------------------------------------------------------+
    | Owner: ALIYUN$maojing.mj@alibaba-inc.com | Project: $odps_project
                      |
    | TableComment:
       |
    +--------------------------------------------------------------------+
    | CreateTime:               2017-06-28 15:05:17
       |
    | LastDDLTime:              2017-06-28 15:05:17
       |
    | LastModifiedTime:         2017-06-28 15:05:17
       |
    +--------------------------------------------------------------------+
    | InternalTable: YES      | Size: 0
       |
    +--------------------------------------------------------------------+
    | Native Columns:
       |
    +--------------------------------------------------------------------+
    | Field           | Type       | Label | Comment
       |
    +--------------------------------------------------------------------+
    | shop_name       | string     |       |
       |
    | customer_id     | string     |       |
       |
    | total_price     | double     |       |
       |
    +--------------------------------------------------------------------+
    | Partition Columns:
       |
    +--------------------------------------------------------------------+
    | sale_date       | string     |
       |
    | region          | string     |
       |
    +--------------------------------------------------------------------+
    OK
  • If you want to view the information about the preceding sale_detail_like table, run the following statement:
    desc sale_detail_like
    The returned result is as follows:
    odps@ $odps_project>desc sale_detail_like;
    +--------------------------------------------------------------------+
    | Owner: ALIYUN$maojing.mj@alibaba-inc.com | Project: $odps_project
                      |
    | TableComment:
       |
    +--------------------------------------------------------------------+
    | CreateTime:               2017-06-28 15:42:17
       |
    | LastDDLTime:              2017-06-28 15:42:17
       |
    | LastModifiedTime:         2017-06-28 15:42:17
       |
    +--------------------------------------------------------------------+
    | InternalTable: YES      | Size: 0
       |
    +--------------------------------------------------------------------+
    | Native Columns:
       |
    +--------------------------------------------------------------------+
    | Field           | Type       | Label | Comment
       |
    +--------------------------------------------------------------------+
    | shop_name       | string     |       |
       |
    | customer_id     | string     |       |
       |
    | total_price     | double     |       |
       |
    +--------------------------------------------------------------------+
    | Partition Columns:
       |
    +--------------------------------------------------------------------+
    | sale_date       | string     |
       |
    | region          | string     |
       |
    +--------------------------------------------------------------------+
    OK
In the preceding results, you can find that except for the lifecycle, the attributes, such as field types and partition types, of the sale_detail_like table are the same as those of the sale_detail table. For more information, see Table operations.
Note The data size you obtain by running the desc table_name statement includes the size of data in the recycle bin. If you want to clear the recycle bin, run the purge table table_name statement. Then, run the desc table_name statement to view the size of data excluding that in the recycle bin. You can also run the show recyclebin statement to view the details about data in the recycle bin for the current project.

If you view the information about the sale_detail_ctas1 table, you can find that sale_date and region are only general columns and not partitions of the table.

  • As more and more data types are supported by MaxCompute, the types of data that the DESC statement can return also increase. For more information, see Data types. If you use new data types in MaxCompute, you need to enable the new data types when you run SQL statements. However, you do not need to do so when you run the DESC statement.
    Note If other SQL statements depend on the output of the desc table_name statement, we recommend that you enable new data types in MaxCompute in a timely manner.
    Examples
    set odps.sql.type.system.odps2=true;
    CREATE TABLE test_newtype (
        c1 tinyint
        ,c2 smallint
        ,c3 int
        ,c4 BIGINT
        ,c5 float
        ,c6 DOUBLE
        ,c7 decimal
        ,c8 binary
        ,c9 timestamp
        ,c10 ARRAY<map<BIGINT,BIGINT>>
        ,c11 map<STRING,ARRAY<BIGINT>>
        ,c12 STRUCT<s1:STRING,s2:BIGINT>
        ,c13 varchar(20))
    LIFECYCLE 1
    ;
    The result returned by the desc test_newtype; statement is as follows, with some information being omitted:
    | Native Columns:                                                                    |
    +------------------------------------------------------------------------------------+
    | Field           | Type       | Label | Comment                                     |
    +------------------------------------------------------------------------------------+
    | c1              | tinyint    |       |                                             |
    | c2              | smallint   |       |                                             |
    | c3              | int        |       |                                             |
    | c4              | bigint     |       |                                             |
    | c5              | float      |       |                                             |
    | c6              | double     |       |                                             |
    | c7              | decimal    |       |                                             |
    | c8              | binary     |       |                                             |
    | c9              | timestamp  |       |                                             |
    | c10             | array<map<bigint,bigint>> |       |                              |
    | c11             | map<string,array<bigint>> |       |                              |
    | c12             | struct<s1:string,s2:bigint> |       |                            |
    | c13             | varchar(20) |       |                                            |
    +------------------------------------------------------------------------------------+
You can run the desc extended table_name; statement to view the Hash Clustering attribute of a hash cluster table. In the following result, the Hash Clustering attribute is displayed in Extended Info.

+------------------------------------------------------------------------------------+
| Owner: ALIYUN$xxxxxxx@aliyun.com | Project: xxxxx |
| TableComment: |
+------------------------------------------------------------------------------------+
| CreateTime: 2017-12-25 11:18:26 |
| LastDDLTime: 2017-12-25 11:18:26 |
| LastModifiedTime: 2017-12-25 11:18:26 |
| Lifecycle: 2 |
+------------------------------------------------------------------------------------+
| InternalTable: YES | Size: 0 |
+------------------------------------------------------------------------------------+
| Native Columns: |
+------------------------------------------------------------------------------------+
| Field | Type | Label | Comment |
+------------------------------------------------------------------------------------+
| a | string | | |
| b | string | | |
| c | bigint | | |
+------------------------------------------------------------------------------------+
| Partition Columns: |
+------------------------------------------------------------------------------------+
| dt | string | |
+------------------------------------------------------------------------------------+
| Extended Info: |
+------------------------------------------------------------------------------------+
| TableID: 91a3395d3ef64b4d9ee1d28527552864 |
| IsArchived: false |
| PhysicalSize: 0 |
| FileNum: 0 |
| ClusterType: hash |
| BucketNum: 1024 |
| ClusterColumns: [c] |
| SortColumns: [c ASC] |
+------------------------------------------------------------------------------------+
For a partitioned table with the Hash Clustering attribute added, in addition to running the preceding statement to view the table attributes, you can also run the following statement to view the partition attributes.
DESC EXTENDED table_name partition(pt_spec);

Run the SELECT statement to view the details of the table. For more information, see SELECT syntax.

View the CREATE TABLE statement

Use the following syntax to view the CREATE TABLE statement:
SHOW CREATE TABLE <table_name>;
Note You can run this statement to generate DDL statements used for creating a table and use the statements to rebuild the table schema through SQL.

Delete a table

Use the following syntax to delete a table:
DROP TABLE [IF EXISTS] table_name;
Note
  • If you do not specify the IF EXISTS option and the target table does not exist, an error is returned. If you specify this option, a success message is returned regardless of whether the table exists.
  • When you delete an external table, data in Object Storage Service (OSS) is not deleted.
Examples
create table sale_detail_drop like sale_detail;
    drop table sale_detail_drop;
    -- If the table exists, a success message is returned. If the table does not exist, an error is returned.
    drop table if exists sale_detail_drop2;
    -- A success message is returned regardless of whether the sale_detail_drop2 table exists.

Rename a table

Use the following syntax to rename a table:
ALTER TABLE table_name RENAME TO new_table_name;
Note
  • The RENAME operation only changes the table name instead of modifying the table data.
  • If the table specified by new_table_name already exists, an error is returned.
  • If the table specified by table_name does not exist, an error is returned.
Examples
create table sale_detail_rename1 like sale_detail;
alter table sale_detail_rename1 rename to sale_detail_rename2;

Change the owner of a table

MaxCompute SQL allows you to run the CHANGEOWNER statement to change the owner of a table. The syntax is as follows:
alter table table_name changeowner to 'ALIYUN$xxx@aliyun.com';

Modify the comment of a table

Use the following syntax to modify the comment of a table:
ALTER TABLE table_name SET COMMENT 'tbl comment';
Note
  • The table specified by table_name must exist.
  • A comment can be up to 1,024 bytes in length.
Examples
alter table sale_detail set comment 'new comments for table sale_detail';

You can run the DESC statement in MaxCompute to view the modified comment of the table. For more information, see Describe Table in Table operations.

Modify the last update time of a table

MaxCompute SQL allows you to run the TOUCH statement to modify the last update time of a table. This statement modifies the LastDataModifiedTime setting of a table to the current time.

Use the following syntax to modify the last update time of a table:
ALTER TABLE table_name TOUCH;
Note
  • If the table specified by table_name does not exist, an error is returned.
  • After this statement modifies the LastDataModifiedTime setting of the table, MaxCompute considers that the table data has changed, and restarts the lifecycle of the table from the last update time.

Modify the Hash Clustering attribute of a table

You can run the ALTER TABLE statement to add the Hash Clustering attribute to a partitioned table or remove the Hash Clustering attribute from a partitioned table.

Use the following syntax to add the Hash Clustering attribute:

ALTER TABLE table_name     
[CLUSTERED BY (col_name [, col_name, ...]) [SORTED BY (col_name [ASC | DESC] [, col_name [ASC | DESC] ...])] INTO number_of_buckets BUCKETS]
Use the following syntax to remove the Hash Clustering attribute:
ALTER TABLE table_name NOT CLUSTERED;
Note
  • The ALTER TABLE statement can only modify the Hash Clustering attribute for a partitioned table instead of a non-partitioned table. The Hash Clustering attribute cannot be modified after it is added for a non-partitioned table.
  • The ALTER TABLE statement takes effect only for the new partitions of a table. Therefore, you cannot run the partition alter table statement for the existing partitions. After the Hash Clustering attribute is added, data is stored in the new partitions in compliance with hash clustering.

Clear data from a non-partitioned table

You can use the TRUNCATE TABLE statement to clear data from a specified non-partitioned table instead of a partitioned table. For a partitioned table, you can run the ALTER TABLE table_name DROP PARTITION statement to clear data from partitions.

Use the following syntax to clear data from a non-partitioned table:
TRUNCATE TABLE table_name;