This topic describes how to execute a CREATE TABLE statement to create a table and how to execute a CREATE VIEW statement to create a view in ApsaraDB for ClickHouse.

Create a local table

The following sample code shows the syntax of a CREATE TABLE statement that is used to create a local table:

CREATE TABLE [IF NOT EXISTS] [db.]table_name ON CLUSTER cluster
(
    name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
    name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
    ...
    INDEX index_name1 expr1 TYPE type1(...) GRANULARITY value1,
    INDEX index_name2 expr2 TYPE type2(...) GRANULARITY value2
) ENGINE = engine_name()
[PARTITION BY expr]
[ORDER BY expr]
[PRIMARY KEY expr]
[SAMPLE BY expr]
[SETTINGS name=value, ...];
Parameters and notes:
  • db: the name of the database in which you want to create a local table. If you do not specify this parameter, the local table is created in the current database.
  • cluster: the name of the cluster. Set this parameter to default. The ON CLUSTER in the statement indicates that a local table will be created on each node of the cluster.
  • type: the data type of the field, such as UINT32.
  • DEFAULT: the default value of the field. If an INSERT INTO statement does not insert data into this field, the system calculates the default values of this field based on the DEFAULT expression and then inserts the default values in this field.
  • MATERIALIZED: specifies that the field is a materialized field. Values of this field can only be calculated rather than be inserted. You do not need to specify this field in INSERT INTO statements. When you execute a SELECT * statement, the query results do not contain this field.
  • ALIAS: the alias of the field. Field aliases are not stored in tables. You cannot insert data into a field by using the alias of the field in an INSERT INTO statement. When you execute a SELECT * statement, the field aliases are not displayed in the query results. However, you can use field aliases in a SELECT statement. When the statement is executed, the aliases are replaced with the names of the fields.
  • Take note of the difference between a materialized field and an alias field. A materialized field stores data and does not need to be calculated in a SELECT statement. An alias field does not store data and needs to be calculated in a SELECT statement by using an expression. The values of the alias field are then returned in query results.
The following parameters are related to table engines. The parameters are supported only by engines in the MergeTree family.
  • PARTITION BY: the partition key. Data is partitioned usually by date. You can also specify another field or field expression as the partition key.
  • ORDER BY: the sort key. The sort key can be a tuple of fields or an expression.
  • PRIMARY KEY: the primary key. By default, the primary key is the same as the sort key. Therefore, in most cases, you do not need to specify the primary key.
  • SAMPLE BY: the sampling expression. The sampling expression must be included in the primary key.
  • SETTINGS: the additional parameters that affect performance.
  • GRANULARITY: the index granularity.

The following sample code provides an example on how to create a local table:

CREATE TABLE ontime_local ON CLUSTER default
(
    Year UInt16,
    Quarter UInt8,
    Month UInt8,
    DayofMonth UInt8,
    DayOfWeek UInt8,
    FlightDate Date,
    FlightNum String,
    Div5WheelsOff String,
    Div5TailNum String
)ENGINE = ReplicatedMergeTree(
    '/clickhouse/tables/{database}/{table}/{shard}',
    '{replica}')
 PARTITION BY toYYYYMM(FlightDate)
 PRIMARY KEY (intHash32(FlightDate))
 ORDER BY (intHash32(FlightDate),FlightNum)
 SAMPLE BY intHash32(FlightDate)
SETTINGS index_granularity= 8192 ;
Note :
If your ApsaraDB for ClickHouse cluster is of the High-availability Edition, you must use an engine in the Replicated or MergeTree family. Otherwise, data is not synchronized between the replicas and query results are inconsistent. When replicatedmergetree is created, there are only two ways to fill in parameters.
  • ReplicatedMergeTree('/clickhouse/tables/{database}/{table}/{shard}', '{replica}'),the character in the parameter is not allowed to be modified.
  • ReplicatedMergeTree(),parameters will be automatically filled in the same way as method 1 by default.

Create a distributed table

Create a distributed table based on a local table.

The following sample code shows the syntax of a CREATE TABLE statement that is used to create a distributed table:

CREATE TABLE  [db.]table_name  ON CLUSTER default
 AS db.local_table_name   ENGINE = Distributed(<cluster>, <database>, <shard table> [, sharding_key])
Parameters:
  • db: the name of the database in which you want to create a distributed table.
  • local_table_name: the local table based on which you want to create a distributed table.
  • shard table: the local table based on which you want to create a distributed table.
  • sharding_key: the sharding key. The sharding key can be a field. For example, if the sharding key is a field named user_id, which is of the INTEGER type, the distributed table to be created is sharded based on the remainders of the field values. The sharding key can also be an expression. For example, if the sharding key is rand(), the distributed table to be created is sharded based on the return value of the rand() function divided by the total weight of shards. To achieve more even sharding, you can use a hash function. For example, you can specify the sharding key as intHash64(user_id).

The following sample code provides an example on how to create a distributed table:

-- Create a distributed table.
CREATE TABLE ontime_distributed ON CLUSTER default
 AS db_name.ontime_local 
ENGINE = Distributed(default, db_name, ontime_local, rand());
Note : The db_name parameter is required. It represents the name of the database in which you want to create a distributed table.

Create a table with the same schema as an existing table

The following sample code shows the syntax of a CREATE TABLE statement that is used to create a table with the same schema as an existing table:

CREATE TABLE [IF NOT EXISTS] [db.]table_name ON CLUSTER default AS [db.]name2 [ENGINE = engine];

You can use ENGINE=engine to specify a table engine. By default, the table engine of the table to be created is the same as that of the existing table. In this example, the name of the existing table is name2.

The following sample code provides examples on how to manage resource queues:

create table t2 ON CLUSTER default as db1.t1;

Use the SELECT clause in a CREATE TABLE statement

You can use the SELECT clause in a CREATE TABLE statement. The table to be created has the same schema as the query results of the SELECT clause. The query results are inserted into the table to be created. The following sample code shows the syntax:

CREATE TABLE [IF NOT EXISTS] [db.]table_name ON CLUSTER default ENGINE = engine AS SELECT ...

The ENGINE parameter is required.

The following sample code provides examples on how to manage resource queues:

create table t2 ON CLUSTER default ENGINE =MergeTree() as select * from db1.t1 where id<100;

Create a temporary table

The following sample code shows the syntax of a CREATE TEMPORARY TABLE statement:

CREATE TEMPORARY TABLE [IF NOT EXISTS] table_name ON CLUSTER default 
(
    name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
    name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
    ...
)

The keyword TEMPORARY indicates that the table to be created is a temporary table. In most cases, temporary tables are not manually created. They are created only when a query of a distributed table contains (GLOBAL) IN and a storage medium for external data is required.

Create a view

The following sample code shows the syntax of a CREATE VIEW statement:

CREATE [MATERIALIZED] VIEW [IF NOT EXISTS] [db.]table_name [TO[db.]name] ON CLUSTER default [ENGINE = engine] [POPULATE] AS SELECT ...

The keyword MATERIALIZED indicates that the view to be created is a materialized view. If this keyword is not contained in the statement, the view to be created is a regular view.

Assume that you execute the following statement to create a view:

CREATE VIEW view_1 ON CLUSTER default AS SELECT a,b,c,d FROM db1.t1;

The query results of the following statements are the same:

SELECT a, b, c FROM view_1 ;
SELECT a, b, c FROM (SELECT a,b,c FROM db1.t1);

Data in a materialized view is the query results of a SELECT statement that was used to create the materialized view.

When you create a materialized view, you must specify a table engine. The specified table engine is used to store data.

Here is how a materialized view works. A materialized view is created based on the query results of a SELECT statement. When data is written to the queried table, the query results of the SELECT statement are updated. The updated query results are synchronized to the materialized view.

If you use the POPULATE clause when you create a materialized view, the data of the specified table is inserted into the materialized view during the creation process. The effect is similar to the effect of a CREATE TABLE... AS SELECT ...statement. If you do not use the POPULATE clause, the materialized view contains only the data that is written to the specified table after the materialized view is created.

When you create a view, we recommend that you do not use the POPULATE clause because the data that is written to the specified table during the creation process is not inserted into the view. The SELECT clause that you use to create a view may contain DISTINCT, GROUP BY, ORDER BY, or LIMIT. These operations are performed only on the involved data blocks when data is inserted into the view. For example, if the SELECT clause contains GROUP BY, data is aggregated only when the query is run and only the data that is queried and written to the view is aggregated. The data is not aggregated in other cases. An exception is that you use a data aggregation engine such as SummingMergeTree.

You cannot run ALTER TABLE on materialized views. Therefore, you may find it inconvenient to modify data in a materialized view. Assume that you create a materialized view by executing a CREATE VIEW statement that contains TO[db.]name. To modify data in this materialized view, you can execute a DETACH statement to detach the materialized view from the database. Then, run ALTER TABLE on the table that you want to modify. Finally, execute an ATTACH statement to re-attach the materialized view to the database. The modified data in the table is synchronized to the materialized view. You can query views the same way that you query regular tables. For example, you can execute a SHOW TABLES statement to query views.

ClickHouse does not provide a statement type especially for deleting views. To delete a view, execute a DROP TABLE statement.