This topic describes how to migrate data from a self-managed ClickHouse database to Hologres for data development and analytics.
Prerequisites
- A Hologres instance is purchased. For more information, see Purchase a Hologres instance.
- A ClickHouse cluster is available and has the ClickHouse client installed. To install the client, click ClickHouse-Client. For more information about how to install and use the client, see Getting Started.
- The PostgreSQL client is used to connect to the Hologres instance. For more information, see Connect to a Hologres instance from the PostgreSQL client.
Background information
ClickHouse is a column-oriented database management system for online analytical processing (OLAP). Hologres is an interactive analytics service developed by Alibaba Cloud. Hologres responds to queries in sub-seconds and supports a high number of queries per second (QPS). You can migrate tables and data from a ClickHouse cluster to Hologres to enhance data development and analytics. The following table compares the features of ClickHouse and Hologres.
Category | Feature | ClickHouse | Hologres |
---|---|---|---|
Service | Orientation | OLAP-based service | Universal real-time data warehouse for data analytics and online data processing |
Data write | Storage mode | Column-oriented | Column-oriented and row-oriented |
Data write | Visibility | Data can be viewed within seconds after it is written. The ClickHouse client can accumulate data for batch processing. Data is written to a distributed table based on data replication in shards. | Data can be viewed within milliseconds after it is written. Data is written in a way that adapts to batch processing. |
Data write | Performance | Good | Excellent |
Data write | Detailed records | Supported | Supported |
Data write | Primary key | ClickHouse does not require a unique primary key. Primary keys cannot be used as unique constraints and are used only for indexing and merge operations. | Hologres supports standard database primary keys. A primary key can be used as a unique constraint. |
Data write | Update performance | Weak. ClickHouse does not support frequent updates based on primary keys. | Complete. Hologres supports frequent updates based on primary keys (see the example after this table). |
Data write | Statement for real-time data writing | APPEND | |
Data write | Indexing | | Note: Indexes such as minmax, Bloom filter, and N-gram indexes are automatically created and are transparent to users. |
Query | Optimizer | Rule-based optimizer (RBO) | Cost-based optimizer (CBO) |
Query | Federated query | Supported. ClickHouse allows you to perform federated queries based on the Hadoop Distributed File System (HDFS) and Kafka engines. | Supported. Hologres allows you to use a foreign data wrapper (FDW) to directly read MaxCompute and Hive data. |
Query | Pre-aggregation | Supported. Pre-aggregation is performed based on MergeTree. | Supported. Pre-aggregation is performed based on stored procedures and periodic scheduling. |
Query | Frequent point query | Not supported | Supported. More than 10 million point queries can be performed per second. |
Query | Performance of single-table complex queries | Good | Good |
Query | Performance of joining multiple tables | Poor | Good |
Query | SQL syntax | Custom syntax | Hologres is compatible with the standard PostgreSQL syntax and provides more features. |
Query | Window functions | Not supported | Supported |
Transaction | Atomicity, consistency, isolation, durability (ACID) of transactions | ClickHouse does not ensure the ACID of transactions. For example, ClickHouse does not ensure that data can be viewed instantly after it is written. In addition, ClickHouse does not ensure data consistency. | Hologres ensures the ACID of transactions to a specific extent. For example, Hologres ensures the visibility of data for DDL transactions, single-row transactions, and snapshot-based transactions. |
Replication | Disaster recovery and backup | ClickHouse uses data replicas for disaster recovery and backup on remote ZooKeeper and ClickHouse servers. | Hologres uses the binary logging feature to logically replicate data and uses the underlying mechanism to physically replicate data. |
Advanced feature | Binary logging | Not supported | Supported |
Advanced feature | Vector search | Not supported | Supported |
Advanced feature | Spatial data | Not supported | Supported |
Advanced feature | Security management | ClickHouse allows you to customize permissions for security management. | Hologres is compatible with the PostgreSQL permission model and allows you to perform various operations to manage permissions, configure IP address whitelists, and mask sensitive data for security management. |
Advanced feature | Separation between storage and computing | Storage and computing are not separated. The storage capacity on a single server is limited. | Storage and computing are separated. The storage capacity on a single server is almost unlimited. |
Advanced feature | Availability | Failovers must be manually handled. | Failovers can be automatically handled. |
Advanced feature | O&M | O&M is complex, and the shard distribution must be manually maintained. | No O&M operations are required. |
Ecosystem | Data connection | A variety of ecosystem partners such as Apache Kafka, Apache Flink, and Apache Spark are supported. | A variety of ecosystem partners such as Apache Flink, Apache Spark, Java Database Connectivity (JDBC), and DataX are supported. |
Ecosystem | Business intelligence (BI) tools | A small number of BI tools such as Tableau and Superset are supported. | Hologres is compatible with the PostgreSQL ecosystem and supports more than 100 mainstream BI tools. |
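The following statements are a minimal sketch of a primary-key-based update in Hologres. The table name user_profile and its columns are assumptions used only for illustration; the sketch relies on the PostgreSQL-compatible INSERT ON CONFLICT syntax that Hologres supports.

    -- Hypothetical table used only for illustration.
    CREATE TABLE user_profile (
        uid        bigint NOT NULL,
        city       text,
        updated_at timestamptz,
        PRIMARY KEY (uid)
    );

    -- A write with an existing primary key updates the row instead of failing,
    -- which is how frequent primary-key-based updates are expressed in Hologres.
    INSERT INTO user_profile (uid, city, updated_at)
    VALUES (1001, 'Hangzhou', now())
    ON CONFLICT (uid) DO UPDATE
    SET city = EXCLUDED.city, updated_at = EXCLUDED.updated_at;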
Data type mappings
Category | ClickHouse | Hologres |
---|---|---|
Date | Date | DATE |
Date | DateTime | TIMESTAMPTZ |
Date | DateTime(timezone) | TIMESTAMPTZ |
Date | DateTime64 | TIMESTAMPTZ |
Numeric value | Int8 | Hologres does not support the single-byte INT data type, but uses SMALLINT instead. |
Numeric value | Int16 | SMALLINT |
Numeric value | Int32 | INT |
Numeric value | Int64 | BIGINT |
Numeric value | UInt8 | INT |
Numeric value | UInt16 | INT |
Numeric value | UInt32 | BIGINT |
Numeric value | UInt64 | BIGINT |
Numeric value | Float32 | FLOAT |
Numeric value | Float64 | DOUBLE PRECISION |
Numeric value | Decimal(P, S) | DECIMAL |
Numeric value | Decimal32(S) | DECIMAL |
Numeric value | Decimal64(S) | DECIMAL |
Numeric value | Decimal128(S) | DECIMAL |
Boolean value | ClickHouse does not support the BOOLEAN data type, but uses UInt8 instead. | BOOLEAN |
String | String | TEXT |
String | FixedString(N) | Hologres does not support the FixedString(N) data type, but uses TEXT instead. |
String | LowCardinality | Hologres does not support the LowCardinality data type. Hologres automatically and intelligently enables dictionary encoding, and allows you to call the set_table_property('x', 'dictionary_encoding_columns', 'col') function to build dictionary mappings for specific columns (see the example after this table). |
Binary value | ClickHouse does not support the BINARY data type, but uses String or FixedString(N) instead. | Hologres supports data types such as BIT(n), VARBIT(n), BYTEA, and CHAR(n). |
Others | UUID | UUID |
Others | Enum | Hologres does not support the Enum data type, but uses TEXT instead. |
Others | Nested, Tuple, and Array | ARRAY |
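The dictionary encoding property mentioned in the LowCardinality row can be set as shown in the following sketch. The table name tbl_user and the column province are assumptions used only for illustration; the statements follow the set_table_property syntax that Hologres provides.

    BEGIN;
    CREATE TABLE tbl_user (
        uid      bigint NOT NULL,
        province text,
        PRIMARY KEY (uid)
    );
    -- Build dictionary encoding for the low-cardinality text column.
    CALL set_table_property('tbl_user', 'dictionary_encoding_columns', 'province:auto');
    COMMIT;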
Migrate metadata
The migration of metadata involves the migration of the DDL statements that are used to create tables. Export the table schemas from the ClickHouse cluster, convert the CREATE TABLE statements into the Hologres syntax based on the preceding data type mappings, and then execute the converted statements in the Hologres instance.
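The exact DDL statements depend on your tables. The following sketch only illustrates the flow: the table name test_table, its columns, and the converted Hologres DDL are assumptions, not output from a real cluster.

    -- On the ClickHouse client: obtain the DDL statement of the source table.
    SHOW CREATE TABLE default.test_table;

    -- Assume that the statement returns a table with the columns
    -- id UInt32, name String, and ts DateTime.

    -- On the Hologres instance: execute the DDL that is rewritten based on the
    -- preceding mappings (UInt32 -> BIGINT, String -> TEXT, DateTime -> TIMESTAMPTZ).
    CREATE TABLE public.test_table (
        id   bigint,
        name text,
        ts   timestamptz,
        PRIMARY KEY (id)   -- Optional. Add a primary key only if your business requires one.
    );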
Migrate data
- Export the data to be migrated from the ClickHouse cluster as a file. Then, use JDBC or the PostgreSQL client to execute the COPY statement and import the file into the Hologres instance. This method is recommended.
- Use Flink or Spark jobs to read data from the ClickHouse cluster and write the data to the Hologres instance. For more information, see Write data from Apache Spark to Hologres.
- Use the Data Integration service of DataWorks or DataX to read data from the ClickHouse cluster and write the data to the Hologres instance. For more information, see Overview.
The following example shows how to export the data to be migrated from the ClickHouse cluster as a file and then import the file into the Hologres instance.
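The following commands are a minimal sketch of this flow. The table name test_table, the file path /tmp/test_table.csv, and the connection parameters are placeholders; adjust the names, format, and credentials to your environment.

    # Export the data from the ClickHouse cluster as a CSV file.
    clickhouse-client --host <clickhouse_host> \
        --query "SELECT * FROM default.test_table FORMAT CSV" > /tmp/test_table.csv

    # Import the CSV file into the Hologres instance by using the PostgreSQL client.
    psql -h <hologres_endpoint> -p <port> -U <access_id> -d <database> \
        -c "\copy public.test_table FROM '/tmp/test_table.csv' WITH (FORMAT csv)"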
Convert data query statements
The data query statements in Hologres use the PostgreSQL syntax, whereas those in ClickHouse use a custom syntax. The SQL syntax of ClickHouse is partially compatible with the American National Standards Institute (ANSI) SQL syntax. The SQL syntax of Hologres and that of ClickHouse are similar on the whole but differ in details. Therefore, you must convert data query statements, especially those that use scalar functions or window functions.
- The names of columns in the SQL syntax of ClickHouse are enclosed by single quotation marks (''), whereas those in the SQL syntax of Hologres are enclosed by double quotation marks ("").
- ClickHouse uses the SELECT X FROM <database_name>.<table_name> statement to query data, whereas Hologres uses the SELECT X FROM <schema_name>.<table_name> statement to query data.
- Expressions in data query statements differ between ClickHouse and Hologres, particularly with respect to functions. The following table describes the mappings between specific functions in ClickHouse and Hologres. Functions that are shared by ClickHouse and Hologres are not described. For a converted sample query that applies these mappings, see the sketch after the examples below.

ClickHouse | Hologres |
---|---|
toYear(expr) | to_char(expr, 'YYYY') |
toInt32(expr) | CAST(expr as INTEGER) |
uniq(), uniqCombined(), uniqCombined64(), uniqHLL12() | approx_count_distinct() |
uniqExact() | count(distinct x) |
quantile(level)(expr) | approx_percentile(level) WITHIN GROUP (ORDER BY expr) |
quantileExact(level)(expr) | percentile_cont(level) WITHIN GROUP (ORDER BY expr) |
- Use regular expressions: You can use regular expressions to convert specific ClickHouse SQL syntax with fixed patterns, such as function names, punctuation marks, and expressions, into the corresponding Hologres SQL syntax. For example, you can convert single quotation marks ('') in the ClickHouse SQL syntax to double quotation marks ("") in the Hologres SQL syntax.
- Use the ClickHouse extension: Hologres provides a built-in ClickHouse extension and is compatible with specific ClickHouse functions, such as the toUInt32() function. These functions do not need to be converted before they are used in Hologres.
- Example 1
- The following sample code shows a TPC-H query statement that is used to query data in a ClickHouse cluster:

      -- Q1 on ClickHouse
      select l_returnflag, l_linestatus,
          sum(l_quantity) as sum_qty,
          sum(l_extendedprice) as sum_base_price,
          sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
          sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
          avg(l_quantity) as avg_qty,
          avg(l_extendedprice) as avg_price,
          avg(l_discount) as avg_disc,
          count(*) as count_order
      from lineitem
      where l_shipdate <= date '1998-12-01' - interval '90' day
      group by l_returnflag, l_linestatus
      order by l_returnflag, l_linestatus;
- The following sample code shows the converted TPC-H query statement that is used to query data in a Hologres instance:

      -- Q1 on Hologres
      select l_returnflag, l_linestatus,
          sum(l_quantity) as sum_qty,
          sum(l_extendedprice) as sum_base_price,
          sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
          sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
          avg(l_quantity) as avg_qty,
          avg(l_extendedprice) as avg_price,
          avg(l_discount) as avg_disc,
          count(*) as count_order
      from lineitem
      where l_shipdate <= date '1998-12-01' - interval '90' day
      group by l_returnflag, l_linestatus
      order by l_returnflag, l_linestatus;
- Example 2
- The following sample code shows a TPC-H query statement that is used to query data in a ClickHouse cluster:

      -- Q4 on ClickHouse
      select o_orderpriority, count(*) as order_count
      from orders
      where o_orderdate >= date '1993-07-01'
          and o_orderdate < date '1993-07-01' + interval '3' month
          and o_orderdate in (
              select o_orderdate from lineitem, orders
              where l_orderkey = o_orderkey and l_commitdate < l_receiptdate
          )
      group by o_orderpriority
      order by o_orderpriority;
- The following sample code shows the converted TPC-H query statement that is used to query data on a Hologres instance:

      -- Q4 on Hologres
      select o_orderpriority, count(*) as order_count
      from orders
      where o_orderdate >= date '1993-07-01'
          and o_orderdate < date '1993-07-01' + interval '3' month
          and exists (
              select * from lineitem
              where l_orderkey = o_orderkey and l_commitdate < l_receiptdate
          )
      group by o_orderpriority
      order by o_orderpriority;
- Example 3
- The following sample code shows a TPC-H query statement that is used to query data in a ClickHouse cluster:

      -- Q11 on ClickHouse
      select ps_partkey, sum(ps_supplycost * ps_availqty) as value
      from partsupp, supplier, nation
      where ps_suppkey = s_suppkey and s_nationkey = n_nationkey and n_name = 'GERMANY'
      group by ps_partkey
      having sum(ps_supplycost * ps_availqty) > (
          select sum(ps_supplycost * ps_availqty) * toDecimal32(0.0000010000,9)
          from partsupp, supplier, nation
          where ps_suppkey = s_suppkey and s_nationkey = n_nationkey and n_name = 'GERMANY'
      )
      order by value desc
      limit 100;
- The following sample code shows the converted TPC-H query statement that is used to query data on a Hologres instance:

      -- Q11 on Hologres
      select ps_partkey, sum(ps_supplycost * ps_availqty) as value
      from partsupp, supplier, nation
      where ps_suppkey = s_suppkey and s_nationkey = n_nationkey and n_name = 'GERMANY'
      group by ps_partkey
      having sum(ps_supplycost * ps_availqty) > (
          select sum(ps_supplycost * ps_availqty) * 0.0000010000
          from partsupp, supplier, nation
          where ps_suppkey = s_suppkey and s_nationkey = n_nationkey and n_name = 'GERMANY'
      )
      order by value desc
      limit 100;
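The following sketch applies the function mappings from the preceding table to a hypothetical query. The table user_visits and its columns are assumptions used only to illustrate the rewrite, and the converted statement assumes that the PostgreSQL ordered-set aggregate percentile_cont is available in your Hologres version.

    -- On ClickHouse
    select toYear(visit_time) as visit_year,
        uniq(user_id) as uv,
        quantileExact(0.5)(duration) as median_duration
    from user_visits
    group by visit_year;

    -- Converted for Hologres
    select to_char(visit_time, 'YYYY') as visit_year,
        approx_count_distinct(user_id) as uv,
        percentile_cont(0.5) within group (order by duration) as median_duration
    from user_visits
    group by visit_year;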
ClickHouse extension
Function | Description | Return type |
---|---|---|
toString(anyelement) | Converts a type to the TEXT type. | TEXT |
toInt64(anyelement) | Converts a type to the BIGINT type. | BIGINT |
toInt32(anyelement) | Converts a type to the INT type. | INT |
toDate(text) | Converts the TEXT type to the DATE type. | DATE |
toFloat64(anyelement) | Converts a type to the DOUBLE PRECISION type. | DOUBLE PRECISION |
- Usage
  Extension functions are provided by the ClickHouse extension. Before you use these functions, you must create the extension. You can execute the following statements to use an extension function.
  Note: Extensions can be created only by a superuser.
      -- Create an extension.
      CREATE EXTENSION clickhouse;
      -- Use a function.
      SELECT <function> from tablename;
- Example
      -- Create an extension. If an extension is available, skip this step.
      CREATE EXTENSION clickhouse;

      CREATE TABLE public.tb1 (
          id bigint NOT NULL,
          data_date text,
          PRIMARY KEY (id)
      );

      INSERT INTO public.tb1 VALUES (1234,'20190102');

      SELECT toString(id) from public.tb1;
       tostring
      ----------
       1234
      (1 row)

      SELECT toDate(data_date) from public.tb1;
         todate
      ------------
       2019-01-02
      (1 row)