Alibaba Cloud E-MapReduce (EMR) provides a managed ClickHouse service based on open source ClickHouse, an online analytical processing (OLAP) engine. EMR ClickHouse supports all features of open source ClickHouse and adds the following Alibaba Cloud capabilities: quick cluster deployment, cluster management, scaling, and monitoring and alerting. EMR ClickHouse also provides better read and write performance than open source ClickHouse and integrates efficiently with other EMR components.

Features

Feature Description
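Column-oriented storage: For analytical queries, column-oriented storage provides better query performance than row-oriented storage because a query reads only the columns that it references. Values of the same type are stored together, which yields a high data compression ratio and saves storage space.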
Massively parallel processing (MPP) architecture: Each node accesses only its own memory and disks. Nodes process their local data independently and in parallel, and communicate with each other to exchange data. The MPP architecture provides excellent query performance and high scalability.

Vectorized engine: Data is processed in column vectors, each of which is a part of a column. Combined with column-oriented storage, vectorized execution improves CPU utilization.

Support for SQL: ClickHouse supports a declarative query language based on SQL. In many cases, the query language is identical to the American National Standards Institute (ANSI) SQL standard. Supported constructs include the GROUP BY, ORDER BY, FROM, JOIN, and IN clauses, and non-correlated subqueries.
Real-time data update: ClickHouse allows you to define a primary key for a table. In tables that use a table engine of the MergeTree type, data is sorted incrementally by primary key as it is stored, so you can efficiently query data based on primary keys. ClickHouse also supports near real-time data insertion, metric aggregation, and index creation.

Support for indexes: When data is sorted by primary key, specific values or the data in specific value ranges can be extracted within dozens of milliseconds.
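
To make the benefit of primary-key sorting more concrete, the following Python sketch models a sorted key column with a sparse index that stores only the first key of each block of rows. It is a simplified illustration and not ClickHouse code: the names `GRANULE_SIZE`, `build_sparse_index`, and `range_lookup` are hypothetical, and the block size is deliberately tiny so that the effect is visible.

```python
from bisect import bisect_left, bisect_right

GRANULE_SIZE = 4  # hypothetical block size, kept tiny for illustration


def build_sparse_index(sorted_keys, granule_size=GRANULE_SIZE):
    """Keep only the first key of every block (granule) of consecutive rows."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), granule_size)]


def range_lookup(sorted_keys, sparse_index, low, high, granule_size=GRANULE_SIZE):
    """Return the keys in [low, high] and the number of rows that were scanned."""
    # Start at the last granule whose first key is strictly below `low`,
    # so duplicates of `low` at the end of that granule are not missed.
    first = max(bisect_left(sparse_index, low) - 1, 0)
    # Granules whose first key is greater than `high` cannot contain matches.
    last = bisect_right(sparse_index, high)
    start = first * granule_size
    stop = min(last * granule_size, len(sorted_keys))
    scanned = sorted_keys[start:stop]
    return [k for k in scanned if low <= k <= high], len(scanned)


keys = list(range(0, 100, 2))          # 50 sorted primary-key values
index = build_sparse_index(keys)       # 13 index entries instead of 50
matches, scanned = range_lookup(keys, index, 20, 30)
print(matches, scanned)                # [20, 22, 24, 26, 28, 30] 8
```

In this sketch, only 8 of the 50 rows are scanned to answer the range query, because the sorted order lets the lookup skip every block that cannot contain a match.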

Scenarios

Scenario Description
User behavior analysis: You can create a wide table that contains more than 1,000 columns to store the data of a behavior analysis system. Only a limited number of JOIN types are supported. You can use JOIN operations to perform path analysis, funnel analysis, and path conversion.
Traffic and monitoring: You can use a stream computing engine such as Flink or Spark Streaming to cleanse monitoring data collected from the system metrics and monitoring metrics of an application, write the cleansed data to ClickHouse in real time, and then visualize the results in Grafana.
User personas: You can process data on various user features and create one or more feature tables that contain the feature data of all users. These tables support business requirements such as flexible user persona analysis, advertising, and identification of specific users.
Real-time business intelligence (BI) reports: You can efficiently generate BI reports that support flexible, real-time queries based on your business requirements. Queries return within seconds, and most query requests receive a response in real time. Examples include order analysis, marketing effect analysis, and promotion activity analysis reports.
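
The funnel analysis mentioned under user behavior analysis can be illustrated with a small, self-contained Python sketch. It only shows the logic of counting how many users reach each step in order. In a ClickHouse deployment, this kind of analysis would be written in SQL against the behavior table. The function name `funnel_counts`, the event names, and the sample data are hypothetical.

```python
from collections import defaultdict


def funnel_counts(events, steps):
    """Count the users who reach each funnel step in the given order.

    events: iterable of (user_id, timestamp, event_name) tuples.
    steps:  ordered list of event names that define the funnel.
    """
    by_user = defaultdict(list)
    for user_id, ts, name in events:
        by_user[user_id].append((ts, name))

    reached = [0] * len(steps)
    for user_events in by_user.values():
        next_step = 0
        # Walk each user's events in time order and advance one step at a time.
        for _, name in sorted(user_events):
            if next_step < len(steps) and name == steps[next_step]:
                reached[next_step] += 1
                next_step += 1
    return list(zip(steps, reached))


events = [
    ("u1", 1, "view"), ("u1", 2, "cart"), ("u1", 3, "pay"),
    ("u2", 1, "view"), ("u2", 5, "pay"),   # skips "cart", so "pay" is not counted
    ("u3", 2, "cart"), ("u3", 3, "view"),  # "cart" occurs before "view"
]
print(funnel_counts(events, ["view", "cart", "pay"]))
# [('view', 3), ('cart', 1), ('pay', 1)]
```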
Note: ClickHouse is not suitable for the following scenarios:
  • Workloads that require complete transactions. Complete transactions are not supported.
  • Workloads that modify or delete data at a high frequency or with low latency. Data can be modified or deleted only in batches.
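
Because deletions are practical only in batches, a client typically groups many pending deletions into a few statements instead of issuing one statement per row. The following Python sketch shows one possible way to do this. It generates `ALTER TABLE ... DELETE WHERE` statements, the form that open source ClickHouse uses for batch deletions. The helper name `batched_delete_statements`, the table name `events_local`, and the column name `user_id` are hypothetical, and the sketch only builds the statement text without connecting to a cluster.

```python
def batched_delete_statements(table, key_column, ids, batch_size=1000):
    """Yield one ALTER TABLE ... DELETE statement per batch of integer ids."""
    # Accept integers only, so that the values are safe to inline in the SQL text.
    ids = [int(i) for i in ids]
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        id_list = ", ".join(str(i) for i in batch)
        yield f"ALTER TABLE {table} DELETE WHERE {key_column} IN ({id_list})"


# Five pending deletions become three statements with a batch size of 2.
for statement in batched_delete_statements("events_local", "user_id", range(1, 6), batch_size=2):
    print(statement)
```

A larger batch size means fewer statements for the cluster to process. A suitable value depends on the workload and is left as an assumption here.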