This topic describes how to create a ClickHouse cluster in EMR on ECS and configure its software, hardware, and basic settings.
Background information
For recommendations on instance types, memory, and disk configurations for ClickHouse, see Usage Recommendations.
Prerequisites
You have created a virtual private cloud (VPC) and a vSwitch in the region where you want to create the cluster. For more information, see Create and manage a VPC and Create and manage a vSwitch.
Procedure
After a cluster is created, you can change only its name. All other configurations cannot be modified. Carefully review all configurations before you create the cluster.
Log on to the EMR on ECS console.
(Optional) In the top navigation bar, select a region and a resource group.
Region: The cluster is created in the selected region. The region cannot be changed after the cluster is created.
Resource Group: By default, all resources in your account are displayed.
Click Create Cluster.
Configure software settings.
Configuration item: Description
Region: The region in which to create the cluster. The region cannot be changed after the cluster is created.
Business Scenario: Select Data Analytics.
Product Version: The latest EMR version is selected by default.
High Service Availability: Disabled by default. If enabled, EMR distributes master nodes across different underlying hardware to reduce the risk of failures.
Optional Services (Select One At Least): Select ClickHouse. See the following table for ZooKeeper selection behavior by EMR version.
Advanced Settings > Custom Software Configuration: Disabled by default. Specify a JSON file to configure the software of the cluster. For more information, see Configure custom software.
ZooKeeper selection behavior by EMR version
EMR version: ZooKeeper behavior
EMR 5.11.0 or later, or EMR 3.45.0 or later: ZooKeeper is automatically selected when you select ClickHouse.
EMR 5.8.0 to EMR 5.10.1: If you select only ClickHouse, ClickHouse uses the built-in ClickHouse Keeper by default. However, ClickHouse Keeper has different performance characteristics from ZooKeeper. We recommend that you also select ZooKeeper.
EMR 3.42.0 to EMR 3.44.1: If High Service Availability is enabled, ZooKeeper is automatically selected with ClickHouse. If High Service Availability is disabled, ZooKeeper is not selected by default. Without ZooKeeper, ClickHouse cannot perform DDL operations. You must manually select ZooKeeper.
Configure hardware settings.
Select instance types based on your workload requirements. For more information, see Instance families.
Note: Use the pay-as-you-go billing method for testing. After the test is complete, create a new subscription cluster for production use.
Important: Do not use advanced security groups created in the ECS console. EMR supports only basic security groups.
Configuration item: Description
Billing Method: Subscription by default. Options: Pay-as-you-go (billed hourly based on actual usage, suitable for testing) or Subscription (prepaid for a specified duration, suitable for production).
Zone: Zones are independent physical locations within the same region. Interconnection between zones is supported. You can use the default zone.
VPC: Select an existing VPC. If no VPC is available, click Create VPC to create one.
vSwitch: Select a vSwitch in the zone of the selected VPC. If no vSwitch is available in the zone, create one.
Default Security Group: An existing security group is selected by default. You can also click Create Security Group to create a new one. For more information, see Security group overview.
Node group settings
Parameter: Description
System Disk: The disk type for the system disk. Options: Enterprise SSD (ESSD), standard SSD, or ultra disk.
System Disk Size: 80 GB by default. Valid values: 80 GB to 5,000 GB.
Data Disk: The disk type for the data disk. Options: Enterprise SSD (ESSD), standard SSD, or ultra disk.
Data Disk Size: 80 GB by default. Valid values: 40 GB to 32,768 GB.
Instances: The number of nodes. By default, the cluster has 1 master node and 1 core node when High Service Availability is disabled, and 3 master nodes and 3 core nodes when it is enabled.
Assign Public Network IP: Specifies whether to assign an elastic IP address (EIP) to the cluster. Disabled by default. To access the cluster over the Internet after it is created, apply for an EIP in the ECS console. For more information, see Elastic IP Address.
Configure basic settings.
Identity credential options:
Key Pair: Use an SSH key pair to log on to the cluster nodes. For more information, see SSH key pairs.
Password: Set a logon password for the master node. The password must be 8 to 30 characters in length and must contain uppercase letters, lowercase letters, digits, and special characters (!@#$%^&*).
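The password rule above can be checked locally before you enter the value in the console. The following Python sketch is illustrative only and is not part of EMR. The function name `password_problems` is hypothetical, and "letters" is assumed to mean ASCII letters. The rule as stated does not say whether characters outside the listed classes are rejected, so the sketch does not reject them.

```python
import string

# Special characters listed in the password rule.
SPECIAL_CHARACTERS = "!@#$%^&*"


def password_problems(password: str) -> list:
    """Return the parts of the stated rule that the password violates.

    An empty list means the password satisfies every stated requirement.
    """
    problems = []
    if not 8 <= len(password) <= 30:
        problems.append("length must be 8 to 30 characters")
    if not any(c in string.ascii_uppercase for c in password):
        problems.append("missing an uppercase letter")
    if not any(c in string.ascii_lowercase for c in password):
        problems.append("missing a lowercase letter")
    if not any(c in string.digits for c in password):
        problems.append("missing a digit")
    if not any(c in SPECIAL_CHARACTERS for c in password):
        problems.append("missing a special character")
    return problems
```

For example, `password_problems("abcdef1!")` reports only the missing uppercase letter, so a single call shows everything that still needs to be fixed.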
Configuration item: Description
Cluster Name: The name of the cluster. The name must be 1 to 64 characters in length and can contain only Chinese characters, letters, digits, hyphens (-), and underscores (_).
Identity Credentials: The method used to log on to the cluster nodes, either Key Pair (default) or Password.
Application Configurations: Configure the number of replicas and shards for ClickHouse.
Advanced settings (optional):
Setting: Description
ECS Application Role: When your program runs on EMR compute nodes, it can access other Alibaba Cloud services such as OSS without providing an AccessKey. EMR automatically requests a temporary AccessKey to authorize the access. The ECS Application Role controls the permissions of this temporary AccessKey.
Bootstrap Actions: Run custom scripts before Hadoop starts in the cluster. For more information, see Run a bootstrap action script.
Resource Group: Assign the cluster to a resource group. For more information, see Use resource groups.
Review all configurations, select the Terms of Service, and click Confirm.
Important:
Pay-as-you-go clusters: The cluster creation process starts immediately. After the cluster is created, its status changes to Running.
Subscription clusters: An order is generated. The cluster is created after you complete the payment.
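Because only the cluster name can be changed after creation, it can help to check planned values against the limits stated in this topic before you click Confirm. The following Python sketch is illustrative only: it does not call any Alibaba Cloud API and is not part of EMR. The function names (`zookeeper_behavior`, `review_plan`), the result labels, and the accepted version format (`5.11.0` or `EMR-5.11.0`) are assumptions. "Chinese characters" is assumed to mean the range U+4E00 to U+9FFF, and "letters" is assumed to mean ASCII letters.

```python
import re

# Cluster name rule: 1 to 64 characters from the allowed set (assumed ranges).
NAME_PATTERN = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_-]{1,64}")
SYSTEM_DISK_GB = (80, 5000)   # valid system disk sizes
DATA_DISK_GB = (40, 32768)    # valid data disk sizes


def parse_version(text):
    """Turn '5.11.0' or 'EMR-5.11.0' into a tuple of ints such as (5, 11, 0)."""
    match = re.fullmatch(r"(?:EMR-)?(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized EMR version: {text!r}")
    return tuple(int(part) for part in match.groups())


def zookeeper_behavior(emr_version, high_availability):
    """Map an EMR version to the ZooKeeper behavior listed in the version table."""
    v = parse_version(emr_version)
    if v >= (5, 11, 0) or (3, 45, 0) <= v < (4, 0, 0):
        return "auto-selected"
    if (5, 8, 0) <= v <= (5, 10, 1):
        # ClickHouse Keeper is used unless ZooKeeper is also selected.
        return "recommended"
    if (3, 42, 0) <= v <= (3, 44, 1):
        return "auto-selected" if high_availability else "select-manually"
    return "not-covered"  # versions the table does not mention


def review_plan(name, system_disk_gb, data_disk_gb, emr_version, high_availability):
    """Return a list of problems found in the planned settings (empty if none)."""
    problems = []
    if NAME_PATTERN.fullmatch(name) is None:
        problems.append("cluster name breaks the naming rule")
    if not SYSTEM_DISK_GB[0] <= system_disk_gb <= SYSTEM_DISK_GB[1]:
        problems.append("system disk size is outside 80 GB to 5,000 GB")
    if not DATA_DISK_GB[0] <= data_disk_gb <= DATA_DISK_GB[1]:
        problems.append("data disk size is outside 40 GB to 32,768 GB")
    if zookeeper_behavior(emr_version, high_availability) == "select-manually":
        problems.append("select ZooKeeper manually; ClickHouse needs it for DDL")
    return problems
```

A call such as `review_plan("ck-test_01", 80, 80, "5.11.0", False)` returns an empty list, while an EMR 3.43.0 plan with High Service Availability disabled is flagged as needing ZooKeeper selected manually.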