This topic describes how to create and configure a ClickHouse cluster.
For more information about the settings of instance types, memory, and disks, see Usage Recommendations.
- Go to the cluster creation page.
- Log on to the Alibaba Cloud EMR console.
- In the top navigation bar, select the region where you want to create a cluster and select a resource group based on your business requirements.
- The region of a cluster cannot be changed after the cluster is created.
- All resource groups within your account are displayed by default.
- Click Cluster Wizard in the Clusters section.
- Configure a cluster. To create a cluster, you must configure software parameters, hardware parameters, and basic parameters as guided by the wizard. Notice: After a cluster is created, you cannot modify its parameters except for the cluster name. Make sure that all parameters are correctly configured when you create a cluster.
- Configure software parameters.
| Parameter | Description |
| --- | --- |
| Cluster Type | Set the parameter to ClickHouse. |
| EMR Version | The major version of EMR. The latest version is selected by default. |
| Required Services | The default components required for a specific cluster type. After a cluster is created, you can start or stop components on the Cluster Management page. |
| Advanced Settings | Custom Software Settings: customizes software settings. You can use a JSON file to customize the parameters of the basic components required for a cluster, such as Hadoop, Spark, and Hive. For more information, see Customize software configurations. The switch is turned off by default. |
- Configure hardware parameters.
| Section | Parameter | Description |
| --- | --- | --- |
| Billing Method | Billing Method | Subscription is selected by default. EMR supports the following billing methods: Pay-As-You-Go: a billing method that allows you to pay for an instance after you use the instance. The system charges you for a cluster based on the hours the cluster is actually used. You are charged on an hourly basis. We recommend that you use pay-as-you-go clusters for short-term test jobs or dynamically scheduled jobs. Subscription: a billing method that allows you to use an instance only after you pay for the instance. We recommend that you create a pay-as-you-go cluster for a test run. If the cluster passes the test, you can create a subscription cluster for production. |
| Network Settings | Zone | The zone where you want to create a cluster. Zones are different geographical areas located in the same region. They are interconnected by an internal network. In most cases, you can use the zone selected by default. |
| Network Settings | Network Type | The network type of the cluster. The VPC network type is selected by default. |
| Network Settings | VPC | The VPC where you want to deploy the cluster. Select a VPC in the same region as the zone. If no VPC is available in the region, click Create VPC/VSwitch to create a VPC. |
| Network Settings | VSwitch | The vSwitch of the cluster. Select a vSwitch in the specified zone. If no vSwitch is available in the zone, create a vSwitch. |
| Network Settings | Security Group Name | The security group of the cluster. An existing security group is selected by default. For more information about security groups, see Overview. You can click Create Security Group and enter a security group name to create a security group. Notice: Do not use an advanced security group that is created in the Elastic Compute Service (ECS) console. |
| Instance | Learn More | You can select an instance type based on your business requirements. For more information, see Instance families. |
| Instance | System Disk Type | You can select an SSD, ESSD, or ultra disk based on your business requirements. |
| Instance | Disk Size | You can resize a disk based on your business requirements. The recommended minimum disk size is 120 GB. Valid values: 60 to 500. Unit: GB. |
| Instance | Data Disk Type | You can select an SSD, ESSD, or ultra disk based on your business requirements. |
| Instance | Disk Size | You can resize a disk based on your business requirements. The recommended minimum disk size is 80 GB. Valid values: 40 to 32768. Unit: GB. |
| Instance | ClickHouse Nodes | The default number of ClickHouse nodes is 4. |
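The disk size limits described above can be checked before you enter values in the wizard. The following is a minimal Python sketch that only encodes the ranges and recommended minimums stated in this topic. The names `DiskLimits`, `SYSTEM_DISK`, `DATA_DISK`, and `check_disk_size` are hypothetical helpers for illustration and are not part of any EMR SDK or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiskLimits:
    """Size limits in GB for one kind of disk (hypothetical helper type)."""
    min_gb: int
    max_gb: int
    recommended_min_gb: int


# Values taken from the hardware parameter descriptions in this topic.
SYSTEM_DISK = DiskLimits(min_gb=60, max_gb=500, recommended_min_gb=120)
DATA_DISK = DiskLimits(min_gb=40, max_gb=32768, recommended_min_gb=80)


def check_disk_size(size_gb: int, limits: DiskLimits) -> str:
    """Classify a requested disk size against the documented limits.

    Returns "invalid" if the size is outside the valid range,
    "below_recommended" if it is valid but under the recommended minimum,
    and "ok" otherwise.
    """
    if not limits.min_gb <= size_gb <= limits.max_gb:
        return "invalid"
    if size_gb < limits.recommended_min_gb:
        return "below_recommended"
    return "ok"
```

For example, `check_disk_size(60, SYSTEM_DISK)` falls inside the valid range but under the recommended minimum of 120 GB, so the sketch flags it rather than rejecting it.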
- Configure basic parameters. Configure parameters in the Basic Information section. Notice: The following table describes all parameters. However, parameters in the Advanced Settings section are not supported. Do not configure the parameters in this section.
| Section | Parameter | Description |
| --- | --- | --- |
| Basic Information | Cluster Name | The name of the cluster. The name must be 1 to 64 characters in length and can contain only letters, digits, hyphens (-), and underscores (_). |
| Basic Information | Shard Count | The number of shards. The value of this parameter cannot be changed. Note: The number of shards is automatically calculated when you create a cluster. The number of shards is calculated by using the following formula: Number of shards = Number of ClickHouse nodes ÷ Number of ClickHouse cluster replicas. Make sure that the number of ClickHouse nodes is divisible by the number of ClickHouse cluster replicas. Otherwise, you cannot create a cluster. |
| Basic Information | Replica Count | The number of ClickHouse cluster replicas. The default number of ClickHouse cluster replicas is 2. Note: To ensure the high availability of the ClickHouse cluster, the number of ClickHouse cluster replicas must be at least 2. |
| Basic Information | Assign Public IP Address | Specifies whether an elastic IP address (EIP) is associated with the cluster. The switch is turned off by default. Note: If you do not turn on Assign Public IP Address, you cannot access the web UIs of open source services on the Public Connect Strings page in the EMR console. |
| Basic Information | Key Pair | For information about how to use a key pair, see SSH key pair overview. |
| Basic Information | Password | The password used to log on to a master node. The password must be 8 to 30 characters in length and contain uppercase letters, lowercase letters, digits, and special characters. The following special characters are supported: ! @ # $ % ^ & * |
| Advanced Settings | Add User | The user added to access the web UIs of open source big data software. |
| Advanced Settings | Permission Settings | The RAM roles that allow applications running in a cluster to access other Alibaba Cloud services. You can use the default RAM roles. EMR Role: The value is fixed as AliyunEMRDefaultRole and cannot be changed. This RAM role authorizes a cluster to access other Alibaba Cloud services, such as ECS and OSS. ECS Role: You can also assign an application role to a cluster. Then, EMR applies for a temporary AccessKey pair when applications running on the compute nodes of that cluster access other Alibaba Cloud services, such as OSS. This way, you do not need to manually enter an AccessKey pair. You can grant the access permissions of the application role on specific Alibaba Cloud services based on your business requirements. |
| Advanced Settings | Data Disk Encryption | The switch is turned off by default. If you turn on Enable Encryption, data in all cloud disks that serve as the data disks of the ECS instances in the cluster is encrypted. By default, a service-managed key is used to encrypt your data. You can also use a user-managed key to encrypt your data. Notice: You cannot encrypt data in local disks. |
| Advanced Settings | Bootstrap Actions | Optional. You can configure bootstrap actions to run custom scripts before a cluster starts Hadoop. For more information, see Manage bootstrap actions. |
| Advanced Settings | Tag | Optional. You can add a tag pair when you create a cluster or add a tag pair on the cluster details page after a cluster is created. For more information, see Manage and use tags. |
| Advanced Settings | Resource Group | Optional. For more information, see Use resource groups. |

Note: The cluster configurations appear on the right side of the page when you configure parameters.

After you complete the configurations, click Next: Confirm. You are directed to the Confirm step, in which you can confirm the configurations and the fee for the creation of your cluster. The fee varies based on the billing method.
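The shard count formula and the naming and password rules described above can be sketched as local pre-checks. The following Python example is illustrative only. The function names are hypothetical and are not part of the EMR console or any SDK. It assumes that "letters" means ASCII letters, that a password must contain all four character classes, and that a password may contain no characters outside those classes. The console may apply different rules.

```python
import re
import string

# Special characters listed as supported for the password in this topic.
SPECIAL_CHARS = "!@#$%^&*"

# Assumption: "letters" means ASCII letters only.
_CLUSTER_NAME_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def shard_count(nodes: int, replicas: int) -> int:
    """Number of shards = number of ClickHouse nodes / number of replicas.

    Raises ValueError if the node count is not divisible by the replica
    count, which this topic says prevents cluster creation.
    """
    if nodes <= 0 or replicas <= 0:
        raise ValueError("nodes and replicas must be positive integers")
    if nodes % replicas != 0:
        raise ValueError(
            f"{nodes} nodes is not divisible by {replicas} replicas"
        )
    return nodes // replicas


def is_valid_cluster_name(name: str) -> bool:
    """1 to 64 characters: letters, digits, hyphens, and underscores."""
    return _CLUSTER_NAME_RE.fullmatch(name) is not None


def is_valid_password(password: str) -> bool:
    """8 to 30 characters with upper, lower, digit, and special characters.

    Assumption: all four classes are required and nothing else is allowed.
    """
    if not 8 <= len(password) <= 30:
        return False
    allowed = set(string.ascii_letters + string.digits + SPECIAL_CHARS)
    if not set(password) <= allowed:
        return False
    return (
        any(c in string.ascii_uppercase for c in password)
        and any(c in string.ascii_lowercase for c in password)
        and any(c in string.digits for c in password)
        and any(c in SPECIAL_CHARS for c in password)
    )
```

With the default values in this topic (4 ClickHouse nodes and 2 replicas), `shard_count(4, 2)` returns 2. A combination such as 4 nodes and 3 replicas raises `ValueError`, which mirrors the divisibility requirement.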
- Read the terms of service and select E-MapReduce Service Terms.
- Click Create. Refresh the Cluster Management page to view the creation progress. When Status becomes Idle, the cluster is created.