E-MapReduce: Create a ClickHouse cluster

Last Updated: Aug 23, 2023

This topic describes how to create a ClickHouse cluster.

Background information

For information about the settings of instance types, memory, and disks, see Usage Recommendations.

Prerequisites

A virtual private cloud (VPC) and a vSwitch are created in the region where you want to create a ClickHouse cluster. For more information, see Create and manage a VPC and Create and manage a vSwitch.

Procedure

  1. Go to the cluster creation page.

    1. Log on to the E-MapReduce (EMR) console. In the left-side navigation pane, click EMR on ECS.

    2. Optional. In the top navigation bar, select the region where you want to create a cluster and select a resource group based on your business requirements.

      • You cannot change the region of a cluster after the cluster is created.

      • By default, all resource groups in your account are displayed.

    3. On the EMR on ECS page, click Create Cluster.

  2. Configure the cluster.

    When you create a cluster, you need to configure the software, hardware, and basic information, and confirm the order for the cluster.

    Important

    After a cluster is created, you cannot modify its parameters except for the cluster name. Make sure that all parameters are correctly configured when you create a cluster.

    1. Configure software parameters.

      Parameter

      Description

      Region

      The region where you want to create the cluster. You cannot change the region of a cluster after the cluster is created.

      Business Scenario

      Select Data Analytics.

      Product Version

      The version of EMR. By default, the latest version is selected.

      High Service Availability

      By default, this switch is turned off.

      Optional Services (Select One At Least)

      You need to select ClickHouse.

      Important
      • Clusters of EMR V5.11.0 or a later minor version, and clusters of EMR V3.45.0 or a later minor version: If you select ClickHouse, ZooKeeper is selected by default.

      • Clusters of EMR V5.8.0 to EMR V5.10.1: If you select only ClickHouse, the created cluster uses the built-in ClickHouse Keeper instead of ZooKeeper. The performance of ClickHouse Keeper is lower than that of ZooKeeper. We recommend that you also select ZooKeeper.

      • Clusters of EMR V3.42.0 to EMR V3.44.1:

        • If High Service Availability is turned on and ClickHouse is selected, ZooKeeper is selected by default.

        • If High Service Availability is not turned on and ClickHouse is selected, ZooKeeper is not selected by default. In this case, ClickHouse does not support DDL operations. We recommend that you also select ZooKeeper.

      Advanced Settings

      Custom Software Configuration: specifies whether to customize the configurations of software. You can use a JSON file to customize the configurations of the basic software required for a cluster. For more information, see Customize software configurations. By default, this switch is turned off.
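
The version-dependent ZooKeeper behavior described under Optional Services can be summarized as a small lookup. The following Python sketch is only an illustration of those rules. The function names, the return labels, and the assumption that a version is written like `V5.11.0` are hypothetical and are not part of any EMR API.

```python
import re
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from a string such as 'V5.11.0'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def zookeeper_guidance(version: str, high_availability: bool) -> str:
    """Return a hypothetical label for what happens when ClickHouse is selected.

    'selected_by_default': ZooKeeper is selected automatically.
    'select_manually':     ZooKeeper is not selected automatically; the
                           documentation recommends that you also select it.
    'not_covered':         the version is outside the ranges in this topic.
    """
    v = parse_version(version)
    if (5, 11, 0) <= v < (6, 0, 0) or (3, 45, 0) <= v < (4, 0, 0):
        return "selected_by_default"
    if (5, 8, 0) <= v <= (5, 10, 1):
        # Selecting only ClickHouse falls back to the built-in ClickHouse Keeper.
        return "select_manually"
    if (3, 42, 0) <= v <= (3, 44, 1):
        # In this range the default depends on High Service Availability.
        return "selected_by_default" if high_availability else "select_manually"
    return "not_covered"


if __name__ == "__main__":
    for version, ha in [("V5.11.0", False), ("V5.9.0", False),
                        ("V3.43.1", True), ("V3.43.1", False)]:
        print(version, ha, zookeeper_guidance(version, ha))
```

A result of `select_manually` means that you should select ZooKeeper yourself in the Optional Services section before you proceed.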

    2. Configure hardware parameters.

      Parameter

      Description

      Billing Method

      The billing method of the cluster. Subscription is selected by default. EMR supports the following billing methods:

      • Pay-as-you-go: a billing method that allows you to pay for an instance after you use the instance. The system charges you for a cluster based on the hours the cluster is actually used. Bills are generated on an hourly basis at the top of every hour. We recommend that you use pay-as-you-go clusters for short-term test jobs or dynamically scheduled jobs.

      • Subscription: a billing method that allows you to use an instance only after you pay for the instance.

        Note

        We recommend that you create a pay-as-you-go cluster for a test run. If the cluster passes the test, you can create a subscription cluster for production.

      Zone

      The zone where you want to create a cluster. A zone in a region is a physical area with independent power supplies and network facilities. Clusters in zones within the same region can communicate with each other over an internal network. In most cases, you can use the zone that is selected by default.

      VPC

      The VPC where you want to deploy the cluster. If no VPC is available, click create a VPC to create one.

      vSwitch

      The vSwitch of the cluster. Select a vSwitch in the specific zone based on your business requirements. If no vSwitch is available in the zone, create one.

      Default Security Group

      The security group of the cluster. An existing security group is selected by default. For more information about security groups, see Overview.

      You can also click create a new security group to create a security group.

      Important

      Do not use an advanced security group that is created in the Elastic Compute Service (ECS) console.

      Node Group

      The node groups of the cluster. You can select instance types based on your business requirements. For more information, see Overview of instance families.

      • System Disk: You can select a standard SSD, enhanced SSD, or ultra disk based on your business requirements.

      • System disk size: You can adjust the size of the system disk based on your business requirements. Default value: 80. Valid values: 80 to 5000. Unit: GiB.

      • Data Disk: You can select standard SSDs, enhanced SSDs, or ultra disks based on your business requirements.

      • Data disk size: You can adjust the size of the data disks based on your business requirements. Default value: 80. Valid values: 40 to 32768. Unit: GiB.

      • Instances:

        • If you turn off High Service Availability, one master node and one core node are configured by default.

        • If you turn on High Service Availability, three master nodes and three core nodes are configured by default.

      • Assign Public Network IP: specifies whether to associate an elastic IP address (EIP) with the cluster. By default, this switch is turned off.

        Note

        For information about how to apply for an EIP address, see Elastic IP addresses.
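
If you plan node group settings in a script before you enter them in the console, the disk size ranges and default node counts from the Node Group description can be checked locally. The following Python sketch assumes nothing beyond the values listed above. The function and constant names are hypothetical, and the console remains the authority on what is accepted.

```python
# Valid size ranges in GiB, copied from the Node Group description.
DISK_SIZE_LIMITS_GIB = {
    "system": (80, 5000),
    "data": (40, 32768),
}


def is_valid_disk_size(kind: str, size_gib: int) -> bool:
    """Check a planned disk size against the documented range for its kind."""
    if kind not in DISK_SIZE_LIMITS_GIB:
        raise ValueError(f"unknown disk kind: {kind!r}")
    low, high = DISK_SIZE_LIMITS_GIB[kind]
    return low <= size_gib <= high


def default_node_counts(high_availability: bool) -> dict:
    """Return the default master and core node counts for the HA setting."""
    count = 3 if high_availability else 1
    return {"master": count, "core": count}


if __name__ == "__main__":
    print(is_valid_disk_size("system", 120))
    print(is_valid_disk_size("data", 20))
    print(default_node_counts(high_availability=True))
```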

    3. Configure basic parameters.

      Parameter

      Description

      Cluster Name

      The name of the cluster. The name must be 1 to 64 characters in length and can contain only letters, digits, hyphens (-), and underscores (_).

      Identity Credentials

      Key Pair (default): Use an SSH key pair to access the Linux instance.

      For information about how to use a key pair, see Overview.

      Password: Use the password that you set for the master node to access the Linux instance.

      The password must be 8 to 30 characters in length and must contain uppercase letters, lowercase letters, digits, and special characters.

      The following special characters are supported:

      ! @ # $ % ^ & *

      Application Configuration

      The number of shards and the number of replicas for the ClickHouse cluster.

      Advanced Settings

      • ECS Application Role: You can assign an ECS application role to a cluster. When applications that run on the compute nodes of the cluster access other Alibaba Cloud services, such as Object Storage Service (OSS), EMR applies for a temporary AccessKey pair, so you do not need to manually enter an AccessKey pair. You can grant the application role access permissions on specific Alibaba Cloud services based on your business requirements.

      • Bootstrap Actions: Optional. You can configure bootstrap actions to run custom scripts before a cluster starts Hadoop. For more information, see Manage bootstrap actions.

      • Resource Group: Optional. For more information, see Use resource groups.
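
The naming and password rules for Cluster Name and Identity Credentials can be checked before you fill in the form. The following Python sketch makes two assumptions that this topic does not state: "letters" is read as ASCII letters only, and a password may not contain any character outside letters, digits, and the listed special characters. The function names are hypothetical.

```python
import re
import string

# Special characters listed as supported for passwords.
SPECIAL_CHARACTERS = "!@#$%^&*"

# Assumption: "letters" means ASCII letters; length is 1 to 64 characters.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,64}")


def is_valid_cluster_name(name: str) -> bool:
    """Check the length and allowed characters of a cluster name."""
    return NAME_PATTERN.fullmatch(name) is not None


def is_valid_password(password: str) -> bool:
    """Check length, allowed characters, and the four required character classes."""
    if not 8 <= len(password) <= 30:
        return False
    allowed = set(string.ascii_letters + string.digits + SPECIAL_CHARACTERS)
    # Assumption: any character outside the allowed set is rejected.
    if any(ch not in allowed for ch in password):
        return False
    return (
        any(ch in string.ascii_uppercase for ch in password)
        and any(ch in string.ascii_lowercase for ch in password)
        and any(ch in string.digits for ch in password)
        and any(ch in SPECIAL_CHARACTERS for ch in password)
    )


if __name__ == "__main__":
    print(is_valid_cluster_name("clickhouse_test-01"))
    print(is_valid_password("Abcdef1!"))
```

The checks are deliberately strict. If the console accepts a value that this sketch rejects, relax the corresponding assumption.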

  3. After you verify that the configurations are correct, read the terms of service, select the check box, and then click Confirm.

    Important
    • Pay-as-you-go clusters: The cluster is created immediately.

      After the cluster is created, the cluster is in the Running state.

    • Subscription clusters: An order is generated. The cluster will be created after you complete the payment.