Complete RAM authorization. For the procedure, refer to Role Authorization.
Select the region where the cluster will be created.
Click Create a cluster in the upper right corner.
Note: Clusters cannot be modified after creation, except for the name. Carefully confirm the necessary configurations during creation.
Follow these steps to create a cluster:
Step 1: Configure the software. As shown in the following figure, you need to select the E-MapReduce product version and necessary software.
Product version: The major version of E-MapReduce represents a complete open source software environment and is upgraded periodically as its internal component software is upgraded. When Hadoop-related software is upgraded, E-MapReduce and its major version are upgraded as well. A cluster created with an earlier version cannot be upgraded to a later version.
Note: As of E-MapReduce Version 2.0.0, cluster types are no longer clearly distinguished. All clusters are Hadoop clusters; the only difference is the internal services.
Cluster type: Currently, E-MapReduce provides the following cluster types:
- Hadoop: A standard Hadoop cluster that contains most Hadoop-related components. Details about the components are provided in the selection list.
- Kafka: An independent Kafka cluster that provides a message service.
Inclusion configuration: Lists all software components available for the selected cluster type, including the name and version number of each. You can select components as required. By default, each selected component starts its service processes.
Note: The more components you select, the higher the machine configuration you need. With an undersized configuration, there may be insufficient resources to run these services.
Security mode: Specifies whether to enable Kerberos authentication for the cluster.
Software configuration (optional): Hadoop, Spark, Hive, and other basic software in the cluster can be configured. For detailed instructions, refer to Software configuration.
Step 2: Configure the hardware, as shown in the following figure.
Note: To ensure that the cluster works normally, a public network IP address is enabled by default during cluster creation.
Billing method: The billing method is consistent with ECS. Both Subscription and Pay-As-You-Go modes are supported. If Subscription mode is selected, you will need to select the duration.
- Purchase duration: You can select 1, 2, 3, 6, or 9 months, or 1, 2, or 3 years.
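The billing rules above can be expressed as a small check. This is a minimal sketch for illustration only, not part of any E-MapReduce SDK: the function name `validate_billing` and the encoding of durations as a number of months are assumptions.

```python
# Hypothetical helper; names and the month-based encoding are illustrative.
# 1, 2, 3, 6, or 9 months, or 1, 2, or 3 years (12, 24, 36 months).
ALLOWED_MONTHS = {1, 2, 3, 6, 9, 12, 24, 36}


def validate_billing(method, duration_months=None):
    """Return True if the billing selection matches the rules described above."""
    if method == "Pay-As-You-Go":
        # No purchase duration applies to Pay-As-You-Go clusters, so this
        # sketch treats a supplied duration as a mistake.
        return duration_months is None
    if method == "Subscription":
        # Subscription requires one of the listed durations.
        return duration_months in ALLOWED_MONTHS
    return False
```

For example, `validate_billing("Subscription", 6)` passes, while `validate_billing("Subscription", 4)` is rejected because 4 months is not one of the listed durations.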
Cluster network configuration
Availability zone of the cluster: Select the availability zone where the cluster is located. The available machine types and disks vary with the availability zone. Each region contains several availability zones, which are located in different physical areas. If better network connectivity is required, we recommend that you select the same availability zone. However, this increases the risk of cluster creation failure, because the inventory of a single availability zone may be insufficient.
Network type: You can select Classic network or Virtual Private Cloud (VPC). If you select VPC, you must also specify the VPC and subnet (VSwitch) that the cluster belongs to. Click Create VPC/subnet (VSwitch) on the page to open the VPC console and create them. Then refresh the list to see the created VPC/subnet (VSwitch). For details about using VPC with E-MapReduce, refer to VPC.
Note: Classic network is not interoperable with VPC. The network type cannot be changed after purchase.
New security group: New users generally do not have a security group yet. Click “New security group” and enter a name in “Security group name”.
Main security group: The security group that the cluster belongs to. Only security groups created by the user in E-MapReduce are listed here. Security groups for the cluster must be created within E-MapReduce. To create a security group, select “New security group” and enter the security group name. The name must be 2 to 64 characters long and can contain Chinese characters, letters, numbers, and special characters.
Cluster node configuration
High availability cluster: When this option is enabled, a Hadoop cluster has two master nodes to support high availability of Resource Manager and Name Node. HBase clusters already supported high availability, but used a core node as the second master. When high availability is enabled, an independent master node is used instead, which is safer and more reliable. The default is non-high availability mode, with only one master node.
E-MapReduce node configuration: E-MapReduce offers a selected set of machine types, chosen for their practical performance. The node roles are as follows:
- Master: The master instance node is mainly responsible for running control processes such as Resource Manager and Name Node.
- Core: The core instance node is mainly responsible for storing all data of the cluster. It is scalable on demand.
- Task: A pure computing node that does not store data and is used to adjust the computing capacity of the cluster.
Node configuration: Select the node type. Different types of nodes suit different application scenarios, so choose one based on your requirements.
Data disk type: Cluster nodes can use ordinary cloud disks, high-efficiency cloud disks, or SSD cloud disks. Availability varies with machine type and region. When you select a region, the drop-down list shows the disk types supported in that region. By default, data disks are released when the cluster is released.
Data disk volume: The recommended minimum data disk size for a single machine is 40 GB, and the maximum is 8000 GB.
Instance quantity: The total number of instances across all required nodes. A cluster requires at least 3 instances (a high availability cluster requires at least 4, because it adds 1 master node). The maximum is 50. If you need more than 50 instances, submit a ticket.
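The node limits above can be checked before submitting a configuration. The following is a minimal sketch under stated assumptions: `check_node_config` and its parameter names are hypothetical, and the 40 GB lower bound is the recommended minimum from the text rather than a hard limit.

```python
def check_node_config(instance_count, high_availability=False, data_disk_gb=40):
    """Collect human-readable problems with a proposed node configuration."""
    problems = []
    # A high availability cluster adds one master node, raising the minimum to 4.
    minimum = 4 if high_availability else 3
    if instance_count < minimum:
        problems.append(f"at least {minimum} instances are required")
    if instance_count > 50:
        problems.append("more than 50 instances requires a ticket")
    # 40 GB is the recommended minimum per machine; 8000 GB is the maximum.
    if not 40 <= data_disk_gb <= 8000:
        problems.append("data disk size should be between 40 and 8000 GB")
    return problems
```

An empty list means the configuration is within the documented limits. Returning all problems at once, rather than stopping at the first, mirrors how a form would flag every invalid field.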
Your configuration list and cluster cost are shown on the right side of the page. The price information shown depends on the billing method. For a Subscription cluster, the total expense is shown. For a Pay-As-You-Go cluster, the hourly expense is shown.
Step 3: Configure the basic cluster information, as shown in the following figure.
- Cluster name: The cluster name must be 1 to 64 characters long and can contain uppercase letters, lowercase letters, digits, hyphens “-”, and underscores “_”.
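The naming rule can be expressed as a single regular expression. This is a sketch, assuming that "letters" means ASCII letters. The helper name `is_valid_cluster_name` is hypothetical.

```python
import re

# 1 to 64 characters: ASCII letters, digits, hyphen, underscore.
# Assumption: "letters" in the rule refers to ASCII letters only.
CLUSTER_NAME_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def is_valid_cluster_name(name):
    """Return True if the whole name satisfies the cluster naming rule."""
    return CLUSTER_NAME_RE.fullmatch(name) is not None
```

`fullmatch` is used so that the entire string must satisfy the pattern. A name such as `emr-test_01` passes, while a name containing a space is rejected.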
Operation log: Saving the operation log is enabled by default. When it is enabled, you can select the OSS directory where the log is saved. You must activate OSS before using this function. Cost depends on the number of uploaded files. We recommend that you keep OSS log saving enabled, because it helps with debugging and troubleshooting.
Log path: OSS path for saving the log.
Central metadatabase: A database provided by E-MapReduce that stores all Hive metadata outside the cluster. This function is recommended when the cluster uses OSS as its main storage.
- Service role: This role authorizes E-MapReduce to use other Alibaba Cloud services such as ECS and OSS.
- ECS application role: This role allows programs running on the E-MapReduce computing nodes to access cloud services such as OSS without providing an Alibaba Cloud AccessKey (AK). E-MapReduce automatically applies for an on-demand AK to authorize the access. The permissions of that AK are controlled by this role.
Logon password: Set the logon password for the master node. The password must be 8 to 30 characters long and can contain letters, digits, and special characters.
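A client-side check of the password rule might look like the following. This is a sketch only: the text does not list which special characters are accepted, so treating them as ASCII punctuation is an assumption, and `is_valid_logon_password` is a hypothetical name.

```python
import string

# Assumption: "special characters" means ASCII punctuation; the accepted set
# is not listed in the text, so the console may differ.
ALLOWED_PASSWORD_CHARS = set(
    string.ascii_letters + string.digits + string.punctuation
)


def is_valid_logon_password(password):
    """Return True if the password is 8 to 30 characters from the allowed set."""
    if not 8 <= len(password) <= 30:
        return False
    return all(ch in ALLOWED_PASSWORD_CHARS for ch in password)
```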
Bootstrap action (optional): You can run a custom script before the cluster starts Hadoop. For detailed instructions, refer to Bootstrap action.
After all required information has been entered and is valid, the “Create” button is activated. Verify the information, and click Create to create the cluster.
For a Pay-As-You-Go cluster, the cluster is created immediately and you are returned to the Cluster List page, where the cluster appears in “Creating Cluster” status. Creation takes several minutes. After creation, the cluster switches to Idle status.
For a Subscription cluster, the cluster is not created until the order is generated and paid.
If the cluster creation fails, the cluster list page shows “Cluster creation failed”. The reason for failure can be seen when you hover the cursor over the red exclamation point.
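If you track cluster creation from a script rather than the console, the status transitions described above suggest a simple polling loop. The following is a sketch under stated assumptions: `get_status` is a caller-supplied function (hypothetical here, not an E-MapReduce API) that returns the status text shown in the cluster list, and the polling interval and timeout are arbitrary defaults.

```python
import time


def wait_for_cluster(get_status, poll_seconds=30, timeout_seconds=1800,
                     sleep=time.sleep):
    """Poll until the cluster leaves the creating state.

    Returns True when the cluster reaches Idle status and False when
    creation fails. Raises TimeoutError if neither happens in time.
    """
    waited = 0
    while waited <= timeout_seconds:
        status = get_status()  # hypothetical status lookup supplied by caller
        if status == "Idle":
            return True
        if status == "Cluster creation failed":
            return False
        # Still "Creating Cluster" (or another transient state): wait and retry.
        sleep(poll_seconds)
        waited += poll_seconds
    raise TimeoutError("cluster did not finish creating in time")
```

The `sleep` parameter is injectable so that the loop can be exercised in tests without real delays.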