ApsaraDB for HBase allows you to choose different specifications, numbers, and disk types for master and core nodes. You can choose specifications and disk types based on your workload requirements, such as the queries per second (QPS), storage capacity, response latency, and stability. You can size your cluster as follows:
- Determine the specification of the master nodes.
- Determine the specification and number of core nodes.
- Determine the size and type of disks.
For more information about how to select an ApsaraDB for HBase version, see ApsaraDB for HBase versions.
Node specification families
- Dedicated nodes: Dedicated resources are allocated to nodes to ensure their stability. If your workloads require a low response latency, use dedicated nodes and standard SSDs.
- General nodes: General nodes compete for resources. All general nodes have fewer than 8 vCPUs. We recommend that you do not use general nodes for production clusters.
Master nodes have no attached storage. By default, a cluster uses two master nodes (primary and backup) for disaster recovery, which avoids a single point of failure. Master nodes host the HBase masters, HDFS NameNodes, and ZooKeeper. If the master nodes do not have sufficient CPU or memory resources, the performance of your cluster is severely degraded.
|Number of core nodes||Master node specification|
|4 ≤ Number of core nodes ≤ 16||8c32g (recommended for small-sized clusters)|
|> 16||16c64g or above|
Note: The specification of master nodes also determines the numbers of tables and regions managed by the cluster. If the cluster manages a large number of tables or regions, select a high specification for the master nodes.
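The master node table can be expressed as a small lookup. The following Python sketch is illustrative only: the function name `recommend_master_spec` is hypothetical, and it returns `None` for clusters with fewer than 4 core nodes because the table gives no recommendation for that range.

```python
from typing import Optional


def recommend_master_spec(core_nodes: int) -> Optional[str]:
    """Return the master node specification suggested by the table above.

    Returns None when the table has no row for the given number of
    core nodes (fewer than 4).
    """
    if core_nodes < 0:
        raise ValueError("core_nodes must not be negative")
    if 4 <= core_nodes <= 16:
        return "8c32g"
    if core_nodes > 16:
        return "16c64g or above"
    # Fewer than 4 core nodes: not covered by the table.
    return None


print(recommend_master_spec(8))
print(recommend_master_spec(24))
```

If the cluster manages a large number of tables or regions, treat the result as a lower bound and select a higher specification.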
Core node specification: The minimum specification is 4c8g and the maximum specification is 32c128g.
Core nodes refer to RegionServers in HBase. You must select a core node specification based on the number of requests to be processed and the size of each request.
Note: The number of requests is not the only criterion for sizing core nodes. For example, if your workload processes hundreds of requests per second, 4c8g core nodes can handle the requests in most cases. However, this rule does not apply in the following cases: a queried row stores kilobytes or megabytes of data, a scan request contains complex filters, the cache hit rate is low, or the cluster manages a large number of tables or regions. In these cases, using 4c8g core nodes may affect the stability of your workload or increase the response latency.
The following table lists the recommended core node specifications for handling different workloads. For more information about storage sizing, join the ApsaraDB for HBase Q&A DingTalk group or submit a ticket.
|TPS + QPS||Recommended number and specification of core nodes||Suggestion|
|0 to 1,000||2 core nodes with 4c16g||The minimum specification recommended for handling light loads. We recommend that you do not deploy more than 600 regions on each core node. The minimum specification available is 4c8g. We recommend that you do not choose 4c8g because only 8 GB of memory may cause out-of-memory errors when the load or KV store grows significantly.|
|1,000 to 20,000||2 to 3 core nodes with 8c32g||In comparison to 8c16g, 8c32g is more cost-effective. It offers an additional 16 GB of memory to guarantee the stability of your workloads. We recommend that you choose 8c32g to handle light and medium loads.|
|More than 20,000||8c32g, 16c32g, 32c64g, or higher||Specify the number of core nodes based on the actual number of requests. If your workload is deployed online, we recommend that you select a specification with larger memory to increase the cache hit rate. If you need to run MapReduce or Spark tasks offline with heavy loads, or if the TPS or QPS is extremely high, select a specification with more CPU resources.|
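The load tiers in the preceding table can be sketched as a lookup keyed on the combined TPS and QPS. This is a minimal Python sketch, not an official sizing tool: the function name `recommend_core_nodes` is hypothetical, and because the table ranges share their boundary values, the sketch assumes that exactly 1,000 falls into the lowest tier and exactly 20,000 into the middle tier.

```python
def recommend_core_nodes(requests_per_second: int) -> dict:
    """Map a combined TPS + QPS figure to the table's recommendation."""
    if requests_per_second < 0:
        raise ValueError("requests_per_second must not be negative")
    if requests_per_second <= 1_000:
        # Light load. Boundary handling at 1,000 is an assumption.
        return {"nodes": "2", "spec": "4c16g"}
    if requests_per_second <= 20_000:
        # Light to medium load. Boundary handling at 20,000 is an assumption.
        return {"nodes": "2 to 3", "spec": "8c32g"}
    # Heavy load: the table gives no fixed node count for this tier.
    return {
        "nodes": "based on the actual number of requests",
        "spec": "8c32g, 16c32g, 32c64g, or higher",
    }


print(recommend_core_nodes(500))
print(recommend_core_nodes(5_000))
```

The request rate is only a starting point. Large rows, complex scan filters, a low cache hit rate, or a large number of tables or regions can all justify a higher specification than the lookup returns.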
Select a higher specification or add more core nodes
You can scale out your ApsaraDB for HBase cluster by adding more core nodes when the load spikes, response latency increases, or the cluster becomes unstable. However, hotspotting may occur if your workload in the cluster is not designed or served properly. The specification of a core node determines its capability to prevent hotspotting. Using low-specification core nodes and then simply adding more core nodes is not as efficient as using high-specification core nodes to guarantee the stability of your workload when the load spikes. For example, if large requests are directed to the nodes or the user traffic spikes in a region, the nodes with a low specification may be overloaded or run out of memory. As a result, the performance of your cluster is degraded.
We recommend that you select a specification for your core nodes based on the requirements of your workload.
If the specification of your master or core nodes fails to meet the expected requirements, you can upgrade the nodes. For more information about how to upgrade nodes, join the ApsaraDB for HBase Q&A DingTalk group or submit a ticket.
ApsaraDB for HBase supports three types of storage: standard disks, local disks, and cold storage.
- Standard disks: Standard disks are scalable and highly reliable. We recommend that you use standard disks. They are replicated to ensure redundancy and can be expanded based on your needs. Unlike physical disks, standard disks are independent of hardware specifications, which helps prevent data loss caused by physical damage. Standard disks include SSDs and ultra disks.
- Local disks: Local disks are physical disks. Local disks are charged at a lower rate than standard disks, but they cannot be scaled. The specification of the core nodes that you choose determines the sizes of the local disks that you can choose. You cannot upgrade Elastic Compute Service (ECS) instances that have local disks attached to get more storage space. If only one physical disk is damaged, your data is not lost. If two or more physical disks are damaged, your workloads are affected. The ApsaraDB for HBase support team replaces damaged disks at the earliest opportunity. The startup costs for local disks are high. Local disks are more suitable for storing large amounts of data.
- Cold storage: Cold storage is dedicated to ApsaraDB for HBase. Cold storage is based on Object Storage Service (OSS). It helps you reduce data archiving costs. You can use cold storage to store infrequently accessed data or use the hot and cold data separation feature to automatically archive data.
After you create a cluster, you can no longer change the storage type of the cluster. If you choose standard disks, you can increase the storage space by expanding the standard disks or adding more core nodes. If you choose local disks, you can increase the storage space only by adding more core nodes. Cold storage is an exception. You do not need to activate cold storage when you create a cluster. You can activate it and expand the storage space after the cluster is created.
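The expansion rules above can be summarized as a lookup from storage type to the available ways of adding space. This Python sketch only restates the preceding paragraph. The names `EXPANSION_OPTIONS` and `expansion_options` are hypothetical and do not correspond to any ApsaraDB for HBase API.

```python
# Ways to increase storage space after cluster creation, per storage type.
EXPANSION_OPTIONS = {
    "standard disks": ["expand the standard disks", "add more core nodes"],
    "local disks": ["add more core nodes"],
    "cold storage": ["activate after cluster creation, then expand the storage space"],
}


def expansion_options(storage_type: str) -> list:
    """Return the expansion options for a storage type (case-insensitive)."""
    key = storage_type.strip().lower()
    if key not in EXPANSION_OPTIONS:
        raise ValueError(f"unknown storage type: {storage_type!r}")
    return EXPANSION_OPTIONS[key]


print(expansion_options("Local disks"))
```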
|Feature||Storage type||Workload type|
|High performance||Standard SSDs or local SSDs||For online workloads that require extremely low response latency, such as advertising, recommendation, feed streaming, and user profiling. In this case, you can use SSDs to ensure a low response latency of 1 to 2 milliseconds and reduce latency fluctuations (noise). SSDs are suitable for users that require a low P99 latency. (P99 latency: 99% of requests complete within the given latency.)|
|High efficiency||Ultra disks or local HDDs||For latency-sensitive online workloads. In most cases, if you use HDDs, the response latency is around 10 milliseconds. However, HDDs produce more latency fluctuations (noise) than SSDs.|
|Cold data storage||OSS (cold storage)||Near-line storage and data archiving. Using cold storage with standard or local disks can achieve almost the same write throughput as using standard or local disks alone. The read throughput of cold storage is low, with a latency of tens of milliseconds. For more information, see Cold storage.|
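To compare measured latencies against the figures in the table, you can compute the P99 latency from a set of samples. The following Python sketch assumes the nearest-rank method, which is one common convention for percentiles. The function name `p99_latency` and the sample values are illustrative and are not measurements of any real cluster.

```python
def p99_latency(latencies_ms):
    """Return the P99 latency of the samples using the nearest-rank method.

    The result is the smallest sample such that at least 99% of all
    samples are less than or equal to it.
    """
    if not latencies_ms:
        raise ValueError("at least one latency sample is required")
    ordered = sorted(latencies_ms)
    # 1-based nearest rank: ceil(0.99 * n), computed with integer arithmetic.
    rank = (99 * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Hypothetical samples: 98 fast requests and 2 slow ones.
samples = [1.5] * 98 + [10.0] * 2
print(p99_latency(samples))
```

With these hypothetical samples, two slow requests out of 100 are enough to raise the P99 latency to the slow value, which is why workloads with strict P99 requirements are sensitive to latency fluctuations.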