ApsaraDB for HBase allows you to select different specifications, quantities, and disk types for master nodes and core nodes. Select specifications and disk types based on your workload requirements, such as the queries per second (QPS), storage capacity, the number of read and write requests, response latency, and stability. To configure your cluster, make the following decisions:

  • Determine the specification of master nodes.
  • Determine the specification and number of core nodes.
  • Determine the size and type of disks.

For more information about how to select an ApsaraDB for HBase edition, see ApsaraDB for HBase editions.

  • Dedicated resources are allocated to dedicated nodes to ensure stability. If your workloads require a low response latency, use dedicated nodes and SSDs.
  • Each general-purpose node has fewer than eight CPU cores. General-purpose nodes compete for resources. If your cluster is used for production purposes, we recommend that you do not use general-purpose nodes.

Master nodes

Master nodes cannot be used for storage. By default, two master nodes are deployed in primary-secondary mode so that the cluster can recover if a single point of failure occurs. HBase masters, HDFS NameNodes, and ZooKeeper are deployed on master nodes. If the master nodes do not have sufficient CPU or memory resources, the performance of your cluster is severely degraded.

| Number of core nodes | Master node specification |
|---|---|
| < 4 | 4-core 8 GB |
| 4 ≤ number of core nodes < 8 | 8-core 16 GB (recommended for small-sized clusters) |
| 8 ≤ number of core nodes < 16 | 8-core 32 GB |
| ≥ 16 | 16-core 64 GB or higher |

Note: You must select the specification of master nodes based on the numbers of core nodes, tables, and regions managed by the cluster. If the cluster manages a large number of tables or regions, select master nodes of high specifications.
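The mapping from core node count to master node specification can be expressed as a small lookup. The following is a minimal sketch, not an official sizing tool: the function name `recommend_master_spec` is hypothetical, and the sketch assumes that a cluster with exactly 16 core nodes belongs to the top tier. It considers only the number of core nodes, so clusters that manage many tables or regions may still need a higher specification.

```python
def recommend_master_spec(core_nodes: int) -> str:
    """Return the master node specification suggested for a core node count.

    Hypothetical helper that only encodes the core-node-count guideline;
    it ignores the number of tables and regions.
    """
    if core_nodes < 1:
        raise ValueError("core_nodes must be a positive integer")
    if core_nodes < 4:
        return "4-core 8 GB"
    if core_nodes < 8:
        return "8-core 16 GB"
    if core_nodes < 16:
        return "8-core 32 GB"
    # Assumption: exactly 16 core nodes is treated as the top tier.
    return "16-core 64 GB or higher"


if __name__ == "__main__":
    for n in (2, 6, 12, 20):
        print(n, "core nodes ->", recommend_master_spec(n))
```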

Core nodes

Core node specification: The minimum specification is 4-core 8 GB and the maximum specification is 32-core 128 GB.

Core nodes refer to RegionServers in ApsaraDB for HBase. You must select a core node specification based on the number of requests to be processed and the size of each request.

Note: The number of requests is not the only criterion for determining the specification of core nodes. For example, if your workload processes hundreds of requests per second, 4-core 8 GB core nodes can handle the requests in most cases. However, this rule does not apply to the following scenarios:

  • Queried rows store kilobytes or megabytes of data.
  • Scan requests contain complex filters.
  • The cache hit ratio is low and each request reads from disk.
  • The cluster manages a large number of tables and regions.

In these scenarios, core nodes of the 4-core 8 GB specification may affect the stability of your workload and increase the response latency.

The following table lists the recommended core node specifications for handling different workloads. We recommend that you take all relevant factors into account when you select the specification of core nodes. For more information about storage capacity, join the ApsaraDB for HBase Q&A DingTalk group or submit a ticket.

| TPS + QPS | Recommended number and specification of core nodes | Suggestion |
|---|---|---|
| 0 to 1,000 | 2 x 4-core 16 GB core nodes | The minimum specification recommended for handling light loads. We recommend that you do not deploy more than 600 regions on each core node. The minimum specification available in ApsaraDB for HBase is 4-core 8 GB. We recommend that you do not select 4-core 8 GB because 8 GB of memory may cause out-of-memory errors when the load or key-value size surges. |
| 1,000 to 20,000 | 2 or 3 x 8-core 32 GB core nodes | Compared with 8-core 16 GB, 8-core 32 GB is more cost-effective. It offers an additional 16 GB of memory to help keep your workloads stable. We recommend that you select 8-core 32 GB to handle light and medium loads. |
| More than 20,000 | 8-core 32 GB, 16-core 32 GB, 32-core 64 GB, or higher | Select the number of core nodes based on the actual number of requests. For online workloads, we recommend that you select a specification that has large memory to increase the cache hit ratio. If you need to run heavy offline MapReduce or Spark tasks, or if the TPS or QPS is high, select a specification that has more CPU resources. |
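As a rough illustration of how the table can be applied, the following sketch maps a combined TPS + QPS figure to the recommended tier and checks the 600-regions-per-node guideline. The function names are hypothetical, the handling of the boundary values 1,000 and 20,000 is an assumption (the table does not say which tier they belong to), and the region check assumes that regions are spread evenly across core nodes. Treat the result as a starting point, because request size, filters, and cache hit ratio also matter.

```python
MAX_REGIONS_PER_CORE_NODE = 600  # guideline stated for the light-load tier


def recommend_core_spec(tps_plus_qps: int) -> str:
    """Return the recommended core node tier for a combined TPS + QPS value."""
    if tps_plus_qps < 0:
        raise ValueError("tps_plus_qps must not be negative")
    # Assumption: boundary values fall into the lower tier.
    if tps_plus_qps <= 1_000:
        return "2 x 4-core 16 GB"
    if tps_plus_qps <= 20_000:
        return "2 or 3 x 8-core 32 GB"
    return "8-core 32 GB, 16-core 32 GB, 32-core 64 GB, or higher"


def regions_within_guideline(total_regions: int, core_nodes: int) -> bool:
    """Check the 600-regions-per-node guideline, assuming an even spread."""
    if core_nodes <= 0:
        raise ValueError("core_nodes must be a positive integer")
    # Ceiling division: the fullest node holds at least this many regions.
    regions_per_node = -(-total_regions // core_nodes)
    return regions_per_node <= MAX_REGIONS_PER_CORE_NODE


if __name__ == "__main__":
    print(recommend_core_spec(5_000))
    print(regions_within_guideline(1_500, 2))
```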

Select core nodes of a high specification or increase the number of core nodes

You can scale out your ApsaraDB for HBase cluster by adding core nodes when the load spikes, the response latency increases, or the cluster becomes unstable. However, hotspotting may occur if the workload in the cluster is not designed or served properly. The specification of a core node determines how well the node withstands hotspotting. For example, if large requests are directed to a node or the traffic to a single region spikes, a low-specification node may be overloaded or run out of memory, which degrades the performance of your cluster. If you scale out only by adding low-specification core nodes, service stability may suffer when the load spikes. Therefore, we recommend that you select high-specification core nodes.

We recommend that you select a specification for your core nodes based on the requirements of your workload.

If the specification of your master or core nodes fails to meet your expected requirements, you can upgrade the nodes. For more information, join the ApsaraDB for HBase Q&A DingTalk group or submit a ticket.

Storage types

ApsaraDB for HBase supports three storage types: standard disks, local disks, and cold storage.

  • Standard disks: Standard disks are scalable and reliable. We recommend that you use standard disks. They are replicated to ensure redundancy and can be expanded based on your needs. Unlike physical disks, standard disks are independent of hardware specifications. This prevents data loss caused by physical damage. Standard disks include SSDs and ultra disks.
  • Local disks: Local disks are physical disks and have a lower price than standard disks. Local disks are not scalable. The specification of the core nodes that you select determines the size of the local disks. To increase the storage capacity, you can only add more core nodes. You cannot upgrade Elastic Compute Service (ECS) instances that use local disks to increase the storage capacity. The failure of a single disk does not cause data loss, but if two or more physical disks are damaged, your workloads are affected. The ApsaraDB for HBase support team replaces damaged disks at the earliest opportunity. The startup costs for local disks are high. Local disks are suitable for storing large amounts of data.
  • Cold storage: Cold storage is dedicated to ApsaraDB for HBase. Cold storage is based on Object Storage Service (OSS). You can use cold storage in conjunction with standard disks. You can use cold storage to store infrequently accessed data or use the hot and cold data separation feature to automatically archive data. This reduces data archiving costs.

After you create a cluster, you can no longer change the storage type of the cluster. If you select standard disks, you can increase the storage capacity by expanding the standard disks or adding more core nodes. If you select local disks, you can increase the storage capacity only by adding more core nodes. Cold storage is an exception. When you create a cluster, you do not need to activate cold storage. After a cluster is created, you can activate this feature and expand the storage capacity based on your needs.
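The expansion rules in the preceding paragraph can be summarized as a lookup keyed by storage type. This is only an illustrative sketch: the `StorageType` enum and the `expansion_options` function are hypothetical names and do not correspond to any ApsaraDB for HBase API. Cold storage is left out of the enum because it is activated separately after the cluster is created.

```python
from enum import Enum


class StorageType(Enum):
    """Storage type chosen at cluster creation (cannot be changed later)."""
    STANDARD_DISK = "standard disks"
    LOCAL_DISK = "local disks"


# Ways to increase storage capacity for each storage type.
_EXPANSION_OPTIONS = {
    StorageType.STANDARD_DISK: ("expand the standard disks", "add core nodes"),
    StorageType.LOCAL_DISK: ("add core nodes",),
}


def expansion_options(storage_type: StorageType) -> tuple:
    """Return the capacity expansion options for the given storage type."""
    return _EXPANSION_OPTIONS[storage_type]


if __name__ == "__main__":
    for storage_type in StorageType:
        print(storage_type.value, "->", ", ".join(expansion_options(storage_type)))
```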

| Feature | Storage type | Workload type |
|---|---|---|
| High performance | Standard SSDs, local SSDs, or standard ESSDs | Online workloads that require a low response latency, such as advertising, recommendation, feeds, and user profiling. In these scenarios, SSDs or ESSDs provide a response latency of 1 to 2 milliseconds and reduce performance jitters. SSDs are suitable for users that require a low P99 latency. P99 latency is the latency within which 99% of requests complete. |
| High efficiency | Ultra disks or local HDDs | Online workloads that are not latency-sensitive. In most cases, if you use HDDs, the response latency is around 10 milliseconds. However, HDDs cause more performance jitters than SSDs. |
| Cold data storage | OSS-based cold storage | Near-line storage and data archiving. When used with standard or local disks, cold storage can achieve almost the same write throughput as hot storage. However, the QPS of cold data reads is limited, and the read latency is in the tens of milliseconds. |
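To judge whether a workload needs the high-performance tier, it can help to measure the P99 latency of existing requests. The following sketch computes P99 with the nearest-rank method, which is one common definition among several. The function names and the sample values are illustrative assumptions, not measurements from ApsaraDB for HBase.

```python
def p99_latency(latencies_ms):
    """Return the P99 latency using the nearest-rank method.

    99% of the samples are less than or equal to the returned value.
    """
    if not latencies_ms:
        raise ValueError("at least one latency sample is required")
    ordered = sorted(latencies_ms)
    # 1-based rank = ceil(0.99 * n), computed with integers to avoid
    # floating-point rounding errors.
    rank = (99 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def meets_p99_target(latencies_ms, target_ms):
    """Return True if the P99 latency does not exceed the target."""
    return p99_latency(latencies_ms) <= target_ms


if __name__ == "__main__":
    # Illustrative samples: 98 fast requests, one slower, one outlier.
    samples = [1.0] * 98 + [2.0, 50.0]
    print(p99_latency(samples))            # 2.0
    print(meets_p99_target(samples, 2.0))  # True
```

In this example, the single 50 ms outlier does not affect P99, which is why P99 is often used together with maximum latency when jitter matters.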