Before creating an ApsaraDB for HBase cluster, select master node specifications, core node specifications and count, and a storage type. Match these to your workload's queries per second (QPS), transactions per second (TPS), data volume, response latency requirements, and stability needs.
You also need to decide on:
- ApsaraDB for HBase edition. See ApsaraDB for HBase editions.
- Elastic Compute Service (ECS) instance type: use a dedicated ECS instance to allocate exclusive resources and avoid contention. For low-latency workloads, pair a dedicated instance with SSDs.
Choose master node specifications
Master nodes run the ApsaraDB for HBase masters, Hadoop Distributed File System (HDFS) NameNodes, and ZooKeeper. They do not store data. For disaster recovery, two master nodes are deployed by default in primary-secondary mode.
Insufficient CPU or memory on master nodes degrades the entire cluster. Select specifications based on the number of core nodes, tables, and HBase regions the cluster manages. Clusters with many tables or regions require higher-specification master nodes.
| Number of core nodes | Master node specification |
|---|---|
| Fewer than 4 | 4-core CPU, 8 GB memory |
| 4 to 7 | 8-core CPU, 16 GB memory (recommended for small clusters) |
| 8 to 15 | 8-core CPU, 32 GB memory |
| 16 or more | 16-core CPU, 64 GB memory or higher |
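The table above can be expressed as a simple lookup. This is an illustrative sketch (the function name is ours, thresholds are copied from the table):

```python
def master_spec(core_nodes: int) -> str:
    """Return the recommended master node specification (per the
    table above) for a cluster with the given number of core nodes."""
    if core_nodes < 4:
        return "4-core CPU, 8 GB memory"
    if core_nodes <= 7:
        return "8-core CPU, 16 GB memory"
    if core_nodes <= 15:
        return "8-core CPU, 32 GB memory"
    return "16-core CPU, 64 GB memory or higher"

print(master_spec(6))  # 8-core CPU, 16 GB memory
```

Remember that table or region count can also push you to the next tier up, even if the core node count alone does not.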
Choose core node specifications
Core nodes are the RegionServers in ApsaraDB for HBase. They handle all read and write requests, so their specifications directly affect throughput, latency, and cluster stability.
Specifications range: 4-core CPU with 8 GB memory (minimum) to 32-core CPU with 128 GB memory (maximum).
Key principle: performance is best when hot data fits in the cache. Choose core node memory based on how much data needs to stay cached.
The following examples show core node specifications for clusters of different sizes:
- Small clusters: 4-core CPU and 16 GB memory, or 8-core CPU and 32 GB memory (recommended).
- Medium and large clusters: select specifications based on the amount of data to be kept in memory.
  - Large amount of data: 16-core CPU and 64 GB memory, or 32-core CPU and 128 GB memory.
  - Small amount of data: 16-core CPU and 32 GB memory, or 32-core CPU and 64 GB memory.
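As a rough back-of-the-envelope check, you can estimate how many core nodes of a given memory size are needed to keep your hot data cached. This sketch assumes a configurable fraction of node memory is available to the block cache (the 40% default here is an assumption, not an official figure; tune it to your actual heap and cache settings):

```python
import math

def nodes_for_cache(hot_data_gb: float, node_memory_gb: int,
                    cache_fraction: float = 0.4) -> int:
    """Estimate how many core nodes keep hot_data_gb fully cached.

    cache_fraction: assumed share of node memory usable as block
    cache (illustrative default; depends on your heap settings).
    """
    usable_per_node = node_memory_gb * cache_fraction
    # Clusters need at least two core nodes.
    return max(2, math.ceil(hot_data_gb / usable_per_node))

# e.g. 200 GB of hot data on 64 GB nodes -> 8 nodes
print(nodes_for_cache(200, 64))
```

If the result is unreasonably large, prefer fewer nodes with more memory each rather than many small nodes.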
If you need assistance in calculating the required storage space, join the ApsaraDB for HBase Q&A DingTalk group s0s3eg3 or submit a ticket.
Specifications by workload volume
| TPS + QPS | Recommended nodes | Notes |
|---|---|---|
| Fewer than 1,000 | Two nodes: 4-core CPU, 16 GB memory | Minimum for light loads. Keep regions per node under 600. The 4-core/8 GB option is available, but avoid it: 8 GB of memory risks out-of-memory errors under load spikes or large key-values. |
| 1,000 to 20,000 | Two or three nodes: 8-core CPU, 32 GB memory | More cost-effective than 8-core/16 GB. The extra 16 GB memory provides a stability buffer for light-to-medium loads. |
| More than 20,000 | 8-core/32 GB; 16-core/32 GB; 16-core/64 GB; 32-core/64 GB; 32-core/128 GB; or higher | For online workloads, choose large-memory nodes to maximize cache hit rate. For offline MapReduce or Apache Spark tasks with heavy CPU usage, prioritize higher core counts. |
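The workload tiers above can be summarized in a small helper. This is a sketch for illustration only (the function name is ours; thresholds come from the table, and the factors listed in the next section can still push you to a higher tier):

```python
def core_node_tier(requests_per_second: int) -> str:
    """Map combined TPS + QPS to the recommended core node tier
    from the table above."""
    if requests_per_second < 1_000:
        return "two 4-core/16 GB nodes"
    if requests_per_second <= 20_000:
        return "two or three 8-core/32 GB nodes"
    return "large-memory or high-core nodes (16-core/64 GB and up)"

print(core_node_tier(5_000))  # two or three 8-core/32 GB nodes
```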
When request volume alone is not enough
TPS and QPS are not the only factors. Even at a few hundred requests per second, upgrade core node specifications if your workload involves any of the following:
- Rows that store kilobytes or megabytes of data
- Scan requests with complex filters
- Low cache hit rates, where each request reaches the disk
- A large number of tables and regions
Under these conditions, 4-core/8 GB nodes can cause stability issues and increased latency despite low request volume.
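The cache hit rate point is worth quantifying: what reaches the disk is the miss traffic, not the raw request rate. A minimal sketch (function name is ours):

```python
def disk_reads_per_second(requests_per_second: float,
                          cache_hit_rate: float) -> float:
    """Requests that miss the cache and must be served from disk."""
    return requests_per_second * (1.0 - cache_hit_rate)

# 500 req/s at a 20% hit rate still sends 400 reads/s to disk,
# while 5,000 req/s at a 99% hit rate sends only about 50.
print(disk_reads_per_second(500, 0.20))
print(disk_reads_per_second(5_000, 0.99))
```

This is why a nominally light workload with a cold cache can saturate a small node, especially on HDD-backed storage.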
Scale up or scale out
When load spikes, latency rises, or the cluster becomes unstable, you can scale out by adding core nodes. However, adding only low-specification nodes risks hotspotting if your workload is unevenly distributed. A core node's specifications determine its ability to absorb traffic spikes and prevent hotspotting.
For example, if a traffic surge hits a single region or a large request lands on one node, a low-specification node may become overloaded or run out of memory, affecting cluster stability.
Choose specifications based on workload requirements first, then adjust node count as needed.
To upgrade master or core nodes, join the ApsaraDB for HBase Q&A DingTalk group (s0s3eg3) or submit a ticket.
Choose a storage type
Storage type cannot be changed after a cluster is created. Choose carefully before creating the cluster.
ApsaraDB for HBase supports three storage types:
| Storage type | Media | Best for |
|---|---|---|
| High performance | SSDs, local SSDs, or ESSDs | Online workloads requiring low latency: advertising, recommendation, feeds, and user profiling. Delivers 1–2 ms latency and minimizes performance jitter. SSDs are the best choice for low P99 (99th percentile) latency. |
| High efficiency | Ultra disks or local HDDs | Online workloads that are less latency-sensitive and can tolerate ~10 ms latency. HDDs produce more performance jitter than SSDs. |
| Cold data storage | OSS (cold storage) | Near-line storage and data archiving. Write throughput matches hot storage, but read QPS is limited and read latency is around tens of milliseconds. |
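The table reduces to a decision on latency budget and access pattern. A rough chooser for illustration (names and the 2 ms threshold are ours, taken from the latency figures above, not official limits):

```python
def storage_type(p99_latency_ms: float, archival: bool = False) -> str:
    """Pick a storage type from the latency budget, per the table above."""
    if archival:
        # Infrequently read, archive-style data tolerates tens of ms.
        return "cold storage (OSS)"
    if p99_latency_ms <= 2:
        return "high performance (SSD/ESSD)"
    return "high efficiency (ultra disk / local HDD)"

print(storage_type(1.5))  # high performance (SSD/ESSD)
```

Because the storage type cannot be changed after cluster creation, run this decision against your strictest workload, not the average one.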
Cloud disks
Cloud disks are scalable and replicated for redundancy. They are independent of physical hardware, which protects against data loss from hardware failure. Cloud disks include SSDs and ultra disks.
To increase storage capacity after cluster creation, expand existing cloud disks or add more core nodes.
Local disks
Local disks are physical disks attached to ECS instances. They cost less than cloud disks but are not independently scalable: disk size is determined by the core node specification.
To increase storage capacity, add more core nodes. You cannot expand capacity by upgrading the ECS instance.
If a single disk fails, data is not lost. If two or more physical disks are damaged, workloads are affected. The ApsaraDB for HBase support team replaces damaged disks promptly.
Local disks have a high upfront cost and suit workloads with large data volumes.
Cold storage
Cold storage is an Object Storage Service (OSS)-based tier dedicated to ApsaraDB for HBase. Use it alongside cloud disks to store infrequently accessed data, or enable hot and cold data separation to automatically archive cold data and reduce costs.
Unlike cloud disks and local disks, cold storage does not need to be selected at cluster creation. Enable and expand cold storage at any time after the cluster is created.