ApsaraDB for HBase: Cold storage

Last Updated: Mar 06, 2024

ApsaraDB for HBase provides a new cold storage medium to store cold data. It provides equivalent write performance at one third the storage cost of ultra disks. You can query cold data in the cold storage at any time.

Background information

When you purchase an ApsaraDB for HBase cluster, you can select the cold storage medium as an additional storage space, and execute table creation statements to store cold data on the medium. In addition, ApsaraDB for HBase Performance-enhanced Edition allows you to separate cold data from hot data in the same table. The system can automatically store hot data in hot storage with fast read/write speed and store infrequently accessed data in cold storage to reduce costs.

Usage notes

  • The read input/output operations per second (IOPS) of cold storage is low (up to 25 per node), so cold storage is suited only to infrequent queries.

  • The write throughput of cold storage equals the throughput of the ultra disks that are used for hot storage.

  • Cold storage is not suitable for a large number of concurrent read requests. Such workloads may cause errors.

  • If you purchase a very large amount of cold storage, the read IOPS limit can be adjusted based on your business requirements. To request an adjustment, submit a ticket to technical support.

  • We recommend that you store no more than 30 TB of cold data on each core node. To increase the storage capacity of each core node, submit a ticket for optimization suggestions.
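
The per-node figures in the usage notes above can be turned into a rough sizing check. The following is a minimal sketch, not an official sizing tool: the class and method names are hypothetical, the 100 TB data volume is an invented example, and it assumes that the 25 IOPS limit applies per core node and that reads are spread evenly across nodes.

```java
public class ColdStoragePlan {
    // Figures quoted in the usage notes; treated here as fixed per-node limits (assumption).
    static final int READ_IOPS_PER_NODE = 25;
    static final double RECOMMENDED_TB_PER_NODE = 30.0;

    /** Aggregate cold-read IOPS ceiling, assuming reads are spread evenly across core nodes. */
    public static int aggregateReadIops(int coreNodes) {
        return READ_IOPS_PER_NODE * coreNodes;
    }

    /** Smallest number of core nodes that keeps each node at or below 30 TB of cold data. */
    public static int minCoreNodes(double coldDataTb) {
        return (int) Math.ceil(coldDataTb / RECOMMENDED_TB_PER_NODE);
    }

    public static void main(String[] args) {
        double coldDataTb = 100.0; // hypothetical data volume
        int nodes = minCoreNodes(coldDataTb);
        System.out.println("Core nodes needed: " + nodes);
        System.out.println("Aggregate cold read IOPS ceiling: " + aggregateReadIops(nodes));
    }
}
```

If the resulting IOPS ceiling is below the expected query rate, the workload is probably a poor fit for cold storage unless the limit is raised through a ticket.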

Prerequisites

Cold storage is supported only on ApsaraDB for HBase Performance-enhanced Edition V2.1.8 and later. If your ApsaraDB for HBase Performance-enhanced Edition cluster is of a version earlier than V2.1.8, the cluster is automatically upgraded to the latest version when you activate cold storage for your cluster. The version of the client dependency AliHBase-Connector must be later than V1.0.7 or V2.0.7. The version of HBase Shell must be later than alihbase-2.0.7-bin.tar.gz.

Scenarios

Cold storage is applicable to various cold data scenarios such as data archiving and infrequently accessed data consumption.

Activate cold storage

Method 1: When you create an ApsaraDB for HBase Performance-enhanced Edition cluster, you can choose whether to purchase cold storage and the capacity of cold storage on the buy page. For more information, see Purchase a cluster.

Method 2:

  1. Log on to the ApsaraDB for HBase console.

  2. On the Clusters page, find the instance that you want to manage and click the instance ID.

  3. In the left-side navigation pane, click Cold Storage.

  4. Click Activate Now.

Warning
  • When cold storage is being activated, a jitter may occur when services are accessed. We recommend that you activate cold storage during off-peak hours.

  • Cold storage is supported only on ApsaraDB for HBase Performance-enhanced Edition V2.1.8 and later. If your ApsaraDB for HBase Performance-enhanced Edition cluster is of a version earlier than V2.1.8, the cluster is automatically upgraded to the latest version when you activate cold storage for your cluster.

Use cold storage

ApsaraDB for HBase Performance-enhanced Edition allows you to set storage properties based on column families. You can set the Storage parameter of a column family or all column families of a table to COLD. Then all data of this column family or all column families in the table is stored in cold storage and does not occupy the Hadoop Distributed File System (HDFS) space of the cluster. You can specify the property when you create a table or modify the property of the column family after you create a table.

You can use the Java API or HBase Shell to create a table and modify the table properties. If you use the Java API, you must first install the SDK for Java and configure the parameters. For more information, see Use the HBase Java API to access ApsaraDB for HBase Performance-enhanced Edition clusters. If you use HBase Shell, follow the steps in Use HBase Shell to access an ApsaraDB for HBase Performance-enhanced Edition instance to download and configure HBase Shell.

Create a table that uses cold storage

HBase Shell

hbase(main):001:0> create 'coldTable', {NAME => 'f', STORAGE_POLICY => 'COLD'}

Java API

// Obtain an Admin instance from an existing Connection.
Admin admin = connection.getAdmin();
HTableDescriptor descriptor = new HTableDescriptor(TableName.valueOf("coldTable"));
HColumnDescriptor cf = new HColumnDescriptor("f");
// Store all data of column family f in cold storage.
cf.setValue("STORAGE_POLICY", AliHBaseConstants.STORAGETYPE_COLD);
descriptor.addFamily(cf);
admin.createTable(descriptor);

Modify the table property to use cold storage

If you have created a table, you can modify the property of a column family in the table to use cold storage. If the column family contains data, the data is archived to cold storage only after a major compaction.

HBase Shell

hbase(main):011:0> alter 'coldTable', {NAME=>'f', STORAGE_POLICY => 'COLD'}

Java API

Admin admin = connection.getAdmin();
TableName tableName = TableName.valueOf("coldTable");
HTableDescriptor descriptor = admin.getTableDescriptor(tableName);
HColumnDescriptor cf = descriptor.getFamily("f".getBytes());
// Set the storage type of column family f to cold storage.
cf.setValue("STORAGE_POLICY", AliHBaseConstants.STORAGETYPE_COLD);
admin.modifyTable(tableName, descriptor);

Modify the table property to use hot storage

If a column family of the table uses cold storage, you can switch it back to hot storage by modifying the column family property. If the column family contains data, the data is moved back to hot storage only after a major compaction.

HBase Shell

hbase(main):014:0> alter 'coldTable', {NAME=>'f', STORAGE_POLICY => 'DEFAULT'}

Java API

// Obtain an Admin instance from an existing Connection.
Admin admin = connection.getAdmin();
TableName tableName = TableName.valueOf("coldTable");
HTableDescriptor descriptor = admin.getTableDescriptor(tableName);
HColumnDescriptor cf = descriptor.getFamily("f".getBytes());
// Set the storage type of column family f to the default storage. By default, hot storage is used.
cf.setValue("STORAGE_POLICY", AliHBaseConstants.STORAGETYPE_DEFAULT);
admin.modifyTable(tableName, descriptor);

View the cold storage status

You can view the cold storage status on the Cold Storage page in the console and expand the capacity of the cold storage by clicking Cold Storage Scaling on the same page. You can check the sizes of cold and hot data in a table on the User tables tab of the cluster management system.

Performance testing

Note

Runtime environment overview

  • Master: ecs.c5.xlarge, 4 cores, 8 GB of memory, and a 20 GB ultra disk.

  • RegionServer (4 nodes): ecs.c5.xlarge, 4 cores, 8 GB of memory, and a 20 GB ultra disk.

  • Test machine: ecs.c5.xlarge, 4 cores, and 8 GB of memory.

Write performance

| Storage type | Average response time | P99 response time |
| --- | --- | --- |
| Hot storage | 1736 μs | 4811 μs |
| Cold storage | 1748 μs | 5243 μs |

Note

Each data record includes 10 columns with 100 bytes of data in each column, so each row stores 1 KB of data. The system writes data using 16 concurrent threads.

Random GET performance

| Storage type | Average response time | P99 response time |
| --- | --- | --- |
| Hot storage | 1704 μs | 5923 μs |
| Cold storage | 14738 μs | 31519 μs |

Note

BlockCache is disabled in this test, so the system reads the data from the disk every time. Each data record includes 10 columns with 100 bytes of data in each column, so each row stores 1 KB of data. The system uses 8 concurrent threads, and each request reads 1 KB of data.

Scan performance within a specified range

| Storage type | Average response time | P99 response time |
| --- | --- | --- |
| Hot storage | 6222 μs | 20975 μs |
| Cold storage | 51134 μs | 115967 μs |

Note

BlockCache is disabled in this test. Each data record includes 10 columns with 100 bytes of data in each column, so each row stores 1 KB of data. The system uses 8 concurrent threads, and each request reads 1 KB of data. The Caching parameter of the scan is set to 30.
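
To compare the results at a glance, the cold-to-hot latency ratio can be computed from the average response times in the three tables above. The following is a minimal sketch: the class and method names are hypothetical, and the inputs are the values from the tables. Under these test conditions, writes come out close to parity, while random GETs and range scans are roughly eight to nine times slower on cold storage.

```java
public class ColdHotLatency {
    /** Ratio of cold-storage latency to hot-storage latency (values above 1 mean cold is slower). */
    public static double slowdown(double hotMicros, double coldMicros) {
        return coldMicros / hotMicros;
    }

    public static void main(String[] args) {
        // Average response times in microseconds, taken from the tables above.
        System.out.printf("Write:      %.2fx%n", slowdown(1736, 1748));
        System.out.printf("Random GET: %.2fx%n", slowdown(1704, 14738));
        System.out.printf("Range scan: %.2fx%n", slowdown(6222, 51134));
    }
}
```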