In big data scenarios, tiered storage is frequently used to store hot and cold data separately. ApsaraDB for HBase provides cold storage, a storage medium designed for cold data. Cold storage delivers write performance comparable to that of ultra disks at one third of the storage cost, and you can query cold data in it at any time. It suits cold data scenarios such as data archiving and infrequently accessed data. Cold storage is easy to use and reduces storage costs: when you purchase an ApsaraDB for HBase instance, you can choose cold storage as an additional storage medium, and then specify cold storage in your table creation statements to store cold data there.
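
To make the one-third figure concrete, the following Java sketch estimates the storage cost of a tiered layout relative to keeping all data on ultra disks. It assumes that the per-GB storage price is the only cost and that cold storage costs exactly one third as much as ultra disk storage. TieredStorageCost and relativeCost are hypothetical names used for illustration, not part of any ApsaraDB for HBase SDK.

```java
/**
 * Rough storage-cost comparison for hot/cold tiering.
 * Assumption: cold storage costs one third of ultra disk storage per GB;
 * all other fees are ignored. Hypothetical helper, not an SDK class.
 */
public class TieredStorageCost {

    // Cold storage price relative to ultra disk storage (one third, per the text).
    static final double COLD_TO_HOT_PRICE_RATIO = 1.0 / 3.0;

    /**
     * Returns the storage cost of a tiered layout as a fraction of the cost
     * of keeping all data on ultra disks.
     *
     * @param coldFraction share of the data (0.0 to 1.0) kept in cold storage
     */
    static double relativeCost(double coldFraction) {
        if (coldFraction < 0.0 || coldFraction > 1.0) {
            throw new IllegalArgumentException("coldFraction must be in [0, 1]");
        }
        double hotFraction = 1.0 - coldFraction;
        return hotFraction + coldFraction * COLD_TO_HOT_PRICE_RATIO;
    }

    public static void main(String[] args) {
        // Example: 80% of the data is cold.
        System.out.printf("Relative storage cost: %.3f%n", relativeCost(0.8));
    }
}
```

With 80% of the data in cold storage, the sketch prints 0.467, that is, roughly 47% of the all-hot storage cost under these assumptions.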

ApsaraDB for HBase Performance-enhanced Edition uses cold storage to store hot and cold data separately within the same table. With the cold and hot data separation feature, hot data is kept in hot storage for efficient reads and writes, and cold data is kept in cold storage to minimize storage costs. For more information, see Cold and hot data separation.

Activate cold storage

You can purchase cold storage separately and use it as an additional storage medium.

When you create an ApsaraDB for HBase Performance-enhanced Edition instance, select cold storage and specify the cold storage capacity. If you did not activate cold storage when you created the instance, go to the cluster console, click Cold Storage in the left-side navigation pane, and then click Enable Now.

Note: Only ApsaraDB for HBase Performance-enhanced Edition versions later than 2.1.8 support cold storage. When you activate cold storage, your instance is automatically updated to the latest version.

Use cold storage

Note: To use cold storage, your ApsaraDB for HBase Performance-enhanced Edition instance must run a version later than 2.1.8, the client dependency alihbase-connector must be a version later than 1.0.7 or 2.0.7, and HBase Shell must be a version later than alihbase-2.0.7-bin.tar.gz.

ApsaraDB for HBase Performance-enhanced Edition lets you set the storage property per column family. If you set the STORAGE_POLICY property of one column family, or of all column families in a table, to COLD, all data in those column families is stored in cold storage and does not occupy the Hadoop Distributed File System (HDFS) space of the cluster. You can set this property when you create a table, or modify it on an existing column family later.

You can use the Java API or HBase Shell to create a table and modify table properties. Before you use the Java API, install the SDK for Java and configure the parameters. For more information, see Use the Java API to access ApsaraDB for HBase. Before you use HBase Shell, download and configure it. For more information, see Use HBase Shell to access ApsaraDB for HBase.

Create a table that uses cold storage

HBase Shell

hbase(main):001:0> create 'coldTable', {NAME => 'f', STORAGE_POLICY => 'COLD'}
			

Java API


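// `connection` is an existing Connection to your instance.
try (Admin admin = connection.getAdmin()) {
    // Describe table coldTable with one column family, f.
    HTableDescriptor descriptor = new HTableDescriptor(TableName.valueOf("coldTable"));
    HColumnDescriptor cf = new HColumnDescriptor("f");
    // Store all data of column family f in cold storage.
    cf.setValue("STORAGE_POLICY", AliHBaseConstants.STORAGETYPE_COLD);
    descriptor.addFamily(cf);
    admin.createTable(descriptor);
}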

Modify the table property to use cold storage

If you have already created a table, you can modify the property of a column family in the table to use cold storage. If the column family already contains data, that data is moved to cold storage after the next major compaction.

HBase Shell

hbase(main):011:0> alter 'coldTable', {NAME=>'f', STORAGE_POLICY => 'COLD'}
			
Java API

try (Admin admin = connection.getAdmin()) {
    TableName tableName = TableName.valueOf("coldTable");
    HTableDescriptor descriptor = admin.getTableDescriptor(tableName);
    HColumnDescriptor cf = descriptor.getFamily("f".getBytes());
    // Set the storage policy of column family f to cold storage.
    cf.setValue("STORAGE_POLICY", AliHBaseConstants.STORAGETYPE_COLD);
    admin.modifyTable(tableName, descriptor);
    // Existing data in f moves to cold storage after the next major compaction.
}

Modify the table property to use hot storage

To change the storage type of a column family from cold storage back to hot storage, modify the table property. If the column family contains data, the data is moved back to hot storage after a major compaction.

HBase Shell
hbase(main):014:0> alter 'coldTable', {NAME=>'f', STORAGE_POLICY => 'DEFAULT'}

Java API

try (Admin admin = connection.getAdmin()) {
    TableName tableName = TableName.valueOf("coldTable");
    HTableDescriptor descriptor = admin.getTableDescriptor(tableName);
    HColumnDescriptor cf = descriptor.getFamily("f".getBytes());
    // Reset the storage policy of column family f to the default, which is hot storage.
    cf.setValue("STORAGE_POLICY", AliHBaseConstants.STORAGETYPE_DEFAULT);
    admin.modifyTable(tableName, descriptor);
    // Existing data in f moves back to hot storage after the next major compaction.
}

View the cold storage status

You can view the cold storage status and expand the cold storage capacity on the Cold Storage page in the console. To check the cold storage usage and hot storage usage of a table, go to the Cluster Management System and click the User tables tab.

Performance testing

Test environment

Master: ecs.c5.xlarge (4 cores, 8 GB memory), 20 GB ultra disk
4 RegionServers: ecs.c5.xlarge (4 cores, 8 GB memory), 20 GB ultra disk
4 test machines: ecs.c5.xlarge (4 cores, 8 GB memory)

Write performance

Table type    Average response time    P99 response time
Hot table     1,736 μs                 4,811 μs
Cold table    1,748 μs                 5,243 μs

Note: Each row contains 10 columns with 100 bytes of data per column, so each row holds about 1 KB of data. Data is written by 16 concurrent threads.

Random GET performance

Table type    Average response time    P99 response time
Hot table     1,704 μs                 5,923 μs
Cold table    14,738 μs                31,519 μs

Note: BlockCache is disabled, so every read is served from disk. Each row contains 10 columns with 100 bytes of data per column, about 1 KB per row. Each request reads 1 KB of data, and requests are sent by 8 concurrent threads.

Scan performance within a specified range

Table type    Average response time    P99 response time
Hot table     6,222 μs                 20,975 μs
Cold table    51,134 μs                115,967 μs

Note: BlockCache of the table is disabled. Each row contains 10 columns with 100 bytes of data per column, about 1 KB per row. Each request reads 1 KB of data, and requests are sent by 8 concurrent threads. The Caching parameter of the scan (the number of rows fetched per RPC) is set to 30.
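
The three result tables can be summarized as cold-to-hot latency ratios. The following Java sketch only divides the published numbers (in microseconds) and does not measure anything; LatencyComparison is a hypothetical helper, and the ratios apply only to workloads similar to the test setup described in this section.

```java
/**
 * Cold-to-hot latency ratios computed from the published test results.
 * Inputs are the average and P99 response times in microseconds.
 * Hypothetical helper for illustration only.
 */
public class LatencyComparison {

    // Cold latency divided by hot latency: 1.0 means no difference.
    static double ratio(double coldMicros, double hotMicros) {
        return coldMicros / hotMicros;
    }

    // Prints the average and P99 ratios for one workload.
    static void report(String workload, double hotAvg, double hotP99,
                       double coldAvg, double coldP99) {
        System.out.printf("%-6s avg x%.2f, p99 x%.2f%n",
                workload, ratio(coldAvg, hotAvg), ratio(coldP99, hotP99));
    }

    public static void main(String[] args) {
        report("Write", 1736, 4811, 1748, 5243);
        report("GET", 1704, 5923, 14738, 31519);
        report("Scan", 6222, 20975, 51134, 115967);
    }
}
```

Run as-is, it reports ratios of about 1.01 (average) and 1.09 (P99) for writes, 8.65 and 5.32 for random GETs, and 8.22 and 5.53 for range scans: writes are nearly unaffected, while uncached reads from cold storage are several times slower.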

Notes

1. Cold storage has low read Input/Output Operations Per Second (IOPS), up to 25 per node, so it is suited to infrequent queries.

2. The write throughput of cold storage is equivalent to that of the ultra disks used for hot storage.

3. Cold storage cannot handle a large number of concurrent read requests. Sending many concurrent reads to cold storage may cause errors.

4. If you have purchased a very large amount of cold storage and need higher read IOPS, submit a ticket to request an adjustment.

5. We recommend that you store no more than 30 TB of cold data on each core node. To expand the storage capacity of each core node, submit a ticket.
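
Notes 1 and 5 lend themselves to rough capacity planning. The following Java sketch computes the minimum number of core nodes for a given volume of cold data at 30 TB per node, and a lower bound on the time needed to serve a batch of uncached reads at 25 read IOPS per node. It assumes that every uncached read costs exactly one cold-storage I/O and that the load is spread evenly across nodes; real reads can cost more than one I/O, so treat the result as optimistic. ColdStorageSizing and its methods are hypothetical names.

```java
/**
 * Back-of-the-envelope sizing based on the notes above.
 * Assumptions (not official sizing guidance):
 *  - each uncached read costs exactly one cold-storage I/O,
 *  - the 25 read IOPS limit applies evenly to every node,
 *  - 30 TB of cold data per core node is treated as a hard ceiling.
 * Hypothetical helper for illustration only.
 */
public class ColdStorageSizing {

    static final int MAX_READ_IOPS_PER_NODE = 25;        // note 1
    static final double MAX_COLD_TB_PER_CORE_NODE = 30;  // note 5

    /** Minimum number of core nodes that keeps cold data at or below 30 TB per node. */
    static int minCoreNodes(double coldDataTb) {
        if (coldDataTb <= 0) {
            return 0;
        }
        return (int) Math.ceil(coldDataTb / MAX_COLD_TB_PER_CORE_NODE);
    }

    /** Lower bound, in seconds, for serving uncached reads at the per-node IOPS cap. */
    static double minSecondsForReads(long uncachedReads, int nodes) {
        if (nodes <= 0) {
            throw new IllegalArgumentException("nodes must be positive");
        }
        return (double) uncachedReads / ((long) MAX_READ_IOPS_PER_NODE * nodes);
    }

    public static void main(String[] args) {
        System.out.println("Core nodes for 100 TB of cold data: " + minCoreNodes(100));
        System.out.printf("Seconds for 10,000 uncached reads on 4 nodes: %.0f%n",
                minSecondsForReads(10_000, 4));
    }
}
```

For example, 100 TB of cold data needs at least 4 core nodes, and 10,000 uncached reads spread across 4 nodes take at least 100 seconds under these assumptions.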