The HBase API implementation of secondary indexes has known limitations and is not recommended. Use secondary indexes through Lindorm SQL instead.
ApsaraDB for HBase Performance-enhanced Edition provides native secondary indexes, letting you query table data on any column without performing a full table scan.
The problem with rowkey-only queries
HBase stores rows in binary order by rowkey (primary key). This makes row scans, prefix scans, and range scans fast and efficient—as long as your query is based on the rowkey.
When you need to query on a non-rowkey column, HBase has no direct path to the matching rows. Without a secondary index, you must either restrict the scan to a known rowkey range and filter the rows inside it, or scan the entire table. A full table scan wastes I/O, increases response time, and scales poorly as data grows.
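The difference between the two access paths can be illustrated with a small in-memory sketch. This is not HBase client code: the table contents, the `city` column, and the function names `prefix_scan` and `filter_scan` are all hypothetical, and Python string ordering stands in for HBase's byte ordering (the two agree for ASCII keys). Each function also returns how many rows it had to read.

```python
from bisect import bisect_left

# Toy table kept sorted by rowkey, mimicking how HBase orders rows.
# Rowkeys, the "city" column, and its values are illustrative only.
rows = sorted(
    [
        ("user001", {"city": "Hangzhou"}),
        ("user002", {"city": "Beijing"}),
        ("user003", {"city": "Hangzhou"}),
        ("vendor01", {"city": "Shanghai"}),
    ],
    key=lambda item: item[0],
)
rowkeys = [key for key, _ in rows]


def prefix_scan(prefix):
    """Seek to the first rowkey >= prefix, then read until the prefix stops matching."""
    start = bisect_left(rowkeys, prefix)
    matched, touched = [], 0
    for key, _ in rows[start:]:
        touched += 1
        if not key.startswith(prefix):
            break
        matched.append(key)
    return matched, touched


def filter_scan(column, value):
    """The table is not ordered by this column, so every row is read and tested."""
    matched, touched = [], 0
    for key, cols in rows:
        touched += 1
        if cols.get(column) == value:
            matched.append(key)
    return matched, touched


print(prefix_scan("vendor"))            # reads only the rows near the prefix
print(filter_scan("city", "Hangzhou"))  # reads every row in the table
```

In this sketch the prefix scan reads one row, while the column filter reads all four. The gap widens as the table grows, because the cost of the filter scan is proportional to the table size rather than to the number of matches.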
Approaches to multi-column queries
Before native secondary indexes were available, two common workarounds existed:
| Approach | How it works | Drawback |
|---|---|---|
| Manual index table | Create a separate table indexed by the query column | You must keep the index table in sync with the primary table on every write |
| External search engine (Solr, Elasticsearch) | Export data to an external cluster for indexing | Resource-intensive for common queries on a small number of columns |
ApsaraDB for HBase Performance-enhanced Edition addresses both drawbacks with built-in native secondary indexes: the database maintains the index for you, and the cost is lower than running an external search engine.
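
The synchronization burden of a manual index table can be sketched as follows. This is a simplified in-memory model, not HBase code: `primary`, `city_index`, `put`, and `find_by_city` are hypothetical names, and the `maintain_index` flag simulates a writer that forgets, or fails, to update the index table.

```python
from collections import defaultdict

primary = {}                   # rowkey -> {"city": ...}  (the primary table)
city_index = defaultdict(set)  # city -> set of rowkeys   (the hand-made index table)


def put(rowkey, city, maintain_index=True):
    """Write a row and, unless told otherwise, keep the index table in step."""
    old = primary.get(rowkey)
    primary[rowkey] = {"city": city}
    if not maintain_index:
        return  # simulates an application write path that skipped the index update
    if old is not None:
        city_index[old["city"]].discard(rowkey)  # drop the outdated index entry
    city_index[city].add(rowkey)


def find_by_city(city):
    """Answer the query from the index table alone."""
    return sorted(city_index[city])


put("user001", "Hangzhou")
put("user002", "Hangzhou")
put("user001", "Beijing", maintain_index=False)  # index is not updated

stale = find_by_city("Hangzhou")
print(stale)               # still lists user001
print(primary["user001"])  # but the primary table says Beijing
```

A single write that skips the index is enough to make the two tables disagree, which is why this approach requires careful handling in every write path of the application.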
How native secondary indexes work
When you create a secondary index on a column, queries on that column resolve through the index rather than scanning all rows.
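One common way to build such an index is to keep a second sorted structure keyed by the indexed value followed by the primary rowkey, so that a query on the column becomes a short range scan over the index. The sketch below models that idea in memory. It is an assumption for illustration only: the internal layout of the native index is not described here, and `index_entries`, `put`, and `query_by_city` are hypothetical names.

```python
from bisect import bisect_left, insort

primary = {}        # rowkey -> row
index_entries = []  # sorted list of (indexed_value, rowkey) pairs


def put(rowkey, row):
    """Write the row and update the index entry in the same operation."""
    old = primary.get(rowkey)
    if old is not None:
        index_entries.remove((old["city"], rowkey))  # remove the outdated entry
    primary[rowkey] = row
    insort(index_entries, (row["city"], rowkey))     # keep the index sorted


def query_by_city(city):
    """Seek into the index, read matching entries, then fetch rows by rowkey."""
    start = bisect_left(index_entries, (city, ""))
    result = []
    for value, rowkey in index_entries[start:]:
        if value != city:
            break  # past the last entry for this value
        result.append((rowkey, primary[rowkey]))
    return result


put("user001", {"city": "Hangzhou"})
put("user002", {"city": "Beijing"})
put("user003", {"city": "Hangzhou"})
put("user002", {"city": "Shanghai"})  # update: the old index entry is replaced

hits = query_by_city("Hangzhou")
print([rowkey for rowkey, _ in hits])
```

Because the index update happens inside the write itself, the caller never touches the index directly, and the query reads only the entries for the requested value instead of every row.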
Native secondary indexes in ApsaraDB for HBase are designed for high throughput and large-scale data. The implementation has been used in production at Alibaba Group, including during Double 11 Shopping Festivals, for many years.
What's next
Lindorm SQL overview — the recommended way to create and use secondary indexes