Optimizing ClickHouse (CK), ES, and RediSearch solutions for queries over tens of millions of records
During development I ran into a business requirement: from a pool of tens of millions of records, select at most 100,000 (10W) records, then sort and scatter them according to configured weighting rules (for example, products from the same category must not appear three times in a row). This article walks through how the requirement was implemented, the design ideas, and the successive optimizations. The following schemes were designed for "querying 100,000-level data from tens of millions of records":
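To make the scatter rule concrete, here is a minimal sketch of one possible greedy approach. It is an illustration only, not necessarily what the original system does: the `Item` type, the `scatter` method, and the `maxRun` parameter are hypothetical names, and the rule is read as "at most two consecutive items from the same category". The input is assumed to be already sorted by weight.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

class CategoryScatter {
    /** Hypothetical item type: only the fields needed for this illustration. */
    static final class Item {
        final String id;
        final String category;

        Item(String id, String category) {
            this.id = id;
            this.category = category;
        }
    }

    /**
     * Greedy scatter: walk the weight-sorted list and always take the first
     * pending item that would not create a run longer than maxRun.
     */
    static List<Item> scatter(List<Item> sorted, int maxRun) {
        LinkedList<Item> pending = new LinkedList<>(sorted);
        List<Item> result = new ArrayList<>(sorted.size());
        while (!pending.isEmpty()) {
            Item picked = null;
            Iterator<Item> it = pending.iterator();
            while (it.hasNext()) {
                Item candidate = it.next();
                if (!wouldExceedRun(result, candidate.category, maxRun)) {
                    picked = candidate;
                    it.remove();
                    break;
                }
            }
            if (picked == null) {
                // Only the over-represented category is left; keep its order.
                picked = pending.removeFirst();
            }
            result.add(picked);
        }
        return result;
    }

    /** True if the last maxRun items of result all belong to the given category. */
    static boolean wouldExceedRun(List<Item> result, String category, int maxRun) {
        if (result.size() < maxRun) {
            return false;
        }
        for (int i = result.size() - maxRun; i < result.size(); i++) {
            if (!result.get(i).category.equals(category)) {
                return false;
            }
        }
        return true;
    }
}
```

Calling `CategoryScatter.scatter(items, 2)` on items with categories A, A, A, B, A moves the B item ahead of the third A, so no category appears three times in a row. When only one category remains, the rule cannot be satisfied and the remaining items are appended in their original order.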
Multithreading + CK paging scheme
ES scroll scan deep-paging scheme
ES + HBase combination scheme
RediSearch + RedisJSON combination scheme
Initial design
The overall scheme is designed as follows:
First, filter the "target data" out of the pool table according to the configured "filtering rules".
Then, sort the "target data" according to the configured "sorting rules" to get the "result data".
The technical solution is as follows:
Run a data import task every day that loads the existing tens of millions of pool records (a Hive table) into ClickHouse (CK); the CK table is then used for data filtering.
Build the business-configured filtering and sorting rules into a single "filter + sort" object, SelectionQueryCondition.
When fetching the "target data" from the CK pool table, enable multithreading, filter page by page, and store the fetched "target data" in the result list.
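The multithreaded paging step can be sketched roughly as follows. This is only an illustration under stated assumptions: `PageFetcher` is a hypothetical stand-in for the real CK query (for example, a `LIMIT offset, pageSize` query built from SelectionQueryCondition), and `fetchAll` and its parameters are invented names, not the article's actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PagedFetchSketch {
    /** Hypothetical stand-in for the ClickHouse query; returns one page of rows. */
    interface PageFetcher<T> {
        List<T> fetch(int offset, int limit) throws Exception;
    }

    /** Fetches all pages in parallel and concatenates them in page order. */
    static <T> List<T> fetchAll(PageFetcher<T> fetcher, int totalNum, int pageSize, int threads)
            throws Exception {
        // Ceiling division: no extra empty page when totalNum is a multiple of pageSize.
        int pageCnt = (totalNum + pageSize - 1) / pageSize;
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<T>>> futures = new ArrayList<>(pageCnt);
            for (int page = 0; page < pageCnt; page++) {
                final int offset = page * pageSize;
                // Each page is queried on its own worker thread.
                futures.add(pool.submit(() -> fetcher.fetch(offset, pageSize)));
            }
            List<T> result = new ArrayList<>(totalNum);
            for (Future<List<T>> future : futures) {
                // Joining in submission order keeps the pages in order.
                result.addAll(future.get());
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }
}
```

Collecting the futures in submission order avoids the need for a synchronized result list, at the cost of waiting for slower early pages before later ones are appended.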
// page size, default 5000
int pageSize = this.getPageSize();
// number of pages, rounded up so an exact multiple does not add an empty page
int pageCnt = (totalNum + pageSize - 1) / pageSize;