Centauri is the predecessor of Proxima CE. This topic compares the data performance of Centauri and Proxima CE in different scenarios.
Scenario 1: Test on the doc table and query table that both contain 100 million data records of the BINARY data type with 512 dimensions
In this test scenario, the number of data records in both the doc table and query table is 100 million, the data type is BINARY, and the number of dimensions is 512. Fifty rows and four columns are manually specified for the search.
Test conclusion
Hash sharding of Proxima CE delivers approximately 20% higher data performance than Centauri.
| Method | Duration of the K-means phase (seconds) | Duration of the autotuning phase (seconds) | Duration of the build phase (seconds) | Duration of the seek phase (seconds) | Total time required (minutes) |
| --- | --- | --- | --- | --- | --- |
| Centauri | N/A | 1,524 | 12,653 | 5,914 | 336 |
| Hash sharding of Proxima CE | N/A | N/A | 9,647 | 6,431 | 268 |
K-means is a phase that is specific to the cluster sharding of Proxima CE and is used to obtain the cluster centroid table of the original doc table. Autotuning is a phase that is specific to Centauri and is used to calculate the values of the parameters of the indexing algorithm. Build is the index building phase. Seek is the search phase.
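As a quick sanity check, the "approximately 20%" figure in the conclusion corresponds to the reduction in total run time. A minimal Python sketch using only the totals from the table above:

```python
# Total run times from the summary table above (minutes).
centauri_total_min = 336
proxima_hash_total_min = 268

# Relative reduction in total time, which is the basis of the
# "approximately 20% higher performance" statement.
reduction = (centauri_total_min - proxima_hash_total_min) / centauri_total_min
print(f"{reduction:.1%}")  # 20.2%
```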
Test procedure
Performance comparison in the build phase
Centauri

Hash sharding of Proxima CE

Result analysis: With Centauri, index building finishes noticeably faster on one node, and the remaining three nodes take approximately the same amount of time. With hash sharding of Proxima CE, index building finishes noticeably faster on two nodes and runs at a relatively low speed on the other two nodes.
Performance comparison in the seek phase
Centauri

Hash sharding of Proxima CE

Result analysis:
- The time required for index seeking on the nodes is similar for Centauri and hash sharding of Proxima CE.
- Result merging on the nodes takes approximately 12 minutes longer for hash sharding of Proxima CE than for Centauri.
- With hash sharding of Proxima CE, result merging takes 8 minutes on the fastest node and 20 minutes on the slowest node.
- With Centauri, result merging takes 4 minutes on the fastest node and 9 minutes on the slowest node.
Running details
Centauri
Vector search
- Data type: BINARY
- Vector dimension: 512
- Search method: graph
- Measure: hamming
- Building mode: build:seek

Information about the doc table
- Table name: doc_table_pailitao_binary
- Partition: 20210712
- Number of data records in the doc table: 100000000
- Vector delimiter: ~

Information about the query table
- Table name: doc_table_pailitao_binary
- Partition: 20210712
- Number of data records in the query table: 100000000
- Vector delimiter: ~

Information about the output table
- Table name: output_table_pailitao_binary_centauri
- Partition: 20210712

Row and column information
- Number of rows: 50
- Number of columns: 4
- Number of data records in the doc table of each column for index building: 25000000

Whether to clear volume indexes: false

Time required for each worker node (seconds)
- TmpDataTableJoinWorker: 0
- TmpTableWorker: 16
- CleanUpWorker: 4
- AutotuningFastWorker: 46
- RowColWorker: 53
- SeekJobWorker: 5914
- BuildJobWorker: 12653
- AutotuningNormalWorker: 1478

Total time required (minutes): 336

Top recall rate
- User setting train: top200: 0.95
- Normal train: top200: 98.061%

Autotuning fast build parameters
- proxima.general.builder.memory_quota=0
- proxima.graph.common.max_doc_cnt=27500000
- proxima.general.builder.thread_count=15
- proxima.hnsw.builder.efconstruction=400
- proxima.graph.common.neighbor_cnt=100

Autotuning normal search parameters
- proxima.hnsw.searcher.ef=400

Sample command:
jar -resources centauri-1.1.5.jar,libcentauri-1.1.5.so -classpath /data/jiliang.ljl/centauri_1.1.5/centauri-1.1.5.jar com.alibaba.proxima.CentauriRunner -proxima_version 1.1.5 -doc_table doc_table_pailitao_binary -doc_table_partition 20210712 -query_table doc_table_pailitao_binary -query_table_partition 20210712 -output_table output_table_pailitao_binary_centauri -output_table_partition 20210712 -data_type binary -dimension 512 -app_id 201220 -pk_type int64 -clean_build_volume false -distance_method hamming -binary_to_int true -row_num 50 -column_num 4

Hash sharding of Proxima CE

Vector search
- Data type: 1
- Vector dimension: 512
- Search method: hnsw
- Measure: Hamming
- Building mode: build:build:seek

Information about the doc table
- Table name: doc_table_pailitao_binary2
- Partition: 20210712
- Number of data records in the doc table: 100000000
- Vector delimiter: ~

Information about the query table
- Table name: doc_table_pailitao_binary2
- Partition: 20210712
- Number of data records in the query table: 100000000
- Vector delimiter: ~

Information about the output table
- Table name: output_table_pailitao_binary_ce
- Partition: 20210712

Row and column information
- Number of rows: 50
- Number of columns: 4
- Number of data records in the doc table of each column for index building: 25000000

Whether to clear volume indexes: false

Time required for each worker node (seconds)
- SegmentationWorker: 2
- TmpTableWorker: 1
- KmeansGraphWorker: 0
- BuildJobWorker: 9647
- SeekJobWorker: 6431
- TmpResultJoinWorker: 0
- RecallWorker: 0
- CleanUpWorker: 3

Total time required (minutes): 268

Sample command:
jar -resources proxima_ce_g.jar -classpath /data/jiliang.ljl/project/proxima2-java/proxima-ce/target/binary/proxima-ce-0.1-SNAPSHOT-jar-with-dependencies.jar com.alibaba.proxima2.ce.ProximaCERunner -doc_table doc_table_pailitao_binary2 -doc_table_partition 20210712 -query_table doc_table_pailitao_binary2 -query_table_partition 20210712 -output_table output_table_pailitao_binary_ce -output_table_partition 20210712 -data_type binary -dimension 512 -app_id 201220 -pk_type int64 -clean_build_volume false -distance_method Hamming -binary_to_int true -row_num 50 -column_num 4
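The sample commands above are single long lines. As an illustrative sketch only, the Proxima CE invocation can be assembled from a parameter map for readability; every flag name, value, and path below is copied verbatim from the sample command above, not taken from an independent CLI reference:

```python
# Assemble the Proxima CE sample command from its parameters.
# All flag names and values are copied verbatim from the sample command above.
params = {
    "doc_table": "doc_table_pailitao_binary2",
    "doc_table_partition": "20210712",
    "query_table": "doc_table_pailitao_binary2",
    "query_table_partition": "20210712",
    "output_table": "output_table_pailitao_binary_ce",
    "output_table_partition": "20210712",
    "data_type": "binary",
    "dimension": "512",
    "app_id": "201220",
    "pk_type": "int64",
    "clean_build_volume": "false",
    "distance_method": "Hamming",
    "binary_to_int": "true",
    "row_num": "50",
    "column_num": "4",
}

command = (
    "jar -resources proxima_ce_g.jar "
    "-classpath /data/jiliang.ljl/project/proxima2-java/proxima-ce/target/binary/"
    "proxima-ce-0.1-SNAPSHOT-jar-with-dependencies.jar "
    "com.alibaba.proxima2.ce.ProximaCERunner "
    + " ".join(f"-{key} {value}" for key, value in params.items())
)
print(command)
```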
Scenario 2: Test on the doc table and query table that both contain 1 billion data records of the FLOAT data type with 128 dimensions
In this test scenario, the number of data records in both the doc table and query table is 1 billion, the data type is FLOAT, and the number of dimensions is 128. Fifty rows and sixty columns are specified for the search.
Test conclusion
Hash sharding of Proxima CE delivers approximately 30% higher data performance than Centauri. Compared with Centauri, cluster sharding of Proxima CE improves overall data performance by approximately 2 times and seek-phase performance by approximately 7.5 times. INT8 quantization improves data performance by approximately a further 10% over plain hash sharding.
| Method | Cluster sharding or autotuning duration (seconds) | Duration of the build phase (seconds) | Duration of the seek phase (seconds) |
| --- | --- | --- | --- |
| Centauri | 1,220 | 9,822 | 37,245 |
| Hash sharding of Proxima CE | N/A | 9,841 | 23,462 |
| Hash sharding and INT8 quantization of Proxima CE | N/A | 7,600 | 21,624 |
| Cluster sharding of Proxima CE | 1,247 | 14,404 | 5,028 |
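The speedup figures in the conclusion can be cross-checked against the table. A minimal Python sketch that sums each method's end-to-end time from the table values:

```python
# End-to-end durations (seconds) summed from the scenario 2 table.
centauri = 1220 + 9822 + 37245        # 48,287 s
hash_sharding = 9841 + 23462          # 33,303 s
hash_int8 = 7600 + 21624              # 29,224 s
cluster = 1247 + 14404 + 5028         # 20,679 s

# "~30% higher performance" for hash sharding: total time reduced by ~31%.
print(f"{(centauri - hash_sharding) / centauri:.0%}")        # 31%
# "~2 times" for cluster sharding: end-to-end speedup factor.
print(f"{centauri / cluster:.2f}x")                          # 2.34x
# "~7.5 times" in the seek phase for cluster sharding.
print(f"{37245 / 5028:.2f}x")                                # 7.41x
# "~10%" from INT8 quantization, relative to plain hash sharding.
print(f"{(hash_sharding - hash_int8) / hash_sharding:.0%}")  # 12%
```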
Test procedure
Details in the build phase

| Method | Mapper | Build Reducer | Total time required (seconds) |
| --- | --- | --- | --- |
| Centauri | N/A | N/A | N/A |
| Hash sharding of Proxima CE | 00:01:23.116<br>Latency: {min: 00:00:03, avg: 00:00:23, max: 00:01:00} | 02:41:43.563<br>Latency: {min: 00:02:40, avg: 01:32:33, max: 02:41:33} | 9,841 |
| Hash sharding and INT8 quantization of Proxima CE | 00:01:36.166<br>Latency: {min: 00:00:09, avg: 00:00:25, max: 00:01:09} | 02:04:11.440<br>Latency: {min: 00:06:56, avg: 01:06:06, max: 02:03:53} | 7,600 |
| Cluster sharding of Proxima CE | 00:15:33.022<br>Latency: {min: 00:00:03, avg: 00:03:24, max: 00:15:21} | 03:43:37.529<br>Latency: {min: 00:03:57, avg: 01:33:32, max: 03:43:35} | 14,404 |
Details in the seek phase

| Method | Mapper | Topn Reducer | Merge Reducer | Total time required (seconds) | Remarks |
| --- | --- | --- | --- | --- | --- |
| Centauri | 00:15:45.000<br>From 34 seconds to 11 minutes | 08:33:50.000<br>From 98 minutes to 489 minutes | 01:30:20.000<br>From 30 minutes to 70 minutes | 37,245 | Overall data processing completes 30 to 40 minutes after the reducer task stops generating logs.<br>The single-node runtimes of the mapper, topN reducer, and merge reducer tasks were obtained from Logview in a separate test. |
| Hash sharding of Proxima CE | 00:06:29.791<br>Latency: {min: 00:00:02, avg: 00:01:39, max: 00:05:56} | 04:50:42.422<br>Latency: {min: 00:01:48, avg: 01:54:33, max: 03:47:54} | 04:50:42.422<br>Latency: {min: 00:00:35, avg: 00:33:39, max: 01:32:16} | 23,462 | The total time of the mapper and merge reducer tasks is close to that of the slowest task, which matches expectations; time consumption is affected only by long-tail machines.<br>Failover for the last two nodes to stop in the topN reducer task started late; excluding these two nodes would cut about 1 hour from the total time. |
| Hash sharding and INT8 quantization of Proxima CE | 00:06:25.718<br>Latency: {min: 00:00:17, avg: 00:01:27, max: 00:06:02} | 03:58:00.566<br>Latency: {min: 00:00:25, avg: 01:06:41, max: 02:40:07} | 01:54:35.620<br>Latency: {min: 00:01:56, avg: 00:20:54, max: 01:39:55} | 21,624 | N/A |
| Cluster sharding of Proxima CE | 00:23:51.623<br>Latency: {min: 00:00:04, avg: 00:03:01, max: 00:08:34} | 01:00:38.382<br>Latency: {min: 00:05:15, avg: 00:18:00, max: 01:00:10} | 00:12:39.341<br>Latency: {min: 00:00:31, avg: 00:07:08, max: 00:12:33} | 5,028 | N/A |
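The durations and latencies in the tables above use the `HH:MM:SS[.mmm]` format reported by Logview. A minimal helper for converting them to seconds for comparison (this helper is hypothetical, not part of Centauri or Proxima CE):

```python
def duration_to_seconds(text: str) -> float:
    """Convert a Logview-style 'HH:MM:SS[.mmm]' duration string to seconds."""
    hours, minutes, seconds = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)

# Example: the max mapper latency of cluster sharding in the seek phase.
print(duration_to_seconds("00:08:34"))      # 514.0
# Example: the total topN reducer time of cluster sharding.
print(duration_to_seconds("01:00:38.382"))  # 3638.382
```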
Scenario 3: Cluster sharding on the doc table and query table that both contain 1.6 billion data records of the FLOAT data type with 128 dimensions
In this test scenario, the number of data records in both the doc table and query table is 1.6 billion, the data type is FLOAT, and the number of dimensions is 128. The numbers of rows and columns are automatically calculated.
Because 1.6 billion data records in both the doc table and query table is an extremely large data amount, only cluster sharding of Proxima CE can be performed successfully. The following table describes the basic data information.
| Method | Cluster sharding or autotuning duration (seconds) | Duration of the build phase (seconds) | Duration of the seek phase (seconds) |
| --- | --- | --- | --- |
| Centauri | 1,127 | 19,962 | The task failed twice due to an out of memory (OOM) error. |
| Hash sharding of Proxima CE | N/A | 14,637 | The task failed once because the amount of output data exceeded the upper limit of the temporary table. |
| Cluster sharding of Proxima CE | 5,478 | 17,911 | 6,801 |
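For reference, the end-to-end time of the only method that completed in this scenario can be summed from the table values above:

```python
# Cluster sharding of Proxima CE is the only method that completed on
# 1.6 billion records; end-to-end time summed from the scenario 3 table.
cluster_total = 5478 + 17911 + 6801   # sharding + build + seek, in seconds
print(f"{cluster_total} seconds (~{cluster_total // 60} minutes)")
```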