
MaxCompute: Data performance of Centauri

Last Updated: Dec 27, 2024

Centauri is the predecessor of Proxima CE. This topic describes the data performance of Centauri in different scenarios.

Scenario 1: Test on the doc table and query table that both contain 100 million data records of the BINARY data type with 512 dimensions

In this test scenario, the number of data records in both the doc table and query table is 100 million, the data type is BINARY, and the number of dimensions is 512. Fifty rows and four columns are manually specified for the search.

Test conclusion

The data performance of hash sharding of Proxima CE is approximately 20% higher than that of Centauri.

| Method | Duration of the K-means phase (seconds) | Duration of the autotuning phase (seconds) | Duration of the build phase (seconds) | Duration of the seek phase (seconds) | Total time required (minutes) |
| --- | --- | --- | --- | --- | --- |
| Centauri | - | 1,524 | 12,653 | 5,914 | 336 |
| Hash sharding of Proxima CE | - | - | 9,647 | 6,431 | 268 |
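The roughly 20% gap stated in the conclusion can be checked against the totals in the table above. A minimal sanity check in Python (variable names are mine, not from the test harness):

```python
# Total runtimes (minutes) for Scenario 1, taken from the table above.
centauri_total = 336          # Centauri
hash_sharding_total = 268     # hash sharding of Proxima CE

# Relative reduction in total time when switching to hash sharding.
improvement = (centauri_total - hash_sharding_total) / centauri_total
print(f"{improvement:.1%}")   # prints "20.2%", i.e. approximately 20%
```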

Note

K-means is a phase that is specific to the cluster sharding of Proxima CE and is used to obtain the cluster centroid table of the original doc table. Autotuning is a phase that is specific to Centauri and is used to calculate the values of the parameters of the indexing algorithm. Build is the index building phase. Seek is the search phase.

Test procedure

  • Performance comparison in the build phase

    • Centauri

    • Hash sharding of Proxima CE

    Result analysis: When Centauri is used, index building on one node finishes exceptionally fast, and the remaining three nodes take approximately the same amount of time. When hash sharding of Proxima CE is used, index building on two nodes finishes exceptionally fast, and the other two nodes build relatively slowly.

  • Performance comparison in the seek phase

    • Centauri

    • Hash sharding of Proxima CE

    Result analysis:

    • The time required for index seeking on the nodes for Centauri is close to the time required for index seeking on the nodes for hash sharding of Proxima CE.

    • The time required for result merging on the nodes for hash sharding of Proxima CE is approximately 12 minutes longer than the time required for result merging on the nodes for Centauri.

      • When hash sharding of Proxima CE is performed, the fastest result merging on a node is 8 minutes and the slowest result merging on a node is 20 minutes.

      • When Centauri is used, the fastest result merging on a node is 4 minutes and the slowest result merging on a node is 9 minutes.

  • Running details

    • Centauri

      Vector search  Data type:BINARY , Vector dimension:512 , Search method:graph , Measure:hamming , Building mode:build:seek
      Information about the doc table Table name: doc_table_pailitao_binary , Partition:20210712 , Number of data records in the doc table:100000000 , Vector delimiter:~
      Information about the query table Table name: doc_table_pailitao_binary , Partition:20210712 , Number of data records in the query table: 100000000 , Vector delimiter:~
      Information about the output table Table name: output_table_pailitao_binary_centauri , Partition:20210712
      Row and column information  Number of rows: 50 , Number of columns:4 , Number of data records in the doc table of each column for index building:25000000
      Whether to clear volume indexes:false
       
      
      Time required for each worker node (seconds):
      worker:TmpDataTableJoinWorker , times:0
      worker:TmpTableWorker , times:16
      worker:CleanUpWorker , times:4
      worker:AutotuningFastWorker , times:46
      worker:RowColWorker , times:53
      worker:SeekJobWorker , times:5914
      worker:BuildJobWorker , times:12653
      worker:AutotuningNormalWorker , times:1478
      Total time required (minutes):336
      
      Top recall rate User setting train:
      top200:0.95
      
      Top recall rate normal train:
      top200:98.061%
      
      Autotuning Fast Build Params:
      proxima.general.builder.memory_quota=0
      proxima.graph.common.max_doc_cnt=27500000
      proxima.general.builder.thread_count=15
      proxima.hnsw.builder.efconstruction=400
      proxima.graph.common.neighbor_cnt=100
      
      Autotuning Normal Search Params:
      proxima.hnsw.searcher.ef=400
      
      Sample commands:
      jar -resources  centauri-1.1.5.jar,libcentauri-1.1.5.so   -classpath /data/jiliang.ljl/centauri_1.1.5/centauri-1.1.5.jar
      com.alibaba.proxima.CentauriRunner
      -proxima_version 1.1.5
      -doc_table doc_table_pailitao_binary -doc_table_partition 20210712
      -query_table doc_table_pailitao_binary -query_table_partition 20210712
      -output_table output_table_pailitao_binary_centauri -output_table_partition 20210712
      -data_type binary -dimension 512 -app_id 201220 -pk_type int64 -clean_build_volume false -distance_method hamming -binary_to_int true -row_num 50 -column_num 4;
    • Hash sharding of Proxima CE

      Vector search  Data type:1 , Vector dimension:512 , Search method:hnsw , Measure:Hamming , Building mode:build:build:seek
      Information about the doc table Table name: doc_table_pailitao_binary2 , Partition:20210712 , Number of data records in the doc table:100000000 , Vector delimiter:~
      Information about the query table Table name: doc_table_pailitao_binary2 , Partition:20210712 , Number of data records in the query table:100000000 , Vector delimiter:~
      Information about the output table Table name: output_table_pailitao_binary_ce , Partition:20210712
      Row and column information  Number of rows: 50 , Number of columns:4 , Number of data records in the doc table of each column for index building:25000000
      Whether to clear volume indexes:false
      
      Time required for each worker node (seconds):
      SegmentationWorker:          2
      TmpTableWorker:              1
      KmeansGraphWorker:           0
      BuildJobWorker:              9647
      SeekJobWorker:               6431
      TmpResultJoinWorker:         0
      RecallWorker:                0
      CleanUpWorker:               3
      Total time required (minutes):268
      
      Sample commands:
      jar -resources proxima_ce_g.jar -classpath /data/jiliang.ljl/project/proxima2-java/proxima-ce/target/binary/proxima-ce-0.1-SNAPSHOT-jar-with-dependencies.jar  com.alibaba.proxima2.ce.ProximaCERunner
      -doc_table doc_table_pailitao_binary2 -doc_table_partition 20210712
      -query_table doc_table_pailitao_binary2 -query_table_partition 20210712
      -output_table output_table_pailitao_binary_ce -output_table_partition 20210712
      -data_type binary -dimension 512 -app_id 201220 -pk_type int64 -clean_build_volume false -distance_method Hamming -binary_to_int true -row_num 50 -column_num 4;

Scenario 2: Test on the doc table and query table that both contain 1 billion data records of the FLOAT data type with 128 dimensions

In this test scenario, the number of data records in both the doc table and query table is 1 billion, the data type is FLOAT, and the number of dimensions is 128. Fifty rows and sixty columns are specified for the search.

Test conclusion

The data performance of hash sharding of Proxima CE is approximately 30% higher than that of Centauri. Compared with Centauri, cluster sharding of Proxima CE improves overall data performance by approximately 2 times, and improves seek-phase performance by approximately 7.5 times. INT8 quantization improves data performance by approximately 10%.

| Method | Cluster sharding or autotuning duration (seconds) | Duration of the build phase (seconds) | Duration of the seek phase (seconds) |
| --- | --- | --- | --- |
| Centauri | 1,220 | 9,822 | 37,245 |
| Hash sharding of Proxima CE | N/A | 9,841 | 23,462 |
| Hash sharding and INT8 quantization of Proxima CE | N/A | 7,600 | 21,624 |
| Cluster sharding of Proxima CE | 1,247 | 14,404 | 5,028 |
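The ratios quoted in the conclusion can be reproduced by summing the listed phase durations. A quick sanity check in Python; the exact ratios differ slightly from the rounded figures in the conclusion, and the variable names are mine:

```python
# Phase durations (seconds) for Scenario 2, summed from the table above.
centauri = 1_220 + 9_822 + 37_245   # 48,287 s total
hash_sharding = 9_841 + 23_462      # 33,303 s total
hash_int8 = 7_600 + 21_624          # 29,224 s total
cluster = 1_247 + 14_404 + 5_028    # 20,679 s total

print(f"hash sharding vs Centauri: {(centauri - hash_sharding) / centauri:.0%} less total time")
print(f"cluster sharding vs Centauri: {centauri / cluster:.1f}x overall")
print(f"cluster seek vs Centauri seek: {37_245 / 5_028:.1f}x")
print(f"INT8 vs plain hash sharding: {(hash_sharding - hash_int8) / hash_sharding:.0%} less total time")
```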

Test procedure

  • Details in the build phase

    | Method | Mapper | Build Reducer | Total time required (seconds) |
    | --- | --- | --- | --- |
    | Centauri | - | - | - |
    | Hash sharding of Proxima CE | 00:01:23.116, Latency:{min:00:00:03, avg:00:00:23, max:00:01:00} | 02:41:43.563, Latency:{min:00:02:40, avg:01:32:33, max:02:41:33} | 9,841 |
    | Hash sharding and INT8 quantization of Proxima CE | 00:01:36.166, Latency:{min:00:00:09, avg:00:00:25, max:00:01:09} | 02:04:11.440, Latency:{min:00:06:56, avg:01:06:06, max:02:03:53} | 7,600 |
    | Cluster sharding of Proxima CE | 00:15:33.022, Latency:{min:00:00:03, avg:00:03:24, max:00:15:21} | 03:43:37.529, Latency:{min:00:03:57, avg:01:33:32, max:03:43:35} | 14,404 |

  • Details in the seek phase

    | Method | Mapper | Topn Reducer | Merge Reducer | Total time required (seconds) | Remarks |
    | --- | --- | --- | --- | --- | --- |
    | Centauri | 00:15:45.000 (per node: 34 seconds to 11 minutes) | 08:33:50.000 (per node: 98 to 489 minutes) | 01:30:20.000 (per node: 30 to 70 minutes) | 37,245 | (1) Overall data processing completes 30 to 40 minutes after the reducer task finishes logging. (2) The single-node runtimes of the mapper, topN reducer, and merge reducer tasks are taken from another test in Logview. |
    | Hash sharding of Proxima CE | 00:06:29.791, Latency:{min:00:00:02, avg:00:01:39, max:00:05:56} | 04:50:42.422, Latency:{min:00:01:48, avg:01:54:33, max:03:47:54} | 04:50:42.422, Latency:{min:00:00:35, avg:00:33:39, max:01:32:16} | 23,462 | (1) The total time of the mapper and merge reducer tasks is close to that of the slowest single task, which matches expectations; time consumption is affected only by long-tail machines. (2) Failover on the last two nodes to finish in the topN reducer task started late; excluding these two nodes would reduce the total time by about 1 hour. |
    | Hash sharding and INT8 quantization of Proxima CE | 00:06:25.718, Latency:{min:00:00:17, avg:00:01:27, max:00:06:02} | 03:58:00.566, Latency:{min:00:00:25, avg:01:06:41, max:02:40:07} | 01:54:35.620, Latency:{min:00:01:56, avg:00:20:54, max:01:39:55} | 21,624 | N/A |
    | Cluster sharding of Proxima CE | 00:23:51.623, Latency:{min:00:00:04, avg:00:03:01, max:00:08:34} | 01:00:38.382, Latency:{min:00:05:15, avg:00:18:00, max:01:00:10} | 00:12:39.341, Latency:{min:00:00:31, avg:00:07:08, max:00:12:33} | 5,028 | N/A |

Scenario 3: Cluster sharding on the doc table and query table that both contain 1.6 billion data records of the FLOAT data type with 128 dimensions

In this test scenario, the number of data records in both the doc table and query table is 1.6 billion, the data type is FLOAT, and the number of dimensions is 128. Data in rows and columns is automatically calculated.

The amount of data, 1.6 billion records in each of the doc table and query table, is too large for the other methods; only cluster sharding of Proxima CE can complete successfully. The following table describes the basic data information.

| Method | Cluster sharding or autotuning duration (seconds) | Duration of the build phase (seconds) | Duration of the seek phase (seconds) |
| --- | --- | --- | --- |
| Centauri | 1,127 | 19,962 | The task failed twice due to an out of memory (OOM) error. |
| Hash sharding of Proxima CE | N/A | 14,637 | The task failed once because the amount of output data exceeded the upper limit of the temporary table. |
| Cluster sharding of Proxima CE | 5,478 | 17,911 | 6,801 |
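Because cluster sharding of Proxima CE is the only method that completes in this scenario, its end-to-end time can be estimated by summing the phase durations in the table above. A small illustrative calculation (the shortened phase names are mine):

```python
# Scenario 3, cluster sharding of Proxima CE: phase durations (seconds) from the table above.
phases = {"cluster sharding": 5_478, "build": 17_911, "seek": 6_801}

total_seconds = sum(phases.values())                    # 30,190 s
print(f"end-to-end: {total_seconds / 60:.0f} minutes")  # prints "end-to-end: 503 minutes"
```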