
MaxCompute: Converters

Last Updated: Mar 17, 2026

Proxima 2.X supports converters that transform vector data during index building. This page explains how to use two converters: INT8 data quantization and data normalization. INT8 quantization improves search performance at the cost of a small reduction in recall.

Available converters

| Converter | Purpose | Primary tradeoff |
| --- | --- | --- |
| Int8QuantizerConverter | Quantizes FLOAT vectors to INT8 format, reducing storage and improving search speed. | Recall decreases by about 1% to 2%; search performance increases by approximately 10%. |
| NormalizeConverter | Normalizes vector values before index building. | No accuracy impact data is available. Measure the impact on your own dataset. |
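This page does not describe exactly which normalization NormalizeConverter applies. The following Python sketch assumes L2 (unit-length) normalization, a common choice in vector search because the inner product of two unit vectors equals their cosine similarity. The function names are hypothetical and are not part of Proxima.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit L2 length.

    Assumption: NormalizeConverter performs L2 normalization. This helper is
    hypothetical and is not part of Proxima.
    """
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        # A zero vector has no direction, so return it unchanged.
        return list(vector)
    return [x / norm for x in vector]


def inner_product(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    print(l2_normalize([3.0, 4.0]))  # [0.6, 0.8]
    a = l2_normalize([1.0, 2.0, 2.0])
    b = l2_normalize([2.0, 1.0, 2.0])
    # For unit vectors, the inner product equals the cosine similarity.
    print(inner_product(a, b))
```

If your data is compared by cosine similarity, normalizing in advance lets an inner-product search return the same ranking.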

For a complete list of converters, see Index Converter.

Prerequisites

Before configuring a converter, complete the following steps:

Converter parameters

Use the following parameters to configure a converter when running the index-building job.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| -converter | String | Yes | The name of the converter to use for index building. Valid values: Int8QuantizerConverter, NormalizeConverter. |
| -converter_params | JSON string | No | The configuration parameters for the converter, provided as a single-line JSON string. Do not escape double quotation marks ("). Example for NormalizeConverter: {"proxima.normalize.reformer.forced_half_float":false}. |

Important

Spaces are not allowed in the -converter_params value. A space anywhere in the JSON string causes the job to fail.
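If you generate the job command from a script, Python's json.dumps inserts a space after each comma and colon by default, which would break the job. A minimal sketch: serialize with compact separators and reject any value that still contains a space. The helper name to_converter_params is hypothetical; it only builds the string you pass to -converter_params.

```python
import json


def to_converter_params(params):
    """Serialize converter parameters as a single-line JSON string with no spaces.

    Compact separators replace the default ", " and ": " with "," and ":".
    """
    text = json.dumps(params, separators=(",", ":"))
    if " " in text:
        # A key or string value itself contains a space; the job would fail.
        raise ValueError("converter_params must not contain spaces: " + text)
    return text


if __name__ == "__main__":
    print(to_converter_params({"proxima.normalize.reformer.forced_half_float": False}))
    # {"proxima.normalize.reformer.forced_half_float":false}
```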

For the full set of converter parameter keys and valid values, see IndexConverter parameter configuration.

Command example

The following example runs an index-building job using Int8QuantizerConverter. For descriptions of all other parameters in the command, see Reference: Proxima CE parameters.

--@resource_reference{"proxima-ce-aliyun-1.0.2.jar"}  -- Reference the uploaded proxima-ce JAR package. In the left navigation pane, choose Business Flow > MaxCompute > Resources. Right-click the uploaded JAR package and select Reference Resource to generate this comment line.
jar -resources proxima-ce-aliyun-1.0.2.jar  -- The uploaded proxima-ce JAR package
-classpath proxima-ce-aliyun-1.0.2.jar com.alibaba.proxima2.ce.ProximaCERunner  -- The main function entry class
-doc_table doc_table_xx  -- The input doc table
-doc_table_partition 20221111  -- The partition name of the doc table
-query_table query_table_xx  -- The input query table
-query_table_partition 20221111  -- The partition name of the query table
-output_table output_table_xx  -- The output table
-output_table_partition 20221111  -- The partition name of the output table
-data_type float  -- The vector data type
-dimension 8  -- The vector dimension
-external_volume_name xxx_volume_name  -- The external volume backed by an OSS bucket. Create the underlying OSS directory before running this job; otherwise, the search task fails.
-owner_id 123456  -- The ID of the user
-converter Int8QuantizerConverter  -- The converter
-converter_params ""  -- Optional. The converter parameters as a single-line JSON string. Do not escape double quotation marks ("). Spaces are not allowed.
;  -- Do not omit the semicolon. It marks the end of the ODPS SQL statement.

Performance

The recall rate is the percentage of true nearest neighbors returned by a search. A higher recall rate indicates more accurate results.
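This page does not state exactly how Proxima CE computes recall. The following Python sketch assumes the usual definition: for each query, the fraction of the true nearest neighbors (typically obtained from an exact, brute-force search) that appear in the returned results, averaged over all queries. The function names are hypothetical.

```python
def recall(retrieved_ids, true_ids):
    """Fraction of the true nearest neighbors that appear in the retrieved results."""
    truth = set(true_ids)
    if not truth:
        raise ValueError("true_ids must not be empty")
    return len(truth.intersection(retrieved_ids)) / len(truth)


def mean_recall(all_retrieved, all_true):
    """Average the per-query recall over a batch of queries."""
    if len(all_retrieved) != len(all_true) or not all_true:
        raise ValueError("need one non-empty ground-truth list per query")
    return sum(recall(r, t) for r, t in zip(all_retrieved, all_true)) / len(all_true)


if __name__ == "__main__":
    # Query 1 finds 3 of its 5 true neighbors; query 2 finds all 5.
    print(recall([1, 2, 3, 9, 10], [1, 2, 3, 4, 5]))  # 0.6
    print(mean_recall([[1, 2, 3, 9, 10], [6, 7, 8, 9, 10]],
                      [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]))  # 0.8
```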

INT8 quantization reduces vector storage and increases search speed but introduces a small loss of precision. In general, it lowers recall by about 1% to 2%. In a test using a doc table and a query table that each contain 20 million FLOAT records with 512 dimensions, the recall rate decreased from 99.0% to 98.2% after quantization, a drop of 0.8 percentage points. Search performance in the same test increased by approximately 10%.

The data in this test is for reference only.
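This page does not describe the internal scheme of Int8QuantizerConverter. The following Python sketch uses symmetric linear quantization with one scale per vector (maximum absolute value divided by 127), a common technique, to show where the precision loss comes from and how raw storage changes. The function names are hypothetical, and the storage figures assume that FLOAT means 4-byte single precision.

```python
def quantize_int8(vector):
    """Symmetric linear quantization of a float vector to int8 codes in [-127, 127].

    Assumption: a single scale per vector (max absolute value / 127). This is one
    common scheme; it is not necessarily what Int8QuantizerConverter does.
    Returns (codes, scale).
    """
    max_abs = max((abs(x) for x in vector), default=0.0)
    if max_abs == 0.0:
        return [0] * len(vector), 1.0
    scale = max_abs / 127.0
    # round() maps each value to the nearest integer; clamping guards the range.
    codes = [max(-127, min(127, round(x / scale))) for x in vector]
    return codes, scale


def dequantize_int8(codes, scale):
    """Approximately reconstruct the original floats from int8 codes."""
    return [c * scale for c in codes]


def raw_vector_bytes(num_vectors, dimension, bytes_per_value):
    """Raw vector payload size, excluding scales and index structures."""
    return num_vectors * dimension * bytes_per_value


if __name__ == "__main__":
    vec = [0.5, -1.0, 0.25, 0.0]
    codes, scale = quantize_int8(vec)
    restored = dequantize_int8(codes, scale)
    # Each value is off by at most about scale / 2: this is the precision loss.
    print(codes, max(abs(a - b) for a, b in zip(vec, restored)))
    # Test setup from this page: 20 million 512-dimensional vectors.
    print(raw_vector_bytes(20_000_000, 512, 4))  # 40960000000 (FLOAT, 4 bytes)
    print(raw_vector_bytes(20_000_000, 512, 1))  # 10240000000 (INT8, 1 byte)
```

The 4:1 ratio applies only to the raw vector values. Per-vector scales and the index structure add overhead, so the actual savings are smaller. The rounding error slightly perturbs distances between vectors, which is why a few true neighbors can drop out of the results.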