MaxCompute:Inner product and cosine distance

Last Updated:Apr 01, 2025

When cosine distance is used as the distance measure for vector search in Proxima CE, its calculation is closely related to the inner product. The inner product is widely used in search and recommendation scenarios, and vector search algorithms require its value. Proxima CE optimizes the calculation of inner products, and the optimization applies to various index creation algorithms, such as Hierarchical Navigable Small World Graph (HNSW), Satellite System Graph (SSG), Hierarchical Clustering (HC), Graph Clustering (GC), Quantized Clustering (QC), and Linear Search. This topic describes how to calculate the inner product and cosine distance of vectors.

Prerequisites

Scenario

You can use MipsSquaredEuclidean or NormalizeConverter to calculate the inner product of vectors.

Scenario | Method
Original vectors cannot be normalized | Use MipsSquaredEuclidean.
Original vectors can be normalized | Use MipsSquaredEuclidean or NormalizeConverter. We recommend that you use MipsSquaredEuclidean.

MipsSquaredEuclidean

If the original vectors cannot be normalized, you can use MipsSquaredEuclidean to convert the dimensions of the original vectors. After the conversion, searching by the squared Euclidean distance between the converted vectors is equivalent to searching by the inner product of the original vectors.
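The idea behind such a conversion can be illustrated with a well-known reduction from maximum inner product search to Euclidean nearest-neighbor search. The sketch below is not the Proxima CE implementation: the helper names (augment_docs, augment_query) and the sample vectors are hypothetical, and the transformation that MipsSquaredEuclidean actually applies may differ. Each doc vector gets one extra dimension so that all doc vectors share the same norm M, and the query gets a zero in that dimension. The squared distance then equals ||q||^2 + M^2 - 2<q, x>, so ordering by ascending distance matches ordering by descending inner product.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def sq_euclidean(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def augment_docs(docs):
    """Append one dimension so that every doc vector has the same norm M.

    Illustrative reduction only; not the actual Proxima CE conversion.
    """
    max_sq_norm = max(dot(d, d) for d in docs)
    # max(..., 0.0) guards against tiny negative values caused by rounding.
    return [d + [math.sqrt(max(max_sq_norm - dot(d, d), 0.0))] for d in docs]


def augment_query(query):
    """Append a zero so the extra dimension does not affect the inner product."""
    return query + [0.0]


# Hypothetical sample data.
docs = [[1.0, 2.0], [3.0, 0.5], [-1.0, 4.0]]
query = [0.5, 1.0]

# Ranking by descending inner product on the original vectors.
by_inner_product = sorted(range(len(docs)), key=lambda i: -dot(query, docs[i]))

# Ranking by ascending squared Euclidean distance on the converted vectors.
aug_docs = augment_docs(docs)
aug_query = augment_query(query)
by_distance = sorted(
    range(len(docs)), key=lambda i: sq_euclidean(aug_query, aug_docs[i])
)

print(by_inner_product, by_distance)
```

Both rankings list the same doc indexes in the same order, which is why a Euclidean index can serve inner product queries after the conversion.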

To use MipsSquaredEuclidean, you must set the -distance_method parameter to MipsSquaredEuclidean. You can also configure the -measure_params parameter to specify related parameters. Specify the value as a single-line JSON string. You do not need to escape the double quotation marks ("), and the string cannot contain spaces. Example:

{"proxima.mips_euclidean.measure.injection_type":0}

For more information about the parameters, see IndexMeasure parameters.
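If you generate the command from a script, one way to produce a single-line JSON string without spaces is to serialize it with compact separators. The following is a minimal sketch in Python; the variable names are hypothetical, and the only key used is the one from the example above.

```python
import json

# The key and value are taken from the example above.
measure_params = {"proxima.mips_euclidean.measure.injection_type": 0}

# separators=(",", ":") removes the spaces that json.dumps inserts by default.
measure_params_arg = json.dumps(measure_params, separators=(",", ":"))

print(measure_params_arg)
```

The resulting string can be passed as the value of -measure_params.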

Sample commands

Note

For details about the parameter configuration used in the following example code, see Reference: Proxima CE parameters.

--@resource_reference{"proxima-ce-aliyun-1.0.2.jar"}  -- Reference the JAR package of Proxima CE that you uploaded. Go to the DataStudio page in the DataWorks console. In the left-side navigation pane of the DataStudio page, click Scheduled Workflow. On the left side of the page that appears, choose Business Flow > MaxCompute > Resource. Right-click the JAR package of Proxima CE that you uploaded and select Insert Resource Path from the shortcut menu to display data as an annotation.
jar -resources proxima-ce-aliyun-1.0.2.jar  -- The JAR package of Proxima CE that you uploaded.
-classpath proxima-ce-aliyun-1.0.2.jar com.alibaba.proxima2.ce.ProximaCERunner  -- The classpath specifies the entry class of the main function.
-doc_table doc_table_xx  -- The input doc table.
-doc_table_partition 20221111  -- The partition of the input doc table.
-query_table query_table_xx  -- The input query table.
-query_table_partition 20221111  -- The partition of the input query table.
-output_table output_table_xx  -- The output table.
-output_table_partition 20221111  -- The partition of the output table.
-data_type float  -- The data type of the vector.
-dimension 8 -- The number of dimensions of the vector.
-external_volume_name xxx_volume_name -- The external volume on OSS. Create the underlying OSS directory in advance. Otherwise, the operation fails.
-owner_id 123456  -- The ID of the user.
-distance_method MipsSquaredEuclidean -- You must configure this parameter.
-measure_params {"proxima.mips_euclidean.measure.injection_type":0};  -- Optional.

NormalizeConverter

If the original vectors can be normalized, you can use NormalizeConverter to perform L2 normalization on the original vectors in the input doc or query table. After L2 normalization, the inner product and the Euclidean distance of two vectors $x$ and $y$ have the following relationship: $\|x - y\|^2 = 2 - 2\langle x, y \rangle$. The two values can therefore be converted into each other, and the inner product can be used for distance calculation. After you use NormalizeConverter to perform normalization, you can configure the distance-related parameters to create indexes or run search tasks based on your business requirements.
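For unit-length vectors x and y, the squared Euclidean distance equals 2 - 2<x, y>, because ||x - y||^2 = ||x||^2 + ||y||^2 - 2<x, y> and both norms are 1. The following sketch checks this numerically. It is a plain Python illustration with hypothetical helper names and sample vectors, and it does not call Proxima CE.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (L2 normalization)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical sample vectors.
x = l2_normalize([1.0, 2.0, 3.0])
y = l2_normalize([-2.0, 0.5, 4.0])

lhs = squared_euclidean(x, y)
rhs = 2.0 - 2.0 * inner_product(x, y)

# The two sides agree up to floating-point rounding.
print(math.isclose(lhs, rhs, rel_tol=1e-9, abs_tol=1e-12))
```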

To use NormalizeConverter, you must set the -converter parameter to NormalizeConverter and set the -distance_method parameter to inner_product. By default, L2 normalization is performed. You can configure other parameters based on your business requirements for vector search. The search result is the inner product of the normalized vectors.

Sample commands

--@resource_reference{"proxima-ce-aliyun-1.0.2.jar"}  -- Reference the JAR package of Proxima CE that you uploaded. Go to the DataStudio page in the DataWorks console. In the left-side navigation pane of the DataStudio page, click Scheduled Workflow. On the left side of the page that appears, choose Business Flow > MaxCompute > Resource. Right-click the JAR package of Proxima CE that you uploaded and select Insert Resource Path from the shortcut menu to display data as an annotation.
jar -resources proxima-ce-aliyun-1.0.2.jar  -- The JAR package of Proxima CE that you uploaded.
-classpath proxima-ce-aliyun-1.0.2.jar com.alibaba.proxima2.ce.ProximaCERunner  -- The classpath specifies the entry class of the main function.
-doc_table doc_table_xx  -- The input doc table.
-doc_table_partition 20221111  -- The partition of the input doc table.
-query_table query_table_xx  -- The input query table.
-query_table_partition 20221111  -- The partition of the input query table.
-output_table output_table_xx  -- The output table.
-output_table_partition 20221111  -- The partition of the output table.
-data_type float  -- The data type of the vector.
-dimension 8 -- The number of dimensions of the vector.
-external_volume_name xxx_volume_name -- The external volume on OSS. Create the underlying OSS directory in advance. Otherwise, the operation fails.
-owner_id 123456  -- The ID of the user.
-converter NormalizeConverter  -- Use NormalizeConverter, which performs L2 normalization by default.
-distance_method inner_product;  -- You must configure this parameter.

Cosine distance

The SDK for Proxima does not support the calculation of cosine distance because this operation is costly. The cosine similarity of vectors is equivalent to the inner product after L2 normalization is performed on the vectors. Therefore, you can use Proxima CE to normalize vectors and then calculate the inner product of vectors or the Euclidean distance between vectors to improve performance. To calculate the cosine distance of vectors, perform the following steps:

  1. Run the search with NormalizeConverter to obtain the inner product, which is returned as the score value.

    Note

    score is a field in the output table.

  2. Calculate the cosine distance of vectors by using the following formula: 1 - score.

    The cosine similarity of vectors is equivalent to the inner product value (ip), which is equal to the value of the score field. The cosine similarity falls in the range [-1,1]. The cosine distance must be non-negative and is obtained by using the following formula: 1 - ip. The cosine distance therefore falls in the range [0,2].
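The two steps above can be sketched as a small post-processing function. This is a hedged illustration in Python, not part of Proxima CE: the function name cosine_distance_from_score is hypothetical, and the score values are assumed to have already been read from the score field of the output table. The clamp is an added safeguard against floating-point rounding that pushes a score slightly outside [-1, 1].

```python
def cosine_distance_from_score(score):
    """Convert an inner product of L2-normalized vectors to a cosine distance.

    score: value of the score field (cosine similarity, expected in [-1, 1]).
    Returns 1 - score, which lies in [0, 2].
    """
    # Clamp to [-1, 1] so rounding errors cannot yield a negative distance.
    clamped = max(-1.0, min(1.0, score))
    return 1.0 - clamped


# Hypothetical score values for illustration.
for score in (1.0, 0.0, -1.0):
    print(score, cosine_distance_from_score(score))
```

A score of 1 (same direction) gives a distance of 0, a score of 0 (orthogonal vectors) gives 1, and a score of -1 (opposite direction) gives 2.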