All Products
Search
Document Center

MaxCompute:Install the Proxima CE package

Last Updated:Mar 26, 2026

Before running vector search tasks in MaxCompute, install Proxima CE by downloading the JAR package, uploading it as a MaxCompute resource, and creating the required input tables.

Prerequisites

Complete Preparations before proceeding.

Download the package

Download proxima-ce-aliyun-1.0.2.jar.

The package contains the Proxima CE executable JAR file. After you upload it to your MaxCompute project as a resource, Proxima CE tasks can call it directly.

Upload the JAR file as a MaxCompute resource

Upload the JAR file to your MaxCompute project using either DataWorks or the MaxCompute client (odpscmd). Both methods register the file as a callable resource in your project.

Option 1: DataWorks

  1. On the Data Development page of DataWorks, upload the JAR file as a JAR resource.

    Note

    For resources that are created or uploaded through the DataWorks visual interface:

    • If the resource has not been uploaded, you must select Upload to MaxCompute. If the resource has already been uploaded to MaxCompute, you must clear this checkbox. Otherwise, the upload fails.

    • If you select Upload to MaxCompute during the upload, the resource is stored in both DataWorks and MaxCompute. If you later delete the resource from MaxCompute by using a command, the resource in DataWorks still exists and can be viewed.

    • The resource name does not have to match the uploaded file name.

    image

  2. Commit and deploy the resource. Click the 提交 icon in the top toolbar of the resource configuration tab to commit the resource to the development environment.

    Note

    If a production task needs to use this resource, you must also deploy the resource to the production environment. For more information, see Deploy a task.

Option 2: odpscmd

Use the odpscmd command-line client to add the JAR file as a resource. See Add resources for the full command syntax.

Prepare input tables

Proxima CE requires two input tables for every vector search task:

  • Doc table — the base table that contains all the vectors to search through.

  • Query table — the table that contains the vectors you want to find nearest neighbors for.

Both tables share the same schema. Create them with the following SQL:

-- Create a doc table
CREATE TABLE doc_table_float_smoke(pk STRING, vector STRING <,category BIGINT>) PARTITIONED BY (pt STRING);

-- Create a query table
CREATE TABLE query_table_float_smoke(pk STRING, vector STRING <,category BIGINT>) PARTITIONED BY (pt STRING);

Table name constraints

Table names must meet the following requirements. If a table name violates a constraint, rename the table before running the task.

  • Names cannot contain tmp_. Tasks fail at runtime if this string appears in a table name or partition value.

  • Names must be 1–64 characters long. This limit applies to both table names and partition values.

Required fields

Both tables must contain the following fixed fields. The field names must match exactly.

FieldDescriptionData type
pkPrimary key value used in queries. Values can be strings (for example, 1.nid,2.nid,3.nid,...) or INT64 integers (for example, 123,456,789,...). If all values are INT64, specify BIGINT for the column and set the -pk_type startup parameter to INT64 to improve performance.STRING (default); BIGINT if all values are INT64
vectorVector field.STRING
categoryCategory identifier for multi-category search. Required only when running multi-category search tasks.BIGINT
ptPartition field.STRING

Input table examples

Doc table

pkvectorpt
id10\~1\~1\~520190322
id20\~1\~1\~220190322
id33\~2\~1\~120190322
.........

Query table

pkvectorpt
id80\~1\~1\~520190322
id90\~1\~1\~220190322
id103\~2\~1\~120190322
.........

Output table format

After a vector search task completes, Proxima CE automatically writes results to an output table in MaxCompute — do not create it manually. Specify the table name using the -output_table parameter in the Proxima CE code.

Output table name constraints

  • Names cannot contain a period (.). MaxCompute treats . as a special character and fails to parse table names containing it.

  • Names cannot contain tmp_. Tasks fail at runtime if this string appears in an output table name.

  • Names of output tables and names of partitions in output tables must be 1–64 characters long.

Output table fields

FieldDescriptionData type
pkPrimary key value from the query table. Values can be strings or INT64 integers (same format as the input pk field). If all values are INT64, specifying BIGINT for the column and setting -pk_type to INT64 improves performance.STRING (default); BIGINT if all values are INT64
knn_resultPrimary key value of the matching doc table record.STRING
scoreSimilarity score of the retrieved result. Results are always sorted in descending order of similarity scores, regardless of the distance algorithm used. For inner_product and mips_squared_euclidean algorithms, a larger distance means higher similarity, so score values are sorted descending. For all other distance algorithms, a smaller distance means higher similarity, so score values are sorted ascending — consistent with the Proxima2 kernel.STRING
categoryCategory identifier for multi-category search. Present only for multi-category search tasks.BIGINT
ptPartition field.STRING

Output table example

pkknn_resultscorept
id8id10.120190322
id8id20.220190322
id9id10.120190322
id9id30.320190322
............

What's next

With Proxima CE installed and your input tables ready, run your first vector search task. Choose a scenario based on your use case:

ScenarioDescriptionReference
Basic vector searchSearch for the top K nearest neighbors across millions of records.Basic vector search
Multi-category searchRun searches across multiple categories, including queries that span different category groups or belong to multiple categories.Multi-category search
Cluster shardingBuild indexes using cluster sharding to reduce computation volume and accelerate queries.Cluster sharding
Inner product and cosine distanceUse inner-product distance for similarity search.Inner product and cosine distance
ConvertersApply converters to reduce index size and improve performance. Retrieval accuracy varies by use case.Converters