Proxima CE is an offline vector search engine built on the Proxima 2.x kernel, developed by Alibaba DAMO Academy. It runs as MapReduce or Graph jobs inside MaxCompute, reading vector data from MaxCompute tables and writing search results back to MaxCompute tables. Use Proxima CE when you need large-scale batch vector search — including top K retrieval from millions of records, multi-category search, and cluster-sharded index queries — without managing a separate search infrastructure.
What Proxima CE supports
Data types
| Data type | Notes |
|---|---|
| INT8 | — |
| FLOAT | — |
| BINARY | Can be converted to INT32 using the binary_to_int parameter. See Optional parameters. |
Search methods
| Method | Full name | Default |
|---|---|---|
| HNSW | Hierarchical Navigable Small World | Yes |
| SSG | Satellite System Graph | — |
| HC | Hierarchical Clustering | — |
| GC | Graph Clustering | — |
| QC | Quantized Clustering | — |
| Linear search | — | — |
Distance calculation
Three distance methods are available via the distance_method parameter:
- Squared Euclidean distance
- Inner product
- Hamming distance
For details, see Optional parameters.
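As an illustration of what the three measures compute, here is a minimal Python sketch. The function names are hypothetical and this is not Proxima CE code; how the engine ranks results for each measure is covered under Optional parameters, not here.

```python
def _check_dims(a, b):
    # Both vectors must have the same number of components.
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")


def squared_euclidean(a, b):
    """Sum of squared component differences (no square root is taken)."""
    _check_dims(a, b)
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inner_product(a, b):
    """Sum of component-wise products."""
    _check_dims(a, b)
    return sum(x * y for x, y in zip(a, b))


def hamming(a, b):
    """Number of positions at which two binary vectors differ."""
    _check_dims(a, b)
    return sum(1 for x, y in zip(a, b) if x != y)
```

Squared Euclidean distance skips the square root because it does not change the ordering of results, which is all that a top K search needs.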
Similarity threshold
Set a similarity threshold using the threshold_score parameter. If the score calculated for a vector exceeds the specified threshold, the system filters that vector out of the results. For details, see Optional parameters.
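The filtering rule can be pictured with a short Python sketch. It assumes that each result carries a numeric score and that a result is dropped when its score is strictly greater than the threshold. The function name and the (doc_id, score) layout are hypothetical; the exact comparison that Proxima CE applies is described under Optional parameters.

```python
def filter_by_threshold(results, threshold_score):
    """Keep (doc_id, score) pairs whose score does not exceed the threshold.

    Assumption: a score above threshold_score means "filter out".
    """
    return [
        (doc_id, score)
        for doc_id, score in results
        if score <= threshold_score
    ]
```

For example, with a threshold of 0.5, a result scored 0.9 is removed while results scored 0.2 and 0.5 are kept.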
How it works
MaxCompute table (source data)
│
▼
Proxima CE — creates index, runs batch queries
(via MapReduce or Graph jobs)
│
▼
MaxCompute table (search results)
Proxima CE provides built-in executable JAR files to run in MaxCompute. Index files are stored in MaxCompute Volume storage (backed by an OSS external volume) and are reused across query tasks.
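To make the batch query step concrete, the following Python sketch runs a linear (brute-force) top K search for a batch of queries over in-memory vectors, using squared Euclidean distance. It only illustrates the computation. Proxima CE itself reads from and writes to MaxCompute tables and by default uses an HNSW index rather than a full scan. All names here are hypothetical.

```python
import heapq


def top_k_linear(queries, docs, k):
    """For each query, return the k nearest docs as (doc_id, distance) pairs.

    queries and docs map an id to a list of floats of equal dimension.
    """
    results = {}
    for query_id, q in queries.items():
        # Score every document against this query (a full scan).
        scored = (
            (sum((x - y) ** 2 for x, y in zip(q, d)), doc_id)
            for doc_id, d in docs.items()
        )
        # Keep only the k smallest distances, nearest first.
        nearest = heapq.nsmallest(k, scored)
        results[query_id] = [(doc_id, dist) for dist, doc_id in nearest]
    return results
```

A full scan costs one distance computation per query-document pair, which is why graph and clustering indexes are used at scale.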
Prerequisites
Before you begin, make sure you have:
Required
- A MaxCompute project. See Create a MaxCompute project.
- A DataWorks workspace with the MaxCompute project added as a data source.
  - If you selected Participate in Public Preview of Data Studio when creating the workspace, bind compute resources by following Associate a compute resource with a workspace (Participate in Public Preview of Data Studio turned on).
  - Otherwise, follow Add a data source or register a cluster to a workspace.
- The Volume feature activated and an external volume created. Proxima CE writes its index to Volume storage.
  - To activate the Volume feature, see Apply for trial use of new features. You receive a text message after activation. If Volume is not activated, jobs fail with:
    FAILED: ODPS-0420095: Access Denied - Volumes is not allowed in project config.
  - To create an external volume, see External volume operations.
Recommended
- Create the external volume before you start. If you skip this step, you must provide role_arn as a required startup parameter, which introduces security risks.
Usage notes
The external volume must be configured with an OSS internal endpoint, for example, oss-cn-beijing-internal.aliyuncs.com. For OSS internal endpoints by region, see Regions and endpoints.
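A quick check before creating the external volume can catch a public endpoint early. The helper below is a hypothetical sketch, not part of Proxima CE or any SDK. It assumes that internal endpoints follow the oss-<region>-internal.aliyuncs.com pattern of the example above; confirm the actual value for your region in Regions and endpoints.

```python
import re

# Assumed pattern: oss-<region>-internal.aliyuncs.com
_INTERNAL_ENDPOINT = re.compile(r"^oss-[a-z0-9-]+-internal\.aliyuncs\.com$")


def is_internal_oss_endpoint(endpoint):
    """Return True if the endpoint looks like an OSS internal endpoint."""
    host = endpoint.strip().lower()
    # Tolerate an optional scheme and a trailing slash.
    host = re.sub(r"^https?://", "", host).rstrip("/")
    return bool(_INTERNAL_ENDPOINT.match(host))
```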
Supported tools
| Tool | Supported platforms | Notes |
|---|---|---|
| odpscmd | Linux only | JAR files are compiled for Linux. Windows and macOS are not supported. |
| DataWorks | All platforms | Create ODPS MR nodes and run them with ODPS SQL scripts. |
Get started
- Install the Proxima CE package: Set up the environment and configure Proxima CE. See Install the Proxima CE package.
- Run a vector search: Choose a search scenario from the table below.
| Scenario | Key capability | Reference |
|---|---|---|
| Basic vector search | Top K retrieval from millions of records | Basic vector search |
| Multi-category search | Supports different-category query/doc tables and single-query-multiple-category scenarios | Multi-category search |
| Cluster sharding | Index by cluster shard to reduce compute and accelerate queries | Cluster sharding |
| Inner product and cosine distance | Inner-product and cosine distance search | Inner product and cosine distance |
| Converters | Improve performance and reduce index size (retrieval loss varies) | Converters |
References
Parameters and kernel modules
Test reports
Feature testing:
Performance testing:
FAQ and troubleshooting