The upsert operation inserts a new document into a collection if the ID does not exist. If the ID already exists, an update is performed instead.
If no Doc ID is specified, DashVector generates one automatically and returns it in the response.
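The insert-or-update behavior can be pictured with a small in-memory model. This is only a sketch, not DashVector code: `ToyCollection` and its dict store are hypothetical, the update is assumed to replace the stored document wholesale (the text above does not say whether fields are merged), and the UUID-based ID merely stands in for whatever ID format the service actually generates.

```python
import uuid
from typing import Dict, List, Optional


class ToyCollection:
    """Hypothetical in-memory stand-in for a collection (not the DashVector SDK)."""

    def __init__(self) -> None:
        self._docs: Dict[str, List[float]] = {}

    def upsert(self, vector: List[float], doc_id: Optional[str] = None) -> str:
        # No ID supplied: generate one (UUID is an assumption, not the real format).
        if doc_id is None:
            doc_id = uuid.uuid4().hex
        # One code path covers both cases: a new ID inserts, an existing ID is overwritten.
        self._docs[doc_id] = list(vector)
        return doc_id

    def get(self, doc_id: str) -> List[float]:
        return self._docs[doc_id]

    def __len__(self) -> int:
        return len(self._docs)


toy = ToyCollection()
toy.upsert([0.1, 0.2], doc_id="1")      # ID '1' is new: inserted
toy.upsert([0.9, 0.8], doc_id="1")      # ID '1' exists: updated in place
generated_id = toy.upsert([0.5, 0.5])   # no ID: one is generated and returned
```

After these three calls the toy store holds two documents: ID '1' with its second vector, and one document under the generated ID.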
Prerequisites
Before you begin, make sure that you have:
A cluster
An API key
The latest version of the DashVector SDK
API definition
Collection.upsert(
docs: Union[Doc, List[Doc], Tuple, List[Tuple]],
partition: Optional[str] = None,
async_req: bool = False
) -> DashVectorResponse
Examples
All examples use the following client and collection setup:
import dashvector
from dashvector import Doc
import numpy as np
# Replace with your actual credentials
client = dashvector.Client(
api_key='YOUR_API_KEY',
endpoint='YOUR_CLUSTER_ENDPOINT'
)
# Use an existing collection named 'quickstart'.
# To create one, see: https://www.alibabacloud.com/help/en/vrs/latest/new-collection#hivl6
collection = client.get(name='quickstart')
Replace the following placeholders with your values:
| Placeholder | Description |
|---|---|
| YOUR_API_KEY | API key for authentication |
| YOUR_CLUSTER_ENDPOINT | Cluster endpoint URL |
Upsert a single document
# Upsert using a Doc object
ret = collection.upsert(
Doc(
id='1',
vector=[0.1, 0.2, 0.3, 0.4]
)
)
assert ret
# Upsert using a tuple (shorthand)
ret = collection.upsert(
('2', [0.1, 0.1, 0.1, 0.1]) # (id, vector)
)
Upsert a document with fields
# Upsert with predefined and schema-free fields
ret = collection.upsert(
Doc(
id='3',
vector=np.random.rand(4),
fields={
# Predefined fields (types must match the collection schema)
'name': 'zhangsan', 'weight': 70.0, 'age': 30,
# Schema-free fields (str, int, bool, or float)
'anykey1': 'str-value', 'anykey2': 1,
'anykey3': True, 'anykey4': 3.1415926
}
)
)
# Upsert with fields using a tuple
ret = collection.upsert(
('4', np.random.rand(4), {'foo': 'bar'}) # (id, vector, fields)
)
Upsert multiple documents
# Batch upsert 10 documents using Doc objects
ret = collection.upsert(
[
Doc(id=str(i+5), vector=np.random.rand(4)) for i in range(10)
]
)
# Batch upsert 3 documents using tuples
ret = collection.upsert(
[
('15', [0.2, 0.7, 0.8, 1.3], {'age': 20}),
('16', [0.3, 0.6, 0.9, 1.2], {'age': 30}),
('17', [0.4, 0.5, 1.0, 1.1], {'age': 40})
] # List[(id, vector, fields)]
)
assert ret
Upsert documents asynchronously
Set async_req=True to submit the upsert as a non-blocking call. Call .get() on the returned future to retrieve the result.
# Asynchronously upsert 10 documents
ret_future = collection.upsert(
[
Doc(id=str(i+18), vector=np.random.rand(4), fields={'name': 'foo' + str(i)}) for i in range(10)
],
async_req=True
)
# Block until the operation completes
ret = ret_future.get()
Upsert a document with a sparse vector
Pass a sparse_vector dictionary that maps indices to values:
ret = collection.upsert(
Doc(
id='28',
vector=[0.1, 0.2, 0.3, 0.4],
sparse_vector={1: 0.4, 10000: 0.6, 222222: 0.8}
)
)
Request parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| docs | Union[Doc, List[Doc], Tuple, List[Tuple]] | - | One or more documents to upsert. Required. |
| partition | Optional[str] | None | Target partition name. |
| async_req | bool | False | Enable asynchronous mode. |
Tuple format:
When passing a tuple instead of a Doc object, elements must follow this order: (id, vector) or (id, vector, fields).
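The positional order matters because a tuple carries no field names. The helper below is a hypothetical sketch (`DocParts` and `unpack_doc_tuple` are illustrative names, not part of the SDK) that unpacks either tuple shape into named parts, which can be handy for checking input before it is passed to upsert.

```python
from typing import Any, Dict, NamedTuple, Sequence, Tuple


class DocParts(NamedTuple):
    """Hypothetical container for the three positional tuple elements."""
    id: str
    vector: Sequence[float]
    fields: Dict[str, Any]


def unpack_doc_tuple(doc: Tuple) -> DocParts:
    """Unpack (id, vector) or (id, vector, fields) into named parts."""
    if len(doc) == 2:
        doc_id, vector = doc
        fields: Dict[str, Any] = {}  # the short form carries no fields
    elif len(doc) == 3:
        doc_id, vector, fields = doc
    else:
        raise ValueError(f"expected 2 or 3 elements, got {len(doc)}")
    return DocParts(doc_id, vector, fields)


short_form = unpack_doc_tuple(('2', [0.1, 0.1, 0.1, 0.1]))
long_form = unpack_doc_tuple(('4', [0.2, 0.7, 0.8, 1.3], {'foo': 'bar'}))
```

Any other tuple length raises `ValueError` in this sketch; how the SDK itself reacts to a malformed tuple is not covered here.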
Field constraints:
Each field in a Doc object is a key-value pair where:
The key must be a str.
The value must be str, int, bool, or float.
If the key was predefined during collection creation, the value type must match the predefined type.
Non-predefined keys are schema-free and accept any of the supported value types.
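These constraints can be checked on the client side before a request is sent. The following is a minimal sketch under stated assumptions: `find_field_problems` is a hypothetical helper, not an SDK function, and it checks only the key and value types listed above. It does not know the collection schema, so it cannot verify that a predefined field matches its declared type.

```python
from typing import Any, Dict, List

# Value types the constraints above list as supported.
SUPPORTED_TYPES = (str, int, bool, float)


def find_field_problems(fields: Dict[Any, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the fields pass."""
    problems = []
    for key, value in fields.items():
        if not isinstance(key, str):
            problems.append(f"key {key!r} is not a str")
        if not isinstance(value, SUPPORTED_TYPES):
            problems.append(
                f"value for {key!r} has unsupported type {type(value).__name__}"
            )
    return problems


# Fields taken from the examples in this document: all pass.
ok = find_field_problems({'name': 'zhangsan', 'weight': 70.0, 'age': 30, 'anykey3': True})

# A list value and an int key both violate the constraints.
bad = find_field_problems({'tags': ['a', 'b'], 7: 'seven'})
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a document at once.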
Response parameters
The method returns a DashVectorResponse object:
| Parameter | Type | Description | Example |
|---|---|---|---|
| code | int | Status code. 0 indicates success. See Status codes. | 0 |
| message | str | Result message. | success |
| request_id | str | Unique request identifier. | 19215409-ea66-4db9-8764-26ce2eb5bb99 |
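One way to act on these three attributes is to turn a non-zero code into an exception that carries the message and request ID. The sketch below is illustrative only: `UpsertError` and `raise_for_status` are hypothetical names, the objects are `SimpleNamespace` stand-ins rather than real `DashVectorResponse` instances, and the failure code `-1` is invented for the example.

```python
from types import SimpleNamespace


class UpsertError(RuntimeError):
    """Hypothetical exception type for a failed upsert (not defined by the SDK)."""


def raise_for_status(response) -> None:
    """Raise UpsertError unless the response code is 0 (success)."""
    if response.code != 0:
        raise UpsertError(
            f"upsert failed: code={response.code}, message={response.message}, "
            f"request_id={response.request_id}"
        )


# Stand-ins with the three documented attributes, used instead of a live call.
success = SimpleNamespace(code=0, message='success',
                          request_id='19215409-ea66-4db9-8764-26ce2eb5bb99')
# The code -1 and the values below are made up for illustration.
failure = SimpleNamespace(code=-1, message='example failure',
                          request_id='hypothetical-id')

raise_for_status(success)  # code is 0, so nothing is raised
```

Including `request_id` in the error text keeps the identifier at hand when a failed request needs to be traced.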