Splits a document into semantic chunks and optionally generates vector embeddings for each chunk. Use this API to prepare text for ingestion into an OpenSearch vector index as part of a retrieval-augmented generation (RAG) pipeline.
Endpoint
POST /v3/openapi/apps/{app_group_identity}/actions/knowledge-split
app_group_identity is the name of your OpenSearch instance.
Request parameters
The request body is a SplitDoc object.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| content | String | Yes | — | The text to process. |
| title | String | No | — | The document title. |
| use_embedding | Boolean | No | false | Specifies whether to generate vector embeddings for each chunk. Set to true to enable vectorization. |
| model | String | No | — | The vectorization model to be used. |
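As a rough illustration of how the parameters above fit together, the following Python sketch assembles a SplitDoc body. The helper name `build_split_doc` is hypothetical, and rejecting an empty `content` string on the client side is an assumption; the table only states that `content` is required.

```python
from typing import Any, Dict, Optional


def build_split_doc(
    content: str,
    title: Optional[str] = None,
    use_embedding: bool = False,
    model: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble a SplitDoc request body (hypothetical helper)."""
    if not content:
        # Assumption: treat an empty string as missing, since content is required.
        raise ValueError("content is required and must be non-empty")
    body: Dict[str, Any] = {"content": content}
    # Optional fields are left out unless the caller sets them,
    # so the service applies its own defaults.
    if title is not None:
        body["title"] = title
    if use_embedding:
        body["use_embedding"] = True
    if model is not None:
        body["model"] = model
    return body
```

Omitting unset optional fields keeps the payload minimal; sending `use_embedding: false` explicitly should be equivalent, given the documented default.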
Example request
curl -X POST "https://<endpoint>/v3/openapi/apps/<app_group_identity>/actions/knowledge-split" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "Getting started with OSS",
    "content": "Object Storage Service (OSS) is a cloud storage service provided by Alibaba Cloud. It allows you to store, access, and manage unstructured data such as images, videos, and documents.",
    "use_embedding": true,
    "model": "<embedding-model-name>"
  }'
Replace the following placeholders with actual values:
| Placeholder | Description |
|---|---|
| <endpoint> | The OpenSearch service endpoint for your region |
| <app_group_identity> | The name of your OpenSearch instance |
| <embedding-model-name> | The embedding model configured for your instance |
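For callers who prefer Python over curl, the same request can be sketched with the standard library. The function name `make_split_request` is hypothetical, and authentication is not covered in this section, so the sketch only builds the request object and leaves out any credentials your instance may require.

```python
import json
import urllib.request


def make_split_request(
    endpoint: str, app_group_identity: str, body: dict
) -> urllib.request.Request:
    """Build (but do not send) the knowledge-split POST request."""
    url = (
        f"https://{endpoint}/v3/openapi/apps/"
        f"{app_group_identity}/actions/knowledge-split"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send it, pass the returned object to `urllib.request.urlopen` and decode the JSON body of the reply.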
Response parameters
Top-level fields
| Parameter | Type | Description |
|---|---|---|
| request_id | String | The request ID. |
| status | String | The request status. OK indicates success. |
| errors | Array | A list of errors, if any. Empty on success. |
| result | List\<ChunkContext\> | The list of chunks produced by the segmentation. |
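A minimal sketch of how a client might act on these top-level fields: it treats any status other than OK as a failure and otherwise returns the chunk list. `SplitError` and `extract_chunks` are hypothetical names, and because the structure of the entries in `errors` is not described here, the sketch passes them through without interpreting them.

```python
from typing import Any, Dict, List


class SplitError(Exception):
    """Raised when the response status is not OK (hypothetical helper)."""


def extract_chunks(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the result list, or raise if the request did not succeed."""
    status = response.get("status")
    if status != "OK":
        # Include request_id so the failure can be traced.
        raise SplitError(
            f"request {response.get('request_id')} failed: "
            f"status={status!r}, errors={response.get('errors')!r}"
        )
    return response.get("result") or []
```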
ChunkContext fields
Each item in result is a ChunkContext object.
| Parameter | Type | Description |
|---|---|---|
| chunk_id | String | The chunk ID. |
| chunk | String | The chunk text. |
| embedding | String | The chunk's embedding vector, serialized as a comma-separated string of floating-point values. |
| type | String | The content type of the chunk. Valid values: text, image. |
| img_url | String | The image URL. Returned only when type is image. |
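Because `embedding` arrives as a comma-separated string rather than a JSON array, it usually needs converting before it can be written to a vector field. The sketch below shows one way to do that; `parse_embedding` is a hypothetical helper, and returning `None` for a missing or empty value is an assumption about how a client would want to handle chunks without a vector.

```python
from typing import List, Optional


def parse_embedding(raw: Optional[str]) -> Optional[List[float]]:
    """Convert a comma-separated embedding string to a list of floats."""
    if not raw:
        # No embedding was returned for this chunk.
        return None
    return [float(part) for part in raw.split(",")]
```

Note that a literal `...` inside the string, as used for brevity in documentation samples, is not a valid float and would raise `ValueError`.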
Example response
{
  "request_id": "111111111",
  "status": "OK",
  "errors": [],
  "result": [
    {
      "chunk_id": "1",
      "chunk": "Chunk 1",
      "embedding": "-0.010441,-0.002826,-0.022911,0.000847,0.025610,0.019213,-0.019912,0.008210,0.011974,-0.010120,-0.003866,-0.008091,-0.006889,-0.034774,...,-0.012572,0.009668,0.010963,-0.005273,-0.005072,-0.002190,-0.001554,-0.000058",
      "type": "text"
    },
    {
      "chunk_id": "2",
      "chunk": "Chunk 2",
      "embedding": "-0.010441,-0.002826,-0.022911,0.000847,0.025610,0.019213,-0.019912,0.008210,0.011974,-0.010120,-0.003866,-0.008091,-0.006889,-0.034774,...,-0.012572,0.009668,0.010963,-0.005273,-0.005072,-0.002190,-0.001554,-0.000058",
      "type": "image",
      "img_url": "http://127.0.0.1"
    },
    {
      "chunk_id": "3",
      "chunk": "Chunk 3",
      "type": "text"
    }
  ]
}
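The example response shows that `embedding` and `img_url` are not present on every chunk, so client code should read them defensively. The following sketch flattens ChunkContext items into uniform records. The `to_records` helper and the record shape are hypothetical, and the sample data uses shortened two-value embeddings purely for illustration.

```python
from typing import Any, Dict, List


def to_records(chunks: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Flatten ChunkContext items into uniform records (hypothetical shape)."""
    records = []
    for chunk in chunks:
        records.append({
            "id": chunk["chunk_id"],
            "text": chunk["chunk"],
            "type": chunk.get("type"),
            # img_url is returned only for image chunks.
            "img_url": chunk.get("img_url"),
            # embedding can be absent, as for chunk 3 in the example.
            "has_embedding": "embedding" in chunk,
        })
    return records


# Sample modeled on the example response, with shortened embeddings.
sample = [
    {"chunk_id": "1", "chunk": "Chunk 1",
     "embedding": "-0.010441,-0.002826", "type": "text"},
    {"chunk_id": "2", "chunk": "Chunk 2",
     "embedding": "-0.010441,-0.002826", "type": "image",
     "img_url": "http://127.0.0.1"},
    {"chunk_id": "3", "chunk": "Chunk 3", "type": "text"},
]
records = to_records(sample)
```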