Alibaba Cloud Model Studio: CreateIndex

Last Updated: Oct 20, 2025

Creates a knowledge base of the document search type.

Operation description

  • Limits: This operation can create only knowledge bases of the document search type. The data query and image Q&A types are not supported. Create those in the console instead.

  • Required permissions

    • RAM users: Must first obtain the API permissions for Model Studio (for example, through the AliyunBailianDataFullAccess policy, which includes the required sfm:CreateIndex permission) and become a member of a workspace.
    • Alibaba Cloud account: Has the permission by default and can call the operation directly.
  • Call method: We recommend using the latest version of the GenAI Service Platform SDK. The SDK encapsulates the signature calculation logic, which simplifies the call process.

  • What to do next: This operation only initializes a knowledge base creation job. Afterward, call SubmitIndexJob to complete the creation. Otherwise, the knowledge base remains empty. For sample code, see Knowledge base API guide.

  • Idempotence: This operation is not idempotent. If you call it multiple times, you may create several knowledge bases with the same name. We recommend a "query first, then create" approach: check whether a knowledge base with the name already exists before you create one.
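The "query first, then create" logic can be sketched as follows. This is a minimal illustration, not SDK code: `list_indexes` and `create_index` are hypothetical callables that you would implement on top of whatever listing and creation calls your SDK version provides. The check is not atomic, so two concurrent callers can still create duplicates.

```python
from typing import Callable, Iterable, Tuple


def create_index_once(
    name: str,
    list_indexes: Callable[[], Iterable[Tuple[str, str]]],
    create_index: Callable[[str], str],
) -> Tuple[str, bool]:
    """Return (index_id, created).

    list_indexes (hypothetical) yields (index_id, name) pairs for the workspace.
    create_index (hypothetical) creates a knowledge base and returns its ID.
    """
    # Query first: reuse an existing knowledge base that has the same name.
    for existing_id, existing_name in list_indexes():
        if existing_name == name:
            return existing_id, False
    # Nothing found, so create a new one.
    return create_index(name), True
```

Calling the function twice with the same name returns the same ID, and the second call reports `created` as `False`.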

Rate limit: Do not call this operation more than 10 times per second. If you exceed this rate, throttling is triggered. In that case, try again later.
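One way to stay under the limit is to space calls on the client side. The sketch below is an illustrative throttle, not part of any SDK. The class name and its parameters are assumptions made for this example.

```python
import time


class MinIntervalLimiter:
    """Client-side throttle that keeps calls at least 1/max_per_second apart."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self._interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # no call has been made yet

    def wait(self):
        """Block until the next call is allowed and return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        # Reserve the next slot relative to the moment this call proceeds.
        self._next_allowed = now + self._interval
        return delay
```

Create one limiter with `MinIntervalLimiter(10)` and call `wait()` before each request. The injectable `clock` and `sleep` arguments exist only to make the class easy to test.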

Debugging

You can call this operation directly in OpenAPI Explorer, which calculates the signature for you. After a successful call, OpenAPI Explorer can automatically generate SDK code samples.

Authorization information

The following table shows the authorization information for this API operation. You can use it in the Action element of a policy to grant a RAM user or RAM role the permissions to call the operation. The columns are described as follows:

  • Operation: the value that you can use in the Action element to specify the operation on a resource.
  • Access level: the access level of each operation. The levels are read, write, and list.
  • Resource type: the type of the resource on which you can authorize the RAM user or the RAM role to perform the operation. Take note of the following items:
    • Mandatory resource types are marked with an asterisk (*) prefix.
    • If the permissions cannot be granted at the resource level, All Resources is used in the Resource type column of the operation.
  • Condition Key: the condition key that is defined by the cloud service.
  • Associated operation: other operations that the RAM user or the RAM role must have permissions to perform to complete the operation. To complete the operation, the RAM user or the RAM role must have the permissions to perform the associated operations.
Operation | Access level | Resource type | Condition key | Associated operation
sfm:CreateIndex | create | *All Resources (*) | none | none
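As an illustration of how the table maps to a policy, the following sketch builds a custom policy document that allows only this operation. It assumes the standard RAM policy layout (Version, Statement, Effect, Action, Resource). The Resource value is "*" because the table lists All Resources.

```python
import json

# Custom policy that grants only sfm:CreateIndex on all resources.
policy = {
    "Version": "1",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["sfm:CreateIndex"],
            "Resource": "*",
        }
    ],
}

# Serialize the policy so that it can be pasted into a policy editor.
policy_document = json.dumps(policy, indent=2)
print(policy_document)
```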

Request syntax

POST /{WorkspaceId}/index/create HTTP/1.1
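If you build the request yourself rather than through the SDK, the path can be derived from the workspace ID as sketched below. The helper name is hypothetical. Signing and the host name are out of scope here.

```python
from urllib.parse import quote


def create_index_path(workspace_id: str) -> str:
    """Return the request path for CreateIndex in the given workspace."""
    if not workspace_id:
        raise ValueError("workspace_id must not be empty")
    # Percent-encode the ID so that it is safe to use as a single path segment.
    return "/" + quote(workspace_id, safe="") + "/index/create"


print(create_index_path("ws_3Nt27MYcoK191ISp"))
```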

Request parameters

Parameter | Type | Required | Description | Example
WorkspaceId | string | Yes

The workspace ID. The knowledge base will be created in this workspace. For more information, see How to use workspace.

ws_3Nt27MYcoK191ISp
Name | string | Yes

The name of the knowledge base. The name must be 1 to 20 characters in length, and can contain Chinese characters, letters, digits, underscores (_), hyphens (-), periods (.), and colons (:).

StructureType | string | Yes

The type of the knowledge base. Valid values:

  • unstructured: The document search type.
Note After you create a knowledge base, its type cannot be changed. This operation does not support the data query and image Q&A types. Create those in the console instead.
unstructured
EmbeddingModelName | string | No

The embedding model used in the knowledge base. The embedding model converts the input prompt and the knowledge text into numerical embeddings for similarity comparison. The default and only available model is text-embedding-v2. It supports multiple languages, including Chinese and English, and normalizes the embedding results. For more information, see Embedding. Valid values:

  • text-embedding-v2

The default value is null, which means that text-embedding-v2 is used.

text-embedding-v2
RerankModelName | string | No

The re-rank model used in the knowledge base. The re-rank model is a scoring system outside the knowledge base. It calculates the similarity score between the query and each text chunk in the knowledge base, ranks the chunks in descending order of score, and returns the top K chunks. Valid values:

  • gte-rerank-hybrid
  • gte-rerank

The default value is empty, which means that gte-rerank-hybrid is used.

Note If you need only semantic ranking, we recommend gte-rerank. If you need both semantic ranking and text matching features to ensure relevance, we recommend gte-rerank-hybrid.
gte-rerank-hybrid
RerankMinScore | double | No

The similarity threshold. Only chunks with a similarity score higher than this value can be recalled. This parameter is used to filter chunks returned by the re-rank model. Valid values: 0.01 to 1.00.

Default value: 0.01.

0.20
ChunkSize | integer | No

The chunk size, which is the maximum number of characters in each chunk. Text that exceeds this length may be truncated.

Valid values: 1 to 6000. Default value: 500.

Note If ChunkSize is set to a value less than 100, OverlapSize is required. If you specify neither parameter, the system uses the default values of both.
128
OverlapSize | integer | No

The overlap size, which is the number of overlapping characters between two consecutive chunks. Valid values: 0 to 1024.

Default value: 100.

Note OverlapSize must be less than ChunkSize. Otherwise, chunking errors may occur.
16
Separator | string | No
Note This parameter is not available. Do not specify this parameter.
,
SourceType | string | No
Note This parameter is required in the latest version of the SDK. If you omit it, the subsequent SubmitIndexJob call fails with the error: Required parameter(data_sources) missing or invalid.

The source of the imported data. Valid values:

  • DATA_CENTER_CATEGORY: The category type. Imports all files in one or more specified categories in Application Data.
  • DATA_CENTER_FILE: The file type. Imports one or more specified files in Application Data.
Note If this parameter is set to DATA_CENTER_CATEGORY, CategoryIds is required. If it is set to DATA_CENTER_FILE, DocumentIds is required.
Note To create an empty knowledge base, set this parameter to DATA_CENTER_CATEGORY and set CategoryIds to the ID of an empty category that contains no files.
DATA_CENTER_FILE
DocumentIds | array | No

The files to import into the knowledge base. Specify the IDs of the files to import (up to 10,000 files). To add more files later, call SubmitIndexAddDocumentsJob.

(array item) | string | No

A file ID, which is the FileId returned by AddFile. You can also get the ID on the Application Data page by clicking the ID icon next to the file.

file_9a65732555b54d5ea10796ca5742ba22_XXXXXXXX
CategoryIds | array | No

The categories whose files are imported into the knowledge base. Specify the category IDs. All files in the categories are imported (up to 10,000 files). To add more files later, call SubmitIndexAddDocumentsJob.

(array item) | string | No

A category ID, which is the CategoryId returned by AddCategory. You can also get the ID on the Application Data page by clicking the ID icon next to the category.

ca_hiu2383nf934j
TableIds | array | No
Note This parameter is not available. Do not specify this parameter.
(array item) | string | No
Note This parameter is not available. Do not specify this parameter.
DataSource | object | No
Note This parameter is not available. Do not specify this parameter.
CredentialId | string | No
Note This parameter is not available. Do not specify this parameter.
CredentialKey | string | No
Note This parameter is not available. Do not specify this parameter.
Database | string | No
Note This parameter is not available. Do not specify this parameter.
Endpoint | string | No
Note This parameter is not available. Do not specify this parameter.
Region | string | No
Note This parameter is not available. Do not specify this parameter.
SubPath | string | No
Note This parameter is not available. Do not specify this parameter.
SubType | string | No
Note This parameter is not available. Do not specify this parameter.
Table | string | No
Note This parameter is not available. Do not specify this parameter.
Type | string | No
Note This parameter is not available. Do not specify this parameter.
SinkType | string | Yes

The vector storage type of the knowledge base. For more information, see Knowledge base. Valid values:

  • BUILT_IN: The vector data is hosted by Alibaba Cloud Model Studio.
  • ADB: The vector data is stored in an AnalyticDB for PostgreSQL database. If you need advanced features, such as management, auditing, and monitoring, we recommend ADB.
Note If you have not used AnalyticDB for PostgreSQL in Model Studio before, go to the Create Knowledge Base page, select ADB-PG as Vector Storage Type, and follow the instructions to grant permissions. If you specify ADB, the SinkInstanceId and SinkRegion parameters are required.
BUILT_IN
SinkInstanceId | string | No

The ID of the AnalyticDB for PostgreSQL instance. Required only when SinkType is set to ADB. Get the ID on the Instances page of AnalyticDB for PostgreSQL.

gp-bp321093j84
SinkRegion | string | No

The region of the AnalyticDB for PostgreSQL instance. Required only when SinkType is set to ADB. Call DescribeRegions to obtain the region list.

cn-hangzhou
Columns | array<object> | No
Note This parameter is not available. Do not specify this parameter.
(array item) | object | No
Note This parameter is not available. Do not specify this parameter.
Column | string | No
Note This parameter is not available. Do not specify this parameter.
source_column_name1
IsRecall | boolean | No
Note This parameter is not available. Do not specify this parameter.
true
IsSearch | boolean | No
Note This parameter is not available. Do not specify this parameter.
true
Name | string | No
Note This parameter is not available. Do not specify this parameter.
index_column_name1
Type | string | No
Note This parameter is not available. Do not specify this parameter.
string
Description | string | No

The description of the knowledge base. The description must be 0 to 1,000 characters in length. This parameter is empty by default.

metaExtractColumns | array<object> | No

The metadata extraction configurations. Metadata refers to a set of additional attributes associated with unstructured data, which are integrated into text chunks in key-value pairs. For more information, see Knowledge base.

(array item) | object | No
Key | string | No

The metadata key. It must be 1 to 50 characters in length and can contain only English letters and underscores (_). If you specify this parameter, the Value and Type parameters are required.

author
Value | string | No

The metadata value.

Tim
Type | string | No

The type of the metadata field. Valid values:

  • constant: constant extraction.
  • variable: variable extraction.
  • custom_prompt: extraction by an LLM.
  • regular: regular expression extraction.
  • keywords: keyword extraction.
constant
Desc | string | No

The description of the metadata field. The description must be 0 to 1,000 characters in length, and can contain Chinese characters, letters, digits, underscores (_), hyphens (-), periods (.), and colons (:). This parameter is left empty by default.

AuthorName
EnableLlm | boolean | No

When set to true, the key and value of this metadata field are passed to the model, together with the chunk, during generation. Valid values:

  • true
  • false

Default value: false.

false
EnableSearch | boolean | No

When set to true, the key and value of this metadata field are used in knowledge base retrieval, together with the chunk. Valid values:

  • true
  • false

Default value: false.

false
enableHeaders | boolean | No

Specifies whether to treat the first row of all .xlsx and .xls files as headers and concatenate them into each text chunk. This prevents the model from mistaking headers for regular data rows.

Note Enable this feature only when all imported files are in .xlsx or .xls format and contain headers. Otherwise, leave it disabled.

Valid values:

  • true
  • false

Default value: false.

false
chunkMode | string | No
Note This parameter is not available. Do not specify this parameter.
regex
EnableRewrite | boolean | No

Specifies whether to enable query rewriting for multi-turn conversations. Valid values:

  • true
  • false

Default value: true.

true
CreateIndexType | string | No
Note This parameter is not available. Do not specify this parameter.
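The constraints described in the parameter table can be checked on the client before the call is made. The following sketch is illustrative only: the function name is hypothetical, the server remains the source of truth, and "Chinese characters" is approximated by the CJK Unified Ideographs range, which is an assumption.

```python
import re
from typing import Any, Dict, List

# Assumption: "Chinese characters" means the CJK Unified Ideographs block.
_NAME_RE = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_\-.:]{1,20}")


def validate_create_index_params(p: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in the request parameters."""
    errors = []
    for required in ("WorkspaceId", "Name", "StructureType", "SinkType"):
        if not p.get(required):
            errors.append(required + " is required")

    name = p.get("Name")
    if name and not _NAME_RE.fullmatch(name):
        errors.append("Name must be 1 to 20 allowed characters")
    structure = p.get("StructureType")
    if structure and structure != "unstructured":
        errors.append("StructureType must be unstructured")

    chunk, overlap = p.get("ChunkSize"), p.get("OverlapSize")
    if chunk is not None and not 1 <= chunk <= 6000:
        errors.append("ChunkSize must be 1 to 6000")
    if overlap is not None and not 0 <= overlap <= 1024:
        errors.append("OverlapSize must be 0 to 1024")
    if chunk is not None and chunk < 100 and overlap is None:
        errors.append("OverlapSize is required when ChunkSize is less than 100")
    if chunk is not None and overlap is not None and overlap >= chunk:
        errors.append("OverlapSize must be less than ChunkSize")

    score = p.get("RerankMinScore")
    if score is not None and not 0.01 <= score <= 1.0:
        errors.append("RerankMinScore must be 0.01 to 1.00")

    source = p.get("SourceType")
    if source == "DATA_CENTER_CATEGORY" and not p.get("CategoryIds"):
        errors.append("CategoryIds is required for DATA_CENTER_CATEGORY")
    elif source == "DATA_CENTER_FILE" and not p.get("DocumentIds"):
        errors.append("DocumentIds is required for DATA_CENTER_FILE")

    # ADB storage needs the instance ID and region of the database.
    if p.get("SinkType") == "ADB":
        for dependent in ("SinkInstanceId", "SinkRegion"):
            if not p.get(dependent):
                errors.append(dependent + " is required when SinkType is ADB")
    return errors
```

An empty list means that no problem was detected. The function does not check the parameters that are marked as not available.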

Response parameters

Parameter | Type | Description | Example
object

Schema of Response

Code | string

The error code.

Forbidden
Data | object

The data returned if the request is successful.

Id | string

The knowledge base ID (IndexId), which uniquely identifies the created knowledge base.

Note Keep this ID. It is required for all subsequent API operations related to this knowledge base.
jkurxhju6b
Message | string

The error message.

Invalid input, variable name is missing
RequestId | string

The request ID.

17204B98-7734-4F9A-8464-2446A84821CA
Status | string

The status code.

200
Success | string

Indicates whether the request is successful. Valid values:

  • true
  • false
true

Examples

Sample success responses

JSON format

{
  "Code": "Forbidden",
  "Data": {
    "Id": "jkurxhju6b"
  },
  "Message": "Invalid input, variable name is missing",
  "RequestId": "17204B98-7734-4F9A-8464-2446A84821CA",
  "Status": 200,
  "Success": true
}
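A response in this shape can be handled as sketched below. The function and exception names are hypothetical. Because the table documents Success as a string while the sample shows a JSON boolean, the sketch accepts both forms.

```python
import json


class CreateIndexError(Exception):
    """Raised when the response does not contain a usable knowledge base ID."""


def extract_index_id(body: str) -> str:
    """Return Data.Id from a CreateIndex response body, or raise on failure."""
    payload = json.loads(body)
    success = payload.get("Success")
    if isinstance(success, str):
        success = success.lower() == "true"
    index_id = (payload.get("Data") or {}).get("Id")
    if not success or not index_id:
        # Include RequestId so that the failed call can be traced.
        raise CreateIndexError(
            "%s: %s (RequestId=%s)"
            % (payload.get("Code"), payload.get("Message"), payload.get("RequestId"))
        )
    return index_id
```

Store the returned ID. You need it for the subsequent SubmitIndexJob call.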

Error codes

For a list of error codes, see Service error codes.

Change history

Change time | Summary of changes | Operation
No change history