This topic describes the billable items, billing methods, and billing rules of the AI Search Open Platform.
Billable items
The AI Search Open Platform charges for the following items:
Model calling: You are charged for calling models, such as the Document Content Parsing Service, text embedding service, and sorting service.
Model customization (China (Shanghai) region only): You are charged for customizing models provided by the AI Search Open Platform with your own data, such as the custom training for the vector dimension reduction service.
Model deployment (China (Shanghai) region only): You are charged deployment and invocation fees when you deploy models from different sources on the AI Search Open Platform.
Service development (China (Shanghai) region only): The AI Search Open Platform integrates the PAI Data Science Workshop (DSW) feature. You can use a notebook on the platform to develop and run services.
You can activate the AI Search Open Platform for free. You are not charged if you do not use the service.
Billing methods
Except for the search engine service, all services on the AI Search Open Platform are billed on a pay-as-you-go basis. You are charged based on the number of service calls, and custom model training is charged based on the billable hours of compute units (CUs) that it consumes. A bill is generated every hour. All hourly bills are consolidated into a single order, and the total fee is deducted from your Alibaba Cloud account.
Starting from 17:00 on July 4, 2024, some services adopt tiered pricing. For more information, see the detailed billing rules in the following sections.
Billing rules
Model calling
In a large language model (LLM), a token is the smallest unit of text that the model can process and understand. A token usually represents a text segment such as a word, a phrase, a character, or a symbol. Different models may have their own chunking methods, and the number of characters may not correspond one-to-one with the number of tokens.
The base billing unit for tokens consumed by model calls on the AI Search Open Platform is USD per 1,000 tokens. Some services adopt tiered pricing, and some services are billed separately for input and output tokens.
Some models provide a token calculation feature that you can use to estimate the number of tokens consumed by a call.
Tiered pricing example:
In the Germany (Frankfurt) region, if you call the sparse text embedding service and generate 1,000,000 tokens, which is equivalent to 1,000 billing units, the fee is calculated as follows: 500 × 0.001 + 500 × 0.0004 = 0.7 USD.
Input and output billing example:
In the Germany (Frankfurt) region, if you call the large language model qwen3-235b-a22b and consume 1,000 input tokens and 1,000 output tokens (1 billing unit each), the fee is calculated as follows: 1 × 0.0007 + 1 × 0.0028 = 0.0035 USD.
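The two fee formulas in the examples above can be sketched as small helpers. This is a minimal sketch that uses the Frankfurt prices from the examples; the tier boundary of 500 billing units is taken from the tiered-pricing rule.

```python
def tiered_fee(units, tier1_price, tier2_price, tier1_cap=500):
    """Tiered pricing: the first tier1_cap billing units are charged at
    tier1_price, and any remaining units are charged at tier2_price."""
    tier1_units = min(units, tier1_cap)
    tier2_units = max(units - tier1_cap, 0)
    return tier1_units * tier1_price + tier2_units * tier2_price

def io_fee(input_units, output_units, input_price, output_price):
    """Input/output pricing: input and output tokens are charged at
    separate rates."""
    return input_units * input_price + output_units * output_price

# Sparse text embedding, Frankfurt: 1,000,000 tokens = 1,000 billing units
print(round(tiered_fee(1000, 0.001, 0.0004), 6))  # 0.7
# qwen3-235b-a22b, Frankfurt: 1,000 input and 1,000 output tokens (1 unit each)
print(round(io_fee(1, 1, 0.0007, 0.0028), 6))     # 0.0035
```

Usage below 500 units is charged entirely at the first-tier price, so `tiered_fee(300, 0.001, 0.0004)` returns 300 × 0.001 = 0.3 USD.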
Germany (Frankfurt) region
In the following table, a single value in the price columns indicates a flat rate that is not tiered, and rows that show input and output prices are billed separately for input and output tokens.

| Service | Model ID | Billing unit | Price for 0-500 units | Price for units over 500 |
| --- | --- | --- | --- | --- |
| Document content parsing | ops-document-analyze-001 | USD/1,000 tokens | 0.0009 | 0.000272 |
| | | USD/image | 0.00073 | |
| | | USD/table | 0.00157 | |
| Document content parsing: extraction of hierarchical structure based on semantic understanding. Note: When you call the Document Content Parsing Service, you can use a parameter to control whether this feature is enabled. If it is enabled, this fee is charged in addition to the document parsing fee. | | USD/1,000 tokens | 0.00052 | |
| | ops-image-analyze-ocr-001 | USD/call | 0.012 | 0.0031 |
| | ops-image-analyze-vlm-001 | USD/1,000 tokens | 0.011 | |
| Document chunking | ops-document-split-001 | USD/1,000 tokens | 0.0009 | 0.000003 |
| Text embedding | ops-text-embedding-001 | USD/1,000 tokens | 0.0009 | 0.000072 |
| | ops-text-embedding-002 | USD/1,000 tokens | 0.0009 | 0.000054 |
| | ops-text-embedding-zh-001 | USD/1,000 tokens | 0.0009 | 0.000022 |
| | ops-text-embedding-en-001 | USD/1,000 tokens | 0.0009 | 0.000019 |
| | ops-gte-sentence-embedding-multilingual-base | USD/1,000 tokens | 0.0009 | 0.00003 |
| | ops-qwen3-embedding-0.6b | USD/1,000 tokens | 0.0009 | 0.000062 |
| Text and image embedding | ops-m2-encoder (text embedding) | USD/1,000 tokens | 0.0009 | 0.000039 |
| | ops-m2-encoder (image embedding) | USD/image | 0.0009 | 0.000032 |
| | ops-m2-encoder-large (text embedding) | USD/1,000 tokens | 0.0009 | 0.000065 |
| | ops-m2-encoder-large (image embedding) | USD/image | 0.0009 | 0.000042 |
| | ops-gme-qwen2-vl-2b-instruct (text embedding) | USD/1,000 tokens | 0.0009 | 0.000162 |
| | ops-gme-qwen2-vl-2b-instruct (image embedding) | USD/image | 0.0009 | 0.000146 |
| Sparse text embedding | ops-text-sparse-embedding-001 | USD/1,000 tokens | 0.001 | 0.0004 |
| Vector dimension reduction | ops-embedding-dim-reduction-001 | USD/doc | 0.0009 | 0.0000064 |
| | ops-bge-reranker-larger | USD/doc | 0.0005 | 0.000048 |
| | ops-text-reranker-001 | USD/doc | 0.0005 | 0.00016 |
| | ops-qwen3-reranker-0.6b | USD/doc | 0.0005 | 0.000026 |
| | ops-video-snapshot-001 | USD/1,000 images | 0.03 | |
| | ops-audio-asr-001 | USD/hour | 0.2 | |
| Large language model | qwen3-235b-a22b | USD/1,000 tokens | Input: 0.0007; Output: 0.0028 | |
| | ops-qwen-turbo | USD/1,000 tokens | Input: 0.000065; Output: 0.00026 | |
| | qwen-turbo | USD/1,000 tokens | Input: 0.00005; Output: 0.0002 | |
| | qwen-plus | USD/1,000 tokens | Input: 0.0004; Output: 0.0012 | |
| | qwen-max | USD/1,000 tokens | Input: 0.0016; Output: 0.0064 | |
| Query analysis | ops-query-analyze-001 | USD/1,000 tokens | Input: 0.004; Output: 0.018 | |

Search engine services are billed according to their own billing rules:

- Alibaba Cloud Elasticsearch: a fully managed cloud service built on open source Elasticsearch. It is 100% compatible with open source features and supports out-of-the-box use and pay-as-you-go billing. For more information, see Elasticsearch.
- OpenSearch Vector Search Edition: for more information about billing, see Vector Search Edition.
China (Shanghai) region
| Service | Model ID | Billing unit | Price for 0-500 units | Price for units over 500 |
| --- | --- | --- | --- | --- |
| Document content parsing | ops-document-analyze-001 | USD/1,000 tokens | 0.0007 | 0.00085 |
| | | USD/image | 0.0023 | |
| | | USD/table | 0.005 | |
| Document content parsing: extraction of hierarchical structure based on semantic understanding. Note: When you call the Document Content Parsing Service, you can use a parameter to control whether this feature is enabled. If it is enabled, this fee is charged in addition to the document parsing fee. | | USD/1,000 tokens | 0.00031 | |
| | ops-image-analyze-ocr-001 | USD/call | 0.0112 | 0.0058 |
| | ops-image-analyze-vlm-001 | USD/1,000 tokens | 0.0093 | |
| Document chunking | ops-document-split-001 | USD/1,000 tokens | 0.0007 | 0.000003 |
| Text embedding | ops-text-embedding-001 | USD/1,000 tokens | 0.0007 | 0.000023 |
| | ops-text-embedding-002 | USD/1,000 tokens | 0.0007 | 0.00007 |
| | ops-text-embedding-zh-001 | USD/1,000 tokens | 0.0007 | 0.00001 |
| | ops-text-embedding-en-001 | USD/1,000 tokens | 0.0007 | 0.000011 |
| | ops-gte-sentence-embedding-multilingual-base | USD/1,000 tokens | 0.0007 | 0.000025 |
| | ops-qwen3-embedding-0.6b | USD/1,000 tokens | 0.0007 | 0.000071 |
| Text and image embedding | ops-m2-encoder (text embedding) | USD/1,000 tokens | 0.0007 | 0.000026 |
| | ops-m2-encoder (image embedding) | USD/image | 0.0007 | 0.0000162 |
| | ops-m2-encoder-large (text embedding) | USD/1,000 tokens | 0.0007 | 0.000067 |
| | ops-m2-encoder-large (image embedding) | USD/image | 0.0007 | 0.000033 |
| | ops-gme-qwen2-vl-2b-instruct (text embedding) | USD/1,000 tokens | 0.0007 | 0.00008 |
| | ops-gme-qwen2-vl-2b-instruct (image embedding) | USD/image | 0.0007 | 0.000072 |
| Sparse text embedding | ops-text-sparse-embedding-001 | USD/1,000 tokens | 0.00084 | 0.00014 |
| Vector dimension reduction | ops-embedding-dim-reduction-001 | USD/doc | 0.0007 | 0.0000071 |
| | ops-bge-reranker-larger | USD/doc | 0.00014 | 0.000013 |
| | ops-text-reranker-001 | USD/doc | 0.00014 | 0.000062 |
| | ops-qwen3-reranker-0.6b | USD/doc | 0.00014 | 0.000015 |
| | ops-video-snapshot-001 | USD/1,000 images | 0.016 | |
| | ops-audio-asr-001 | USD/hour | 0.09 | |
| Large language model | qwen3-235b-a22b | USD/1,000 tokens | Input: 0.00056; Output: 0.0056 | |
| | qwq-32b | USD/1,000 tokens | Input: 0.00028; Output: 0.00084 | |
| | ops-qwen-turbo | USD/1,000 tokens | Input: 0.00006; Output: 0.0001 | |
| | qwen-turbo | USD/1,000 tokens | Input: 0.000042; Output: 0.000084 | |
| | qwen-plus | USD/1,000 tokens | Input: 0.000112; Output: 0.00028 | |
| | qwen-max | USD/1,000 tokens | Input: 0.000336; Output: 0.001344 | |
| | deepseek-r1 | USD/1,000 tokens | Input: 0.00056; Output: 0.00224 | |
| | deepseek-r1-distill-qwen-7b | USD/1,000 tokens | Input: 0.00007; Output: 0.00014 | |
| | deepseek-r1-distill-qwen-14b | USD/1,000 tokens | Input: 0.00014; Output: 0.00042 | |
| | deepseek-v3 | USD/1,000 tokens | Input: 0.00028; Output: 0.00112 | |
| Query analysis | ops-query-analyze-001 | USD/call | 0.00735 | |
| | | USD/1,000 tokens | Query rewrite: Input: 0.000336; Output: 0.001344 | |
| | | USD/1,000 tokens | Intention recognition and alternate query extension: | |
| | | USD/1,000 tokens | Natural language to SQL (NL2SQL) generation: | |
| Assessment module: comprehensively evaluates the retrieval-augmented generation (RAG) development process provided by the AI Search Open Platform, from the user asking a question through the RAG system retrieving content to the LLM generating an answer. | | USD/1,000 tokens | Input: 0.0007; Output: 0.0021 | |

Search engine services are billed according to their own billing rules:

- Alibaba Cloud Elasticsearch: a fully managed cloud service built on open source Elasticsearch. It is 100% compatible with open source features and supports out-of-the-box use and pay-as-you-go billing. For more information, see Alibaba Cloud ES.
- OpenSearch Vector Search Edition: for more information about billing, see Vector Search Edition.
Model customization
| Model name | Description | Price |
| --- | --- | --- |
| Custom training for the vector dimension reduction model | Lets you customize a vector dimension reduction model based on the vector data that you provide. In actual business scenarios, you first use an embedding model to vectorize text or queries, and then use the vector dimension reduction model to further reduce the vector dimensions. | You are charged based on the number of CUs of computing resources consumed. The price of each CU is 0.5422614 USD. The number of CUs consumed depends on the volume and dimension of the training data. For example, to train a model with 100,000 pieces of 1,024-dimensional data, about 250 CUs are consumed, and the fee is 250 × 0.5422614 = 135.56535 USD. |
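The training-cost estimate above can be sketched as follows. The CU price comes from the pricing description; the actual number of CUs consumed varies with the volume and dimension of your training data.

```python
CU_PRICE_USD = 0.5422614  # price per compute unit (CU)

def training_fee(cus_consumed):
    """Fee for custom model training, billed by the CUs consumed."""
    return cus_consumed * CU_PRICE_USD

# About 250 CUs for 100,000 pieces of 1,024-dimensional training data
print(round(training_fee(250), 5))  # 135.56535
```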
Model deployment
Billing formula: CU price × CUs per instance type × Number of instances
The following table describes the billing rules.
| Instance type | CU price (USD/hour) | CUs per instance | Price per instance (USD/hour) |
| --- | --- | --- | --- |
| gpu.v100.16g.x1 | 0.15 | 30.14 | 4.521 |
| gpu.t4.16g.x1 | 0.15 | 16.07 | 2.4105 |
| gpu.a10.24g.x1 | 0.15 | 11.01 | 1.6515 |
For example, if you purchase two gpu.a10.24g.x1 instances to deploy a model service, the fee is calculated as follows: 0.15 × 11.01 × 2 = 3.303 USD/hour.
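The deployment billing formula can be sketched as a small helper, with the CU price and per-instance CU counts taken from the deployment table above.

```python
CU_PRICE = 0.15  # USD per CU per hour

# CUs per instance type, from the model deployment table
INSTANCE_CUS = {
    "gpu.v100.16g.x1": 30.14,
    "gpu.t4.16g.x1": 16.07,
    "gpu.a10.24g.x1": 11.01,
}

def hourly_fee(instance_type, instance_count):
    """Hourly fee = CU price x CUs per instance type x number of instances."""
    return CU_PRICE * INSTANCE_CUS[instance_type] * instance_count

# Two gpu.a10.24g.x1 instances, as in the example above
print(round(hourly_fee("gpu.a10.24g.x1", 2), 4))  # 3.303
```

The same formula applies to service development instances; only the CU counts per instance type differ.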
Service development
After you start an instance, you are charged on a pay-as-you-go basis. The fee is calculated using the following formula: CU price × CUs per instance type × Number of instances.
The following table describes the billing rules.
| Instance type | CU price (USD/hour) | CUs per instance | Price per instance (USD/hour) |
| --- | --- | --- | --- |
| gpu.t4.16g.x1 | 0.15 | 16.07 | 2.4105 |
| ops.basic1.gi.large | 0.15 | 0.61 | 0.0915 |
For example, if you select one ops.basic1.gi.large instance, the fee is calculated as follows: 0.15 × 0.61 × 1 = 0.0915 USD/hour.
Precautions
Service names correspond one-to-one with service IDs and API parameters. For more information, see Service overview.
Bills are generated hourly based on your actual usage.
For services with tiered pricing, charges are calculated based on the usage in each tier.
For services that use 1,000 tokens as the billing unit, usage statistics may include decimals.
Make sure that your Alibaba Cloud account has no overdue payments to avoid service disruptions.
Billing example
Assume that in the Germany (Frankfurt) region, you call the document chunking service for 1,000 units. The fees are calculated as follows:
Fee for the first 500 units (inclusive): 0.0009 USD × 500 = 0.45 USD
Fee for the units over 500: 0.000003 USD × 500 = 0.0015 USD
Total fee: 0.45 + 0.0015 = 0.4515 USD