
OpenSearch: Billing methods and billable items

Last Updated: Oct 21, 2025

This topic describes the billable items, billing methods, and billing rules of the AI Search Open Platform.

Billable items

The AI Search Open Platform charges for the following items: model calling, the search engine service, model customization, model deployment, and service development. For the detailed billing rules of each item, see the Billing rules section of this topic.

Note

You can activate the AI Search Open Platform for free. You are not charged if you do not use the service.

Billing methods

Except for the search engine service, all services on the AI Search Open Platform are billed on a pay-as-you-go basis. You are charged based on your usage, such as the number of tokens or calls that a service consumes, and on the compute units (CUs) consumed by custom model training, model deployment, and service development. A bill is generated every hour. All hourly bills are consolidated into a single order, and the total fee is deducted from your Alibaba Cloud account.

Important

Starting from 17:00 on July 4, 2024, some services adopt tiered pricing. For more information, see the detailed billing rules in the following sections.

Billing rules

Model calling

In a large language model (LLM), a token is the smallest unit of text that the model can process and understand. A token usually represents a text segment, such as a word, a phrase, a character, or a symbol. Different models may use different tokenization methods, so the number of characters does not correspond one-to-one with the number of tokens.

Token-based model calls on the AI Search Open Platform are billed in USD per 1,000 tokens. Some services use tiered pricing, and some services are billed separately for input and output tokens.

Note

Some models support Token calculation to estimate the number of tokens generated by an invocation.

Tiered pricing example:

In the Germany (Frankfurt) region, if you call the sparse text embedding service and generate 1,000,000 tokens, which is equivalent to 1,000 billing units, the fee is calculated as follows: 500 × 0.001 + 500 × 0.0004 = 0.7 USD.

Input and output billing example:

In the Germany (Frankfurt) region, if you call the large language model qwen3-235b-a22b and generate 1,000 input tokens and 1,000 output tokens, the fee is calculated as follows: 1 × 0.0007 + 1 × 0.0028 = 0.0035 USD.
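
To make these calculations concrete, the following minimal Python sketch reproduces both examples. The 1,000-token billing unit, the 500-unit tier boundary, and the prices are taken from the Germany (Frankfurt) table below.

```python
# Minimal sketch of the two example calculations above.
# Prices, the 1,000-token billing unit, and the 500-unit tier boundary are
# taken from the Germany (Frankfurt) table below.

UNIT_TOKENS = 1_000   # one billing unit = 1,000 tokens
TIER_BOUNDARY = 500   # the first 500 billing units use the first-tier price


def tiered_fee(tokens: int, first_tier_price: float, second_tier_price: float) -> float:
    """Fee in USD for a service with tiered pricing."""
    units = tokens / UNIT_TOKENS
    first_tier_units = min(units, TIER_BOUNDARY)
    second_tier_units = max(units - TIER_BOUNDARY, 0)
    return first_tier_units * first_tier_price + second_tier_units * second_tier_price


def input_output_fee(input_tokens: int, output_tokens: int,
                     input_price: float, output_price: float) -> float:
    """Fee in USD for a service billed separately for input and output tokens."""
    return (input_tokens / UNIT_TOKENS) * input_price + (output_tokens / UNIT_TOKENS) * output_price


# Tiered pricing example: sparse text embedding, 1,000,000 tokens
print(round(tiered_fee(1_000_000, 0.001, 0.0004), 4))            # 0.7 USD

# Input and output example: qwen3-235b-a22b, 1,000 input and 1,000 output tokens
print(round(input_output_fee(1_000, 1_000, 0.0007, 0.0028), 4))  # 0.0035 USD
```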

Germany (Frankfurt) region

| Model name | Model ID | Billing unit | Price (USD) |
| --- | --- | --- | --- |
| Document content parsing | ops-document-analyze-001 | USD/1,000 tokens | 0-500 units: 0.0009; over 500 units: 0.000272 |
| Document content parsing | ops-document-analyze-001 | USD/image | 0.00073 |
| Document content parsing | ops-document-analyze-001 | USD/table | 0.00157 |
| Document content parsing - extraction of hierarchical structure based on semantic understanding (see the note below) | | USD/1,000 tokens | 0.00052 |
| Image text recognition | ops-image-analyze-ocr-001 | USD/call | 0-500 units: 0.012; over 500 units: 0.0031 |
| Image content understanding | ops-image-analyze-vlm-001 | USD/1,000 tokens | 0.011 |
| Document chunking | ops-document-split-001 | USD/1,000 tokens | 0-500 units: 0.0009; over 500 units: 0.000003 |
| Text embedding | ops-text-embedding-001 | USD/1,000 tokens | 0-500 units: 0.0009; over 500 units: 0.000072 |
| Text embedding | ops-text-embedding-002 | USD/1,000 tokens | 0-500 units: 0.0009; over 500 units: 0.000054 |
| Text embedding | ops-text-embedding-zh-001 | USD/1,000 tokens | 0-500 units: 0.0009; over 500 units: 0.000022 |
| Text embedding | ops-text-embedding-en-001 | USD/1,000 tokens | 0-500 units: 0.0009; over 500 units: 0.000019 |
| Text embedding | ops-gte-sentence-embedding-multilingual-base | USD/1,000 tokens | 0-500 units: 0.0009; over 500 units: 0.00003 |
| Text embedding | ops-qwen3-embedding-0.6b | USD/1,000 tokens | 0-500 units: 0.0009; over 500 units: 0.000062 |
| Multimodal embedding (used for text and image embedding) | ops-m2-encoder (text embedding) | USD/1,000 tokens | 0-500 units: 0.0009; over 500 units: 0.000039 |
| Multimodal embedding | ops-m2-encoder (image embedding) | USD/image | 0-500 units: 0.0009; over 500 units: 0.000032 |
| Multimodal embedding | ops-m2-encoder-large (text embedding) | USD/1,000 tokens | 0-500 units: 0.0009; over 500 units: 0.000065 |
| Multimodal embedding | ops-m2-encoder-large (image embedding) | USD/image | 0-500 units: 0.0009; over 500 units: 0.000042 |
| Multimodal embedding | ops-gme-qwen2-vl-2b-instruct (text embedding) | USD/1,000 tokens | 0-500 units: 0.0009; over 500 units: 0.000162 |
| Multimodal embedding | ops-gme-qwen2-vl-2b-instruct (image embedding) | USD/image | 0-500 units: 0.0009; over 500 units: 0.000146 |
| Sparse text embedding | ops-text-sparse-embedding-001 | USD/1,000 tokens | 0-500 units: 0.001; over 500 units: 0.0004 |
| Vector dimension reduction service | ops-embedding-dim-reduction-001 | USD/doc | 0-500 units: 0.0009; over 500 units: 0.0000064 |
| Sorting service | ops-bge-reranker-larger | USD/doc | 0-500 units: 0.0005; over 500 units: 0.000048 |
| Sorting service | ops-text-reranker-001 | USD/doc | 0-500 units: 0.0005; over 500 units: 0.00016 |
| Sorting service | ops-qwen3-reranker-0.6b | USD/doc | 0-500 units: 0.0005; over 500 units: 0.000026 |
| Video snapshot | ops-video-snapshot-001 | USD/1,000 images | 0.03 |
| Speech recognition | ops-audio-asr-001 | USD/hour | 0.2 |
| LLM | qwen3-235b-a22b | USD/1,000 tokens | Input: 0.0007; Output: 0.0028 |
| LLM | ops-qwen-turbo | USD/1,000 tokens | Input: 0.000065; Output: 0.00026 |
| LLM | qwen-turbo | USD/1,000 tokens | Input: 0.00005; Output: 0.0002 |
| LLM | qwen-plus | USD/1,000 tokens | Input: 0.0004; Output: 0.0012 |
| LLM | qwen-max | USD/1,000 tokens | Input: 0.0016; Output: 0.0064 |
| Query analysis | ops-query-analyze-001 | USD/1,000 tokens | Input: 0.004; Output: 0.018 |

Note

When you call the document content parsing service, you can use a parameter to control whether to extract the document hierarchical structure based on semantic understanding. If you enable this feature, you are charged 0.00052 USD per 1,000 tokens for it in addition to the document parsing fee.

Search engine

Alibaba Cloud Elasticsearch: a fully managed cloud service that is built based on open source Elasticsearch. It is 100% compatible with open source features and supports out-of-the-box use and pay-as-you-go billing. For more information, see Elasticsearch.

OpenSearch-Vector Search Edition: For more information about billing, see Vector Search Edition.

China (Shanghai) region

| Model name | Model ID | Billing unit | Price (USD) |
| --- | --- | --- | --- |
| Document content parsing | ops-document-analyze-001 | USD/1,000 tokens | 0-500 units: 0.0007; over 500 units: 0.00085 |
| Document content parsing | ops-document-analyze-001 | USD/image | 0.0023 |
| Document content parsing | ops-document-analyze-001 | USD/table | 0.005 |
| Document content parsing - extraction of hierarchical structure based on semantic understanding (see the note below) | | USD/1,000 tokens | 0.00031 |
| Image text recognition | ops-image-analyze-ocr-001 | USD/call | 0-500 units: 0.0112; over 500 units: 0.0058 |
| Image content understanding | ops-image-analyze-vlm-001 | USD/1,000 tokens | 0.0093 |
| Document chunking | ops-document-split-001 | USD/1,000 tokens | 0-500 units: 0.0007; over 500 units: 0.000003 |
| Text embedding | ops-text-embedding-001 | USD/1,000 tokens | 0-500 units: 0.0007; over 500 units: 0.000023 |
| Text embedding | ops-text-embedding-002 | USD/1,000 tokens | 0-500 units: 0.0007; over 500 units: 0.00007 |
| Text embedding | ops-text-embedding-zh-001 | USD/1,000 tokens | 0-500 units: 0.0007; over 500 units: 0.00001 |
| Text embedding | ops-text-embedding-en-001 | USD/1,000 tokens | 0-500 units: 0.0007; over 500 units: 0.000011 |
| Text embedding | ops-gte-sentence-embedding-multilingual-base | USD/1,000 tokens | 0-500 units: 0.0007; over 500 units: 0.000025 |
| Text embedding | ops-qwen3-embedding-0.6b | USD/1,000 tokens | 0-500 units: 0.0007; over 500 units: 0.000071 |
| Multimodal embedding (used for text and image embedding) | ops-m2-encoder (text embedding) | USD/1,000 tokens | 0-500 units: 0.0007; over 500 units: 0.000026 |
| Multimodal embedding | ops-m2-encoder (image embedding) | USD/image | 0-500 units: 0.0007; over 500 units: 0.0000162 |
| Multimodal embedding | ops-m2-encoder-large (text embedding) | USD/1,000 tokens | 0-500 units: 0.0007; over 500 units: 0.000067 |
| Multimodal embedding | ops-m2-encoder-large (image embedding) | USD/image | 0-500 units: 0.0007; over 500 units: 0.000033 |
| Multimodal embedding | ops-gme-qwen2-vl-2b-instruct (text embedding) | USD/1,000 tokens | 0-500 units: 0.0007; over 500 units: 0.00008 |
| Multimodal embedding | ops-gme-qwen2-vl-2b-instruct (image embedding) | USD/image | 0-500 units: 0.0007; over 500 units: 0.000072 |
| Sparse text embedding | ops-text-sparse-embedding-001 | USD/1,000 tokens | 0-500 units: 0.00084; over 500 units: 0.00014 |
| Vector dimension reduction service | ops-embedding-dim-reduction-001 | USD/doc | 0-500 units: 0.0007; over 500 units: 0.0000071 |
| Sorting service | ops-bge-reranker-larger | USD/doc | 0-500 units: 0.00014; over 500 units: 0.000013 |
| Sorting service | ops-text-reranker-001 | USD/doc | 0-500 units: 0.00014; over 500 units: 0.000062 |
| Sorting service | ops-qwen3-reranker-0.6b | USD/doc | 0-500 units: 0.00014; over 500 units: 0.000015 |
| Video snapshot | ops-video-snapshot-001 | USD/1,000 units | 0.016 |
| Speech recognition | ops-audio-asr-001 | USD/hour | 0.09 |
| LLM | qwen3-235b-a22b | USD/1,000 tokens | Input: 0.00056; Output: 0.0056 |
| LLM | qwq-32b | USD/1,000 tokens | Input: 0.00028; Output: 0.00084 |
| LLM | ops-qwen-turbo | USD/1,000 tokens | Input: 0.00006; Output: 0.0001 |
| LLM | qwen-turbo | USD/1,000 tokens | Input: 0.000042; Output: 0.000084 |
| LLM | qwen-plus | USD/1,000 tokens | Input: 0.000112; Output: 0.00028 |
| LLM | qwen-max | USD/1,000 tokens | Input: 0.000336; Output: 0.001344 |
| LLM | deepseek-r1 | USD/1,000 tokens | Input: 0.00056; Output: 0.00224 |
| LLM | deepseek-r1-distill-qwen-7b | USD/1,000 tokens | Input: 0.00007; Output: 0.00014 |
| LLM | deepseek-r1-distill-qwen-14b | USD/1,000 tokens | Input: 0.00014; Output: 0.00042 |
| LLM | deepseek-v3 | USD/1,000 tokens | Input: 0.00028; Output: 0.00112 |
| Web search (see the description below) | | USD/call | 0.00735 |
| Web search (see the description below) | | USD/1,000 tokens | Query rewrite - Input: 0.000336; Output: 0.001344 |
| Query analysis | ops-query-analyze-001 | USD/1,000 tokens | Intention recognition and alternate query extension - Input: 0.001; Output: 0.004 |
| Query analysis | ops-query-analyze-001 | USD/1,000 tokens | Natural language to SQL (NL2SQL) generation - Input: 0.00031; Output: 0.00078 |
| Evaluation (see the description below) | | USD/1,000 tokens | Input: 0.0007; Output: 0.0021 |

Note

When you call the document content parsing service, you can use a parameter to control whether to extract the document hierarchical structure based on semantic understanding. If you enable this feature, you are charged 0.00031 USD per 1,000 tokens for it in addition to the document parsing fee.

Search engine

Alibaba Cloud Elasticsearch: a fully managed cloud service that is built based on open source Elasticsearch. It is 100% compatible with open source features and supports out-of-the-box use and pay-as-you-go billing. For more information, see Alibaba Cloud ES.

OpenSearch-Vector Search Edition: For more information about billing, see Vector Search Edition.

Evaluation

The evaluation module comprehensively evaluates the retrieval-augmented generation (RAG) process provided by the AI Search Open Platform, from the user's question to the content retrieved by the RAG system and the answer generated by the LLM.

Web search

  • The web search fee is calculated as follows: Invocation fee + Query rewrite fee. By default, the qwen-max model is used for query rewriting during web search.

  • You can use web search in either of the following ways: directly call the web search API, or enable web search when you use an LLM.
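
The web search fee described above is the sum of a per-call invocation fee and the query rewrite token fee. The following minimal sketch illustrates the calculation with the China (Shanghai) prices; the call and token counts are made-up example values.

```python
# Minimal sketch: web search fee = invocation fee + query rewrite fee.
# Prices are from the China (Shanghai) table above; the call and token counts
# below are made-up example values.

CALL_PRICE = 0.00735             # USD per web search call
REWRITE_INPUT_PRICE = 0.000336   # USD per 1,000 rewrite input tokens (qwen-max)
REWRITE_OUTPUT_PRICE = 0.001344  # USD per 1,000 rewrite output tokens (qwen-max)


def web_search_fee(calls: int, rewrite_input_tokens: int, rewrite_output_tokens: int) -> float:
    """Total web search fee in USD."""
    invocation_fee = calls * CALL_PRICE
    rewrite_fee = (rewrite_input_tokens / 1_000) * REWRITE_INPUT_PRICE \
                  + (rewrite_output_tokens / 1_000) * REWRITE_OUTPUT_PRICE
    return invocation_fee + rewrite_fee


# Example: 100 calls with 50,000 rewrite input tokens and 20,000 rewrite output tokens
print(round(web_search_fee(100, 50_000, 20_000), 4))  # 0.7787 USD
```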

Model customization

Custom training for the vector dimension reduction model

Custom training lets you build a vector dimension reduction model that is customized based on the vector data you provide. In actual business scenarios, you first use an embedding model to vectorize text or queries, and then use the vector dimension reduction model to further reduce the vector dimensions.

You are charged based on the number of compute units (CUs) consumed. Each CU costs 0.5422614 USD. The number of CUs consumed depends on the amount and dimension of the training data. For example, training a model with 100,000 pieces of 1024-dimensional data consumes about 250 CUs, and the fee is 250 × 0.5422614 = 135.56535 USD.
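
As a quick check of the figures above, a minimal sketch of the CU-based training fee (the CU count is the approximate value quoted in the example):

```python
# Minimal sketch: custom training fee = CUs consumed x price per CU.
CU_PRICE = 0.5422614  # USD per CU


def training_fee(cus_consumed: float) -> float:
    """Fee in USD for custom training of the vector dimension reduction model."""
    return cus_consumed * CU_PRICE


# About 250 CUs for 100,000 pieces of 1024-dimensional training data (example above)
print(round(training_fee(250), 5))  # 135.56535 USD
```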

Model deployment

Billing formula: CU price × CUs per instance type × Number of instances

The following table describes the billing rules.

| Instance type | CU price (USD/hour) | CUs per machine | Price per machine (USD/hour) |
| --- | --- | --- | --- |
| gpu.v100.16g.x1 | 0.15 | 30.14 | 4.521 |
| gpu.t4.16g.x1 | 0.15 | 16.07 | 2.4105 |
| gpu.a10.24g.x1 | 0.15 | 11.01 | 1.6515 |

For example, if you purchase two gpu.a10.24g.x1 instances to deploy a model service, the fee is calculated as follows: 0.15 × 11.01 × 2 = 3.303 USD/hour.
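
For reference, the following minimal sketch applies the billing formula above to the instance types in the table. The same formula also applies to the service development instances in the next section.

```python
# Minimal sketch: hourly fee = CU price x CUs per instance type x number of instances.
# Values are taken from the model deployment table above.

CU_PRICE = 0.15  # USD per CU per hour

CUS_PER_INSTANCE = {
    "gpu.v100.16g.x1": 30.14,
    "gpu.t4.16g.x1": 16.07,
    "gpu.a10.24g.x1": 11.01,
}


def hourly_fee(instance_type: str, instance_count: int) -> float:
    """Hourly fee in USD for deploying a model service on the given instances."""
    return CU_PRICE * CUS_PER_INSTANCE[instance_type] * instance_count


# Example from above: two gpu.a10.24g.x1 instances
print(round(hourly_fee("gpu.a10.24g.x1", 2), 4))  # 3.303 USD/hour
```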

Service development

After you start an instance, you are charged on a pay-as-you-go basis. The fee is calculated using the following formula: CU price × CUs per instance type × Number of instances.

The following table describes the billing rules.

| Instance type | CU price (USD/hour) | CUs per machine | Price per machine (USD/hour) |
| --- | --- | --- | --- |
| gpu.t4.16g.x1 | 0.15 | 16.07 | 2.4105 |
| ops.basic1.gi.large | 0.15 | 0.61 | 0.0915 |

For example, if you select one ops.basic1.gi.large instance, the fee is calculated as follows: 0.15 × 0.61 × 1 = 0.0915 USD/hour.

Precautions

  • Service names correspond one-to-one with service IDs and API parameters. For more information, see Service overview.

  • Bills are generated hourly based on your actual usage.

  • For services with tiered pricing, charges are calculated based on the usage in each tier.

  • For services that use 1,000 tokens as the billing unit, usage statistics may include decimals.

  • Make sure that your Alibaba Cloud account has no overdue payments to avoid service disruptions.

Billing example

Assume that in the Germany (Frankfurt) region, you call the document chunking service for 1,000 units. The fees are calculated as follows:

  • Fee for the first 500 units (inclusive): 0.0009 USD × 500 = 0.45 USD

  • Fee for the units over 500: 0.000003 USD × 500 = 0.0015 USD

  • Total fee: 0.45 + 0.0015 = 0.4515 USD

References

View billing details