This topic describes the billable items, billing methods, and billing rules of the AI Search Open Platform.
Billable items
The AI Search Open Platform charges for the following items:
Model calling: You are charged for calling models, such as the Document Content Parsing Service, text embedding service, and sorting service.
Model customization (China (Shanghai) region only): You are charged for customizing models provided by the AI Search Open Platform with your own data, such as the custom training for the vector dimension reduction service.
Model deployment (China (Shanghai) region only): You are charged deployment and invocation fees when you deploy models from different sources on the AI Search Open Platform.
Service development (China (Shanghai) region only): The AI Search Open Platform integrates the PAI Data Science Workshop (DSW) feature. You can use a notebook on the platform to develop and run services.
You can activate the AI Search Open Platform for free. You are not charged if you do not use the service.
Billing methods
Except for the search engine service, all services on the AI Search Open Platform are billed on a pay-as-you-go basis. You are charged based on the number of service calls, and custom model training is charged based on the billable hours of compute units (CUs) that it consumes. A bill is generated every hour. All hourly bills are consolidated into a single order, and the total fee is deducted from your Alibaba Cloud account.
Starting from 17:00 on July 4, 2024, some services adopt tiered pricing. For more information, see the detailed billing rules in the following sections.
Billing rules
Model calling
In a large language model (LLM), a token is the smallest unit of text that the model can process and understand. A token usually represents a text segment such as a word, a phrase, a character, or a symbol. Different models may have their own chunking methods, and the number of characters may not correspond one-to-one with the number of tokens.
The base billing unit for tokens consumed by model calls on the AI Search Open Platform is USD per 1,000 tokens. Some services adopt tiered pricing, and some services are billed separately for input and output tokens.
Some models provide a token calculation feature that you can use to estimate the number of tokens consumed by a call.
Tiered pricing example:
In the Germany (Frankfurt) region, if you call the sparse text embedding service and generate 1,000,000 tokens, which is equivalent to 1,000 billing units, the fee is calculated as follows: 500 × 0.001 + 500 × 0.0004 = 0.7 USD.
Input and output billing example:
In the Germany (Frankfurt) region, if you call the large language model qwen3-235b-a22b and consume 1,000 input tokens and 1,000 output tokens (1 billing unit each), the fee is calculated as follows: 1 × 0.0007 + 1 × 0.0028 = 0.0035 USD.
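The two fee formulas in the examples above can be sketched as small helpers. This is a minimal sketch that uses the Frankfurt prices from the examples; the tier boundary of 500 billing units is taken from the tiered-pricing rule.

```python
def tiered_fee(units, tier1_price, tier2_price, tier1_cap=500):
    """Tiered pricing: the first tier1_cap billing units are charged at
    tier1_price, and any remaining units are charged at tier2_price."""
    tier1_units = min(units, tier1_cap)
    tier2_units = max(units - tier1_cap, 0)
    return tier1_units * tier1_price + tier2_units * tier2_price

def io_fee(input_units, output_units, input_price, output_price):
    """Input/output pricing: input and output tokens are charged at
    separate rates."""
    return input_units * input_price + output_units * output_price

# Sparse text embedding, Frankfurt: 1,000,000 tokens = 1,000 billing units
print(round(tiered_fee(1000, 0.001, 0.0004), 6))  # 0.7
# qwen3-235b-a22b, Frankfurt: 1,000 input and 1,000 output tokens (1 unit each)
print(round(io_fee(1, 1, 0.0007, 0.0028), 6))     # 0.0035
```

Usage below 500 units is charged entirely at the first-tier price, so `tiered_fee(300, 0.001, 0.0004)` returns 300 × 0.001 = 0.3 USD.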
Germany (Frankfurt) region
In the following table, a single value in the price columns indicates a flat rate that is not tiered, and rows that show input and output prices are billed separately for input and output tokens.

| Service | Model ID | Billing unit | Price for 0-500 units | Price for units over 500 |
| --- | --- | --- | --- | --- |
| Document content parsing | ops-document-analyze-001 | USD/1,000 tokens | 0.0009 | 0.000272 |
| | | USD/image | 0.00073 | |
| | | USD/table | 0.00157 | |
| Document content parsing: extraction of hierarchical structure based on semantic understanding. Note: When you call the Document Content Parsing Service, you can use a parameter to control whether this feature is enabled. If it is enabled, this fee is charged in addition to the document parsing fee. | | USD/1,000 tokens | 0.00052 | |
| | ops-image-analyze-ocr-001 | USD/call | 0.012 | 0.0031 |
| | ops-image-analyze-vlm-001 | USD/1,000 tokens | 0.011 | |
| Document chunking | ops-document-split-001 | USD/1,000 tokens | 0.0009 | 0.000003 |
| Text embedding | ops-text-embedding-001 | USD/1,000 tokens | 0.0009 | 0.000072 |
| | ops-text-embedding-002 | USD/1,000 tokens | 0.0009 | 0.000054 |
| | ops-text-embedding-zh-001 | USD/1,000 tokens | 0.0009 | 0.000022 |
| | ops-text-embedding-en-001 | USD/1,000 tokens | 0.0009 | 0.000019 |
| | ops-gte-sentence-embedding-multilingual-base | USD/1,000 tokens | 0.0009 | 0.00003 |
| | ops-qwen3-embedding-0.6b | USD/1,000 tokens | 0.0009 | 0.000062 |
| Text and image embedding | ops-m2-encoder (text embedding) | USD/1,000 tokens | 0.0009 | 0.000039 |
| | ops-m2-encoder (image embedding) | USD/image | 0.0009 | 0.000032 |
| | ops-m2-encoder-large (text embedding) | USD/1,000 tokens | 0.0009 | 0.000065 |
| | ops-m2-encoder-large (image embedding) | USD/image | 0.0009 | 0.000042 |
| | ops-gme-qwen2-vl-2b-instruct (text embedding) | USD/1,000 tokens | 0.0009 | 0.000162 |
| | ops-gme-qwen2-vl-2b-instruct (image embedding) | USD/image | 0.0009 | 0.000146 |
| Sparse text embedding | ops-text-sparse-embedding-001 | USD/1,000 tokens | 0.001 | 0.0004 |
| Vector dimension reduction | ops-embedding-dim-reduction-001 | USD/doc | 0.0009 | 0.0000064 |
| | ops-bge-reranker-larger | USD/doc | 0.0005 | 0.000048 |
| | ops-text-reranker-001 | USD/doc | 0.0005 | 0.00016 |
| | ops-qwen3-reranker-0.6b | USD/doc | 0.0005 | 0.000026 |
| | ops-video-snapshot-001 | USD/1,000 images | 0.03 | |
| | ops-audio-asr-001 | USD/hour | 0.2 | |
| Large language model | qwen3-235b-a22b | USD/1,000 tokens | Input: 0.0007; Output: 0.0028 | |
| | ops-qwen-turbo | USD/1,000 tokens | Input: 0.000065; Output: 0.00026 | |
| | qwen-turbo | USD/1,000 tokens | Input: 0.00005; Output: 0.0002 | |
| | qwen-plus | USD/1,000 tokens | Input: 0.0004; Output: 0.0012 | |
| | qwen-max | USD/1,000 tokens | Input: 0.0016; Output: 0.0064 | |
| Query analysis | ops-query-analyze-001 | USD/1,000 tokens | Input: 0.004; Output: 0.018 | |

Search engine services are billed according to their own billing rules:

- Alibaba Cloud Elasticsearch: a fully managed cloud service built on open source Elasticsearch. It is 100% compatible with open source features and supports out-of-the-box use and pay-as-you-go billing. For more information, see Elasticsearch.
- OpenSearch Vector Search Edition: for more information about billing, see Vector Search Edition.
China (Shanghai) region
| Service | Model ID | Billing unit | Price for 0-500 units | Price for units over 500 |
| --- | --- | --- | --- | --- |
| Document content parsing | ops-document-analyze-001 | USD/1,000 tokens | 0.0007 | 0.00085 |
| | | USD/image | 0.0023 | |
| | | USD/table | 0.005 | |
| Document content parsing: extraction of hierarchical structure based on semantic understanding. Note: When you call the Document Content Parsing Service, you can use a parameter to control whether this feature is enabled. If it is enabled, this fee is charged in addition to the document parsing fee. | | USD/1,000 tokens | 0.00031 | |
| | ops-image-analyze-ocr-001 | USD/call | 0.0112 | 0.0058 |
| | ops-image-analyze-vlm-001 | USD/1,000 tokens | 0.0093 | |
| Document chunking | ops-document-split-001 | USD/1,000 tokens | 0.0007 | 0.000003 |
| Text embedding | ops-text-embedding-001 | USD/1,000 tokens | 0.0007 | 0.000023 |
| | ops-text-embedding-002 | USD/1,000 tokens | 0.0007 | 0.00007 |
| | ops-text-embedding-zh-001 | USD/1,000 tokens | 0.0007 | 0.00001 |
| | ops-text-embedding-en-001 | USD/1,000 tokens | 0.0007 | 0.000011 |
| | ops-gte-sentence-embedding-multilingual-base | USD/1,000 tokens | 0.0007 | 0.000025 |
| | ops-qwen3-embedding-0.6b | USD/1,000 tokens | 0.0007 | 0.000071 |
| Text and image embedding | ops-m2-encoder (text embedding) | USD/1,000 tokens | 0.0007 | 0.000026 |
| | ops-m2-encoder (image embedding) | USD/image | 0.0007 | 0.0000162 |
| | ops-m2-encoder-large (text embedding) | USD/1,000 tokens | 0.0007 | 0.000067 |
| | ops-m2-encoder-large (image embedding) | USD/image | 0.0007 | 0.000033 |
| | ops-gme-qwen2-vl-2b-instruct (text embedding) | USD/1,000 tokens | 0.0007 | 0.00008 |
| | ops-gme-qwen2-vl-2b-instruct (image embedding) | USD/image | 0.0007 | 0.000072 |
| Sparse text embedding | ops-text-sparse-embedding-001 | USD/1,000 tokens | 0.00084 | 0.00014 |
| Vector dimension reduction | ops-embedding-dim-reduction-001 | USD/doc | 0.0007 | 0.0000071 |
| | ops-bge-reranker-larger | USD/doc | 0.00014 | 0.000013 |
| | ops-text-reranker-001 | USD/doc | 0.00014 | 0.000062 |
| | ops-qwen3-reranker-0.6b | USD/doc | 0.00014 | 0.000015 |
| | ops-video-snapshot-001 | USD/1,000 images | 0.016 | |
| | ops-audio-asr-001 | USD/hour | 0.09 | |
| Large language model | qwen3-235b-a22b | USD/1,000 tokens | Input: 0.00056; Output: 0.0056 | |
| | qwq-32b | USD/1,000 tokens | Input: 0.00028; Output: 0.00084 | |
| | ops-qwen-turbo | USD/1,000 tokens | Input: 0.00006; Output: 0.0001 | |
| | qwen-turbo | USD/1,000 tokens | Input: 0.000042; Output: 0.000084 | |
| | qwen-plus | USD/1,000 tokens | Input: 0.000112; Output: 0.00028 | |
| | qwen-max | USD/1,000 tokens | Input: 0.000336; Output: 0.001344 | |
| | deepseek-r1 | USD/1,000 tokens | Input: 0.00056; Output: 0.00224 | |
| | deepseek-r1-distill-qwen-7b | USD/1,000 tokens | Input: 0.00007; Output: 0.00014 | |
| | deepseek-r1-distill-qwen-14b | USD/1,000 tokens | Input: 0.00014; Output: 0.00042 | |
| | deepseek-v3 | USD/1,000 tokens | Input: 0.00028; Output: 0.00112 | |
| Query analysis | ops-query-analyze-001 | USD/call | 0.00735 | |
| | | USD/1,000 tokens | Query rewrite: Input: 0.000336; Output: 0.001344 | |
| | | USD/1,000 tokens | Intention recognition and alternate query extension: | |
| | | USD/1,000 tokens | Natural language to SQL (NL2SQL) generation: | |
| Assessment module: comprehensively evaluates the retrieval-augmented generation (RAG) development process provided by the AI Search Open Platform, from the user asking a question through the RAG system retrieving content to the LLM generating an answer. | | USD/1,000 tokens | Input: 0.0007; Output: 0.0021 | |

Search engine services are billed according to their own billing rules:

- Alibaba Cloud Elasticsearch: a fully managed cloud service built on open source Elasticsearch. It is 100% compatible with open source features and supports out-of-the-box use and pay-as-you-go billing. For more information, see Alibaba Cloud ES.
- OpenSearch Vector Search Edition: for more information about billing, see Vector Search Edition.
Model customization
| Model name | Description | Price |
| --- | --- | --- |
| Custom training for the vector dimension reduction model | Lets you customize a vector dimension reduction model based on the vector data that you provide. In actual business scenarios, you first use an embedding model to vectorize text or queries, and then use the vector dimension reduction model to further reduce the vector dimensions. | You are charged based on the number of CUs of computing resources consumed. The price of each CU is 0.5422614 USD. The number of CUs consumed depends on the volume and dimension of the training data. For example, to train a model with 100,000 pieces of 1,024-dimensional data, about 250 CUs are consumed, and the fee is 250 × 0.5422614 = 135.56535 USD. |
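The training-cost estimate above can be sketched as follows. The CU price comes from the pricing description; the actual number of CUs consumed varies with the volume and dimension of your training data.

```python
CU_PRICE_USD = 0.5422614  # price per compute unit (CU)

def training_fee(cus_consumed):
    """Fee for custom model training, billed by the CUs consumed."""
    return cus_consumed * CU_PRICE_USD

# About 250 CUs for 100,000 pieces of 1,024-dimensional training data
print(round(training_fee(250), 5))  # 135.56535
```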
Model deployment
Billing formula: CU price × CUs per instance type × Number of instances
The following table describes the billing rules.
| Instance type | CU price (USD/hour) | CUs per instance | Price per instance (USD/hour) |
| --- | --- | --- | --- |
| gpu.v100.16g.x1 | 0.15 | 30.14 | 4.521 |
| gpu.t4.16g.x1 | 0.15 | 16.07 | 2.4105 |
| gpu.a10.24g.x1 | 0.15 | 11.01 | 1.6515 |
For example, if you purchase two gpu.a10.24g.x1 instances to deploy a model service, the fee is calculated as follows: 0.15 × 11.01 × 2 = 3.303 USD/hour.
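The deployment billing formula can be sketched as a small helper, with the CU price and per-instance CU counts taken from the deployment table above.

```python
CU_PRICE = 0.15  # USD per CU per hour

# CUs per instance type, from the model deployment table
INSTANCE_CUS = {
    "gpu.v100.16g.x1": 30.14,
    "gpu.t4.16g.x1": 16.07,
    "gpu.a10.24g.x1": 11.01,
}

def hourly_fee(instance_type, instance_count):
    """Hourly fee = CU price x CUs per instance type x number of instances."""
    return CU_PRICE * INSTANCE_CUS[instance_type] * instance_count

# Two gpu.a10.24g.x1 instances, as in the example above
print(round(hourly_fee("gpu.a10.24g.x1", 2), 4))  # 3.303
```

The same formula applies to service development instances; only the CU counts per instance type differ.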
Service development
After you start an instance, you are charged on a pay-as-you-go basis. The fee is calculated using the following formula: CU price × CUs per instance type × Number of instances.
The following table describes the billing rules.
| Instance type | CU price (USD/hour) | CUs per instance | Price per instance (USD/hour) |
| --- | --- | --- | --- |
| gpu.t4.16g.x1 | 0.15 | 16.07 | 2.4105 |
| ops.basic1.gi.large | 0.15 | 0.61 | 0.0915 |
For example, if you select one ops.basic1.gi.large instance, the fee is calculated as follows: 0.15 × 0.61 × 1 = 0.0915 USD/hour.
Precautions
Service names correspond one-to-one with service IDs and API parameters. For more information, see Service overview.
Bills are generated hourly based on your actual usage.
For services with tiered pricing, charges are calculated based on the usage in each tier.
For services that use 1,000 tokens as the billing unit, usage statistics may include decimals.
Make sure that your Alibaba Cloud account has no overdue payments to avoid service disruptions.
Billing example
Assume that in the Germany (Frankfurt) region, you call the document chunking service for 1,000 units. The fees are calculated as follows:
Fee for the first 500 units (inclusive): 0.0009 USD × 500 = 0.45 USD
Fee for the units over 500: 0.000003 USD × 500 = 0.0015 USD
Total fee: 0.45 + 0.0015 = 0.4515 USD