Billing overview
Alibaba Cloud OpenLake is an integrated solution for multimodal data and large language model (LLM) scenarios. OpenLake itself does not incur extra charges. OpenLake builds a unified platform for data lakehouses, search, and AI by combining multiple mature Alibaba Cloud products, such as DLF, DataWorks, PAI, EMR, Hologres, StarRocks, MaxCompute, OpenSearch, and Milvus.
The total cost of OpenLake is the sum of the fees for the actual usage of each underlying product. You pay only for the computing, storage, network, and service resources that you use. There are no platform surcharges or integration premiums.
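Because OpenLake adds no platform surcharge, the monthly bill is simply additive across the underlying products. A minimal sketch of this principle, using hypothetical component fees (placeholders, not actual Alibaba Cloud prices):

```python
# Hypothetical monthly fees per underlying product (USD); placeholder values,
# not actual Alibaba Cloud prices.
component_fees = {
    "DLF": 12.40,        # metadata storage + Catalog API calls
    "EMR Spark": 88.00,  # serverless CU-hours
    "Hologres": 64.50,   # CU-hours + GB-month storage
    "PAI": 210.00,       # GPU/CPU hours for training and inference
}

# OpenLake itself adds no billable item: the total is just the sum.
total = sum(component_fees.values())
print(f"Total OpenLake cost: {total:.2f} USD")  # -> 374.90 USD
```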
Billing principles
Pay for what you use
Each component is billed on a pay-as-you-go or subscription basis, and all fees are transparent.
There are no minimum charges, mandatory bundles, or hidden costs.
No OpenLake platform fees
OpenLake serves as an architectural solution and capability integration layer. It does not generate separate billable items.
Even when you use enhanced features such as OpenLake Studio, Copilot, or Agent, the underlying resources are provided by products such as DataWorks and PAI. The costs are included in the bills for those products.
Cost optimization
Mechanisms such as unified storage, serverless elasticity, and intelligent data tiering significantly reduce the total cost of ownership (TCO).
Multiple engines share the same data. This avoids redundant storage and extract, transform, and load (ETL) costs.
Billing for major components
Each product offers a free quota. New users can also use trial resources.
| Component | Purpose | Primary billing dimensions | Official billing documentation |
| --- | --- | --- | --- |
| DLF (Data Lake Formation) | Unified metadata catalog, permission management, data lineage, and registration for multi-format tables (Paimon, Iceberg, and Lance) | Storage usage and number of metadata operations (Catalog API calls) | |
| DataWorks | Data development, task scheduling, quality monitoring, security governance, and OpenLake Studio (Notebook and IDE) | Software fees and scheduling resources | |
| EMR Spark (Serverless) | Batch processing, ETL, feature engineering, and AI data pre-processing | Compute unit (CU) hours | |
| EMR StarRocks (Serverless) | High-concurrency interactive queries, BI analysis, and ad-hoc exploration | CU hours and storage capacity (GB-month) | |
| Flink (real-time computing) | Stream ETL, real-time lakehouse ingestion (Fluss and Paimon), and stateful computing | CU hours | |
| Hologres | Real-time data warehousing, millisecond-level writes, high-concurrency serving, and unified stream and batch analytics | CU hours and storage capacity (GB-month) | |
| MaxCompute | Large-scale offline data warehousing, lakehouse computing, and T+1 batch processing | Computing (SQL and MapReduce CU-seconds) and storage capacity (GB-month) | |
| PAI (Platform for AI) | LLM training (DLC) and inference services (EAS) | GPU and CPU hours | |
| Milvus (vector engine) | Vector similarity search, multimodal retrieval, and retrieval-augmented generation (RAG) knowledge bases | Compute node hours (CPU and GPU) and storage capacity (GB-month) | |
| OpenSearch | Full-text search, hybrid search (keyword and vector), and preview of structured and unstructured data | Instance specifications (CPU and memory), storage capacity (GB-month), and QPS or number of requests | |
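Several engines in the table above bill by CU-hours. A rough monthly estimate for that dimension can be sketched as follows; the unit price and workload figures are placeholders for illustration, not actual Alibaba Cloud list prices:

```python
def estimate_cu_hour_cost(cu: float, hours_per_day: float,
                          days: int, unit_price: float) -> float:
    """Estimate a pay-as-you-go bill for a CU-hour billed engine.

    unit_price is the price per CU-hour; the value used below is a
    placeholder, not an actual Alibaba Cloud list price.
    """
    return cu * hours_per_day * days * unit_price

# Example: a 16-CU batch job running 4 hours per day for 30 days.
cost = estimate_cu_hour_cost(cu=16, hours_per_day=4, days=30, unit_price=0.05)
print(f"Estimated monthly compute cost: {cost:.2f}")  # 16*4*30*0.05 = 96.00
```

Storage dimensions (GB-month) multiply the same way: stored capacity times the per-GB-month price for the relevant tier.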
Cost optimization suggestions
Reduce costs with unified storage
Use DLF as the single data foundation to avoid the redundant costs associated with multiple storage systems, such as HDFS, S3, and NAS.
Serverless elasticity
Choose fully managed services such as EMR Serverless Spark, Flink, and Hologres to scale resources on demand. This eliminates waste from idle resources.
Intelligent lifecycle management
Use DLF lifecycle rules to automatically transition cold data to Infrequent Access or Archive Storage. This can save more than 50% on storage fees.
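The savings from tiering follow directly from the price gap between tiers. A worked example with hypothetical per-GB-month prices and a hypothetical cold-data ratio (placeholders, not actual tier prices):

```python
# Hypothetical per-GB-month prices for storage tiers (placeholders,
# not actual Alibaba Cloud prices).
STANDARD = 0.12
ARCHIVE = 0.02

total_gb = 10_000      # total data footprint
cold_fraction = 0.70   # share of data that is rarely accessed

# Bill before tiering: everything sits in Standard storage.
before = total_gb * STANDARD

# Bill after tiering: cold data moved to Archive via lifecycle rules.
after = (total_gb * (1 - cold_fraction) * STANDARD
         + total_gb * cold_fraction * ARCHIVE)

savings = 1 - after / before
print(f"Storage bill: {before:.2f} -> {after:.2f} ({savings:.0%} saved)")
```

With these assumed prices, moving 70% of the data to Archive cuts the storage bill by roughly 58%, consistent with the "more than 50%" figure above.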
Use reserved instances or resource plans
For stable workloads, such as daily scheduled tasks, you can purchase computing or storage resource plans to receive discounts of 30% to 60%.
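The effect of a resource plan on a stable workload is a straight percentage reduction. A minimal sketch, using a hypothetical pay-as-you-go baseline:

```python
def plan_cost(pay_as_you_go_monthly: float, discount: float) -> float:
    """Monthly cost after a resource-plan discount (0.30 to 0.60 per the text)."""
    return pay_as_you_go_monthly * (1 - discount)

baseline = 1000.0  # hypothetical pay-as-you-go monthly spend
for discount in (0.30, 0.60):
    print(f"{discount:.0%} discount -> {plan_cost(baseline, discount):.2f}/month")
```

Resource plans pay off only when utilization is steady; for bursty workloads, serverless pay-as-you-go pricing is usually cheaper.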