This topic describes the pricing of Data Lake Analytics (DLA), including billing methods, billing examples, and preferential policies.
Billing based on the volume of scanned data
If you use DLA to perform association analysis on data in local or third-party data sources, you are charged based on the volume of scanned data.
Billing rules
- Every TB of data that is scanned costs USD 4.
- Every GB of data that is scanned costs USD 0.004.
The minimum scanned data volume that the billing system bills is 32 MB. The billing system generates one bill per hour and deducts fees from your Alibaba Cloud account. To view the bills, log on to the DLA console, choose
in the top navigation bar, and then click Bills in the left-side navigation pane.
Billing example
You store a CSV file of 1 TB and a JSON file of 1 TB in Object Storage Service (OSS), and you store a data table of 1 TB in an ApsaraDB RDS database. You want to perform association analysis on data in OSS and the ApsaraDB RDS database. The total volume of the data that needs to be scanned is 3 TB, with the cost of scanning each TB of data being USD 4. In this case, the fee you need to pay is USD 12.
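The arithmetic in this example can be sketched as follows. This is only an illustration: the function name `scan_fee` and the dictionary keys are hypothetical, the rate is the USD 4 per TB stated in the billing rules, and the 32 MB minimum is not modeled.

```python
from decimal import Decimal

# Rate taken from the billing rules in this topic: USD 4 per TB scanned.
USD_PER_TB = Decimal("4")


def scan_fee(scanned_tb):
    """Return the fee in USD for an iterable of scanned volumes given in TB."""
    total_tb = sum(Decimal(str(volume)) for volume in scanned_tb)
    return total_tb * USD_PER_TB


# Volumes from the billing example: two files in OSS and one table in ApsaraDB RDS.
sources = {"oss_csv": 1, "oss_json": 1, "rds_table": 1}
fee = scan_fee(sources.values())
print(fee)  # 12
```

Decimal arithmetic is used so that fractional volumes such as 0.1 TB do not introduce floating-point rounding noise into the fee.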
To reduce costs, you can use the following methods to process the raw data before you use DLA to scan the data:
- Format conversion: You can convert the format of the raw data into a high-performance data format.
DLA supports multiple high-performance data formats, such as Apache ORC, Apache Parquet, and Apache Avro. You can convert the format of your data into one of the preceding formats based on your business requirements. Then, you can use DLA to scan only the required columns.
- Data compression: You can compress the data to reduce the data size. We recommend that you store the data in a compressed format such as Apache Parquet or Apache ORC. Then, you can use DLA to scan the compressed data.
- Data partitioning: You can store the raw data in different partitions. Then, you can use DLA to scan the data in only some of the partitions.
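As a rough illustration of the data compression method, the sketch below compresses a small synthetic CSV payload with Python's standard `gzip` module and compares the sizes. The column names and values are made up for the example, and the ratio you get depends entirely on your own data, so treat the result as indicative only.

```python
import gzip

# Synthetic CSV content; the columns and values are hypothetical.
rows = ["id,region,amount"]
rows += [f"{i},cn-hangzhou,{i % 100}" for i in range(10_000)]
raw = "\n".join(rows).encode("utf-8")

# Compress in memory and compare the byte counts.
compressed = gzip.compress(raw)
ratio = len(compressed) / len(raw)

print(f"raw bytes:        {len(raw)}")
print(f"compressed bytes: {len(compressed)}")
print(f"compressed/raw:   {ratio:.2f}")
```

Because billing is based on the volume of scanned data, a smaller file generally means a smaller bill for the same query.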
For the preceding billing example, you can perform the following operations to reduce the data scan costs:
- Compress the 1 TB CSV file into a GZIP file. The size of the compressed file is 0.4 TB. Store the data of the GZIP file in different partitions and store the data you want to scan in the same partition. Then, DLA scans data only in this partition. This way, the volume of scanned data is reduced to 0.2 TB.
- Convert the 1 TB JSON file into the Apache ORC format. Then, DLA scans only 10% of the data by column. The volume of scanned data is reduced to 0.1 TB.
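The 1 TB data table in the ApsaraDB RDS database is still scanned in full, so the total volume of scanned data is 1.3 TB (0.2 TB + 0.1 TB + 1 TB). After format conversion, compression, and partitioning of the data, the total fee you need to pay is USD 5.2. The cost is reduced by USD 6.8.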
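The before-and-after comparison can be checked with a short calculation. This is a sketch that uses the volumes stated in the example and assumes that the ApsaraDB RDS table is still scanned in full; the variable and function names are hypothetical.

```python
from decimal import Decimal

USD_PER_TB = Decimal("4")  # rate from the billing rules in this topic

# Scanned volume per source, in TB, before and after optimization.
before_tb = {"csv": Decimal("1"), "json": Decimal("1"), "rds": Decimal("1")}
after_tb = {
    "csv": Decimal("0.2"),   # GZIP-compressed, then only one partition scanned
    "json": Decimal("0.1"),  # converted to ORC, 10% of the data scanned by column
    "rds": Decimal("1"),     # assumption: unchanged, scanned in full
}


def total_fee(volumes_tb):
    """Return the fee in USD for a mapping of source name to scanned TB."""
    return sum(volumes_tb.values()) * USD_PER_TB


fee_before = total_fee(before_tb)
fee_after = total_fee(after_tb)
savings = fee_before - fee_after

print(fee_before, fee_after, savings)  # 12 5.2 6.8
```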
Billing based on the number of CUs
- Basic specifications of a CU: One CU provides 1 vCPU and 4 GB of memory.
- Minimum CU Specifications: the CU specifications that you can keep and use for a long time. You are charged based on CU specifications.
- Maximum CU Specifications: the maximum CU specifications that you can use. Some elastic resources are provided to meet your potential requirements for higher CU specifications. You are charged for these elastic resources by using the pay-as-you-go method. You can specify this parameter based on your business requirements.
Note If the required CU specifications exceed the value of Maximum CU Specifications, submit a ticket to apply for higher specifications.
One billing method is supported for billing based on the number of CUs: pay-as-you-go.
Pay-as-you-go
This billing method allows you to pay after use. The following description explains the billing method:
- The DLA serverless SQL engine is used for interactive analysis and does not require elastic resources. In this case, the value of the Maximum CU Specifications parameter is the same as that of the Minimum CU Specifications parameter. The minimum values of both parameters are 8 Cores, 32 GB. In this case, you are charged based on the CU amount specified by Minimum CU Specifications.
- The DLA serverless Spark engine can provide job-level elasticity. Therefore, no long-term resources need to be held for the establishment of virtual clusters (VCs), and the value of Minimum CU Specifications can be set to 0 Cores, 0 GB. In this case, you are charged based on the number of CUs that you used.
Billing examples
- DLA serverless Spark engine: If you have used four CUs of compute resources for three minutes, the fee you need to pay is USD 0.01 based on the billing rule for the DLA serverless Spark engine.
- DLA serverless SQL engine: If you have used eight CUs of compute resources for three minutes, the fee you need to pay is USD 0.02 based on the billing rule for the DLA serverless SQL engine.
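The two CU examples can be reproduced with a simple calculation. This topic does not state the unit price per CU, so the rate of USD 0.05 per CU-hour below is an assumption back-derived from the two examples, not a published price. The function name `cu_fee` and the rounding to two decimal places are also assumptions.

```python
from decimal import Decimal, ROUND_HALF_UP

# ASSUMPTION: not stated in this topic; inferred from the two examples
# (4 CUs x 3 min = USD 0.01, 8 CUs x 3 min = USD 0.02).
ASSUMED_USD_PER_CU_HOUR = Decimal("0.05")


def cu_fee(cus, minutes, rate=ASSUMED_USD_PER_CU_HOUR):
    """Return the pay-as-you-go fee in USD, rounded to two decimal places."""
    cu_hours = Decimal(cus) * Decimal(minutes) / Decimal(60)
    return (cu_hours * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(cu_fee(4, 3))  # 0.01
print(cu_fee(8, 3))  # 0.02
```

Check the current price list for your region before you rely on any rate; pass the actual rate through the `rate` parameter.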
Others
The metadata crawling feature helps you establish a metadata system that centers on OSS storage (DLA storage). This feature reduces the costs of metadata construction and is free of charge.