This topic describes the pricing of Data Lake Analytics (DLA), including billing methods, billing examples, and preferential policies.

When you use DLA, you are charged based on the volume of scanned data or the number of compute units (CUs) that you use.
Note: Actual prices are subject to the official pricing published by Alibaba Cloud.

Billing based on the volume of scanned data

If you use DLA to perform association analysis on data in local or third-party data sources, you are charged based on the volume of scanned data.

Billing rules
For billing based on the volume of scanned data, DLA uses the pay-as-you-go (postpaid) billing method. You are charged based on the number of bytes that are scanned. The cluster setup, maintenance, and upgrade are free of charge. The following billing rules apply:
  • Every TB of data that is scanned costs USD 4.
  • Every GB of data that is scanned costs USD 0.004.

The minimum billable volume of scanned data is 32 MB. The billing system generates one bill per hour and deducts the fees from your Alibaba Cloud account. To view your bills, log on to the DLA console, choose Expenses > User Center in the top navigation bar, and then click Bills in the left-side navigation pane.

Billing example

You store a CSV file of 1 TB and a JSON file of 1 TB in Object Storage Service (OSS), and you store a data table of 1 TB in an ApsaraDB RDS database.

You want to perform association analysis on data in OSS and the ApsaraDB RDS database. The total volume of the data that needs to be scanned is 3 TB, with the cost of scanning each TB of data being USD 4. In this case, the fee you need to pay is USD 12.
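
The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration of the per-TB rule stated above: the function name `scan_fee_usd` and the dictionary keys are hypothetical, and the sketch ignores the 32 MB minimum because every volume in the example is far above it.

```python
from decimal import Decimal

# Rate taken from the billing rules above: USD 4 per TB of scanned data.
USD_PER_TB = Decimal("4")

def scan_fee_usd(scanned_tb):
    """Return the fee in USD for a scanned volume given in TB (hypothetical helper)."""
    return Decimal(str(scanned_tb)) * USD_PER_TB

# Volumes from the billing example, in TB. The key names are illustrative only.
sources_tb = {"oss_csv": 1, "oss_json": 1, "rds_table": 1}

total_tb = sum(sources_tb.values())
fee = scan_fee_usd(total_tb)
print(f"Scanned {total_tb} TB, fee USD {fee}")
```

Decimal arithmetic is used instead of floats so that the currency amounts stay exact.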

Methods to reduce costs

To reduce costs, you can use the following methods to process the raw data before you use DLA to scan the data:

  • Format conversion: You can convert the format of the raw data into a high-performance data format.

    DLA supports multiple high-performance data formats, such as Apache ORC, Apache Parquet, and Apache Avro. You can convert the format of your data into one of the preceding formats based on your business requirements. Then, you can use DLA to scan only the required columns.

  • Data compression: You can compress the data to reduce the data size. We recommend that you compress the data into the Apache Parquet or Apache ORC format. Then, you can use DLA to scan the compressed data.
  • Data partitioning: You can store the raw data in different partitions. Then, you can use DLA to scan the data in only some of the partitions.

For the preceding billing example, you can perform the following operations to reduce the data scan costs:

  • Compress the 1 TB CSV file into a GZIP file. The size of the compressed file is 0.4 TB. Store the data of the GZIP file in different partitions and store the data you want to scan in the same partition. Then, DLA scans data only in this partition. This way, the volume of scanned data is reduced to 0.2 TB.
  • Convert the 1 TB JSON file into the Apache ORC format. Then, DLA scans only 10% of the data by column. The volume of scanned data is reduced to 0.1 TB.

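The 1 TB table in the ApsaraDB RDS database is still scanned in full, so the total volume of scanned data is 1.3 TB (0.2 TB + 0.1 TB + 1 TB). After format conversion, compression, and partitioning of the data, the total fee is USD 5.2, which is USD 6.8 less than the original fee of USD 12.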
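
The savings can be checked with a short calculation. The sketch below assumes that the 1 TB ApsaraDB RDS table is still scanned in full, which is the only reading consistent with the USD 5.2 total. The variable and key names are hypothetical.

```python
from decimal import Decimal

# Rate from the billing rules: USD 4 per TB of scanned data.
USD_PER_TB = Decimal("4")

# Scanned volume per source in TB after optimization, from the example above.
optimized_tb = {
    "csv_gzip_partitioned": Decimal("0.2"),
    "json_as_orc": Decimal("0.1"),
    "rds_table": Decimal("1"),  # assumption: the RDS table is not optimized
}

original_fee = Decimal("3") * USD_PER_TB                  # 3 TB before optimization
optimized_fee = sum(optimized_tb.values()) * USD_PER_TB   # 1.3 TB after optimization
saving = original_fee - optimized_fee

print(f"Optimized fee: USD {optimized_fee}, saving: USD {saving}")
```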

Billing based on the number of CUs

Before you use the DLA CU Edition, you must purchase CUs with the specifications that your computing workloads require.
  • Basic specifications of a CU: One CU provides 1 vCPU and 4 GB of memory.
  • Minimum CU Specifications: the CU specifications that you can keep and use for a long time. You are charged based on CU specifications.
  • Maximum CU Specifications: the maximum CU specifications that you can use. Some elastic resources are provided to meet your potential requirements for higher CU specifications. You are charged for these elastic resources by using the pay-as-you-go method. You can specify this parameter based on your business requirements.
    Note: If the required CU specifications exceed the value of Maximum CU Specifications, submit a ticket to apply for higher specifications.

Billing rules

Only the pay-as-you-go billing method is supported for billing based on the number of CUs.

Pay-as-you-go

This billing method allows you to pay after use. The following description explains the billing method:

One CU costs USD 0.06 per hour. Fees are calculated per second.
  • The DLA serverless SQL engine is used for interactive analysis and does not require elastic resources. Therefore, the value of Maximum CU Specifications is the same as that of Minimum CU Specifications. The minimum value of both parameters is 8 Cores, 32 GB. You are charged based on the number of CUs specified by Minimum CU Specifications.
  • The DLA serverless Spark engine provides job-level elasticity. Therefore, you do not need to hold long-term resources for virtual clusters (VCs), and you can set Minimum CU Specifications to 0 Cores, 0 GB. You are charged based on the number of CUs that you use.

Billing examples
  • DLA serverless Spark engine: If you use four CUs for three minutes, the fee is approximately USD 0.01 (4 CUs × USD 0.06 per CU-hour × 0.05 hours = USD 0.012).
  • DLA serverless SQL engine: If you use eight CUs for three minutes, the fee is approximately USD 0.02 (8 CUs × USD 0.06 per CU-hour × 0.05 hours = USD 0.024).
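
The two examples above follow from the hourly rate prorated by the second. The sketch below shows one way to reproduce them. The helper names are hypothetical, and rounding to two decimal places with half-up rounding is an assumption: the source shows only the rounded amounts and does not state how the billing system rounds.

```python
from decimal import Decimal, ROUND_HALF_UP

# Rate from the billing rules: USD 0.06 per CU per hour, charged by the second.
USD_PER_CU_HOUR = Decimal("0.06")
SECONDS_PER_HOUR = Decimal(3600)

def cu_fee_usd(cus, seconds):
    """Return the unrounded fee in USD for `cus` CUs used for `seconds` seconds."""
    return Decimal(cus) * USD_PER_CU_HOUR * Decimal(seconds) / SECONDS_PER_HOUR

def to_cents(amount):
    """Round to two decimal places (assumed rounding rule, for display only)."""
    return amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

spark_fee = cu_fee_usd(4, 3 * 60)  # four CUs for three minutes
sql_fee = cu_fee_usd(8, 3 * 60)    # eight CUs for three minutes

print(f"Spark: USD {spark_fee} (about USD {to_cents(spark_fee)})")
print(f"SQL:   USD {sql_fee} (about USD {to_cents(sql_fee)})")
```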

Others

The metadata crawling feature helps you establish a metadata system that centers on OSS storage (DLA storage). This feature reduces the costs of metadata construction and is free of charge.