Pay-per-byte is a pay-as-you-go billing method in which you are charged only for the number of bytes that are scanned. This also applies when you use Data Lake Analytics (DLA) to perform association analysis on data of local or third-party data sources. This topic describes the billing rules, methods to reduce costs, and billing examples of the pay-per-byte billing method.

Billing rules

When you use the pay-per-byte billing method, the following rules apply:
  • Every TB of data that is scanned costs USD 4.27.
  • Every GB of data that is scanned costs USD 0.0043.

The minimum size of scanned data for which you are charged is 32 MB. DLA generates a bill on an hourly basis and fees are deducted from the balance of your Alibaba Cloud account. To view bills, you can log on to the DLA console and choose Expenses > Orders.
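The rules above can be sketched as a small fee calculator. This is only an illustration: the function name `scan_fee_usd` is hypothetical, the sketch assumes that the 32 MB minimum applies to each scan, and it assumes decimal units (1 TB = 1,000,000 MB), which is consistent with the per-GB price being roughly one thousandth of the per-TB price. Neither assumption is stated in this topic, and the result is not rounded the way an actual bill may be.

```python
from decimal import Decimal

PRICE_PER_TB_USD = Decimal("4.27")  # price per TB from the billing rules
MIN_BILLED_MB = Decimal(32)         # minimum scanned size that is charged
MB_PER_TB = Decimal(1_000_000)      # assumption: decimal units

def scan_fee_usd(scanned_mb):
    """Return the unrounded fee in USD for one scan of `scanned_mb` MB.

    Assumption: the 32 MB minimum is applied per scan.
    """
    billed_mb = max(Decimal(str(scanned_mb)), MIN_BILLED_MB)
    return billed_mb / MB_PER_TB * PRICE_PER_TB_USD

# A 1 TB scan costs the per-TB price; a 1 MB scan is billed as 32 MB.
print(scan_fee_usd(1_000_000))
print(scan_fee_usd(1) == scan_fee_usd(32))
```

Using `Decimal` instead of `float` keeps the currency arithmetic exact, which matters when many small fees are summed.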

Methods to reduce costs

To reduce costs, you can use one or more of the following methods to process the raw data before you use DLA to scan the data:
  • Format conversion: convert the format of the raw data into a high-performance data format.

    DLA supports multiple high-performance data formats, such as Apache ORC, Apache Parquet, and Apache Avro. You can convert the format of your data into one of the preceding formats. Then, use DLA to scan only data in the required columns.

  • Data compression: compress the raw data to reduce the data size. We recommend that you compress the data into a file in the Apache Parquet or Apache ORC format. Then, use DLA to scan data in the file.
  • Data partitioning: store the raw data in different partitions. Then, use DLA to scan the data in one or more partitions.
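As a rough illustration of the compression and partitioning methods, the following sketch splits a CSV file by one column and writes each group as a GZIP-compressed file in its own directory. The function name `write_partitioned_gzip`, the `column=value` directory layout, and the file name `part-0.csv.gz` are assumptions made for this example and are not taken from DLA. Check which layouts and compression formats your tables actually accept before you rely on it.

```python
import csv
import gzip
from collections import defaultdict
from pathlib import Path

def write_partitioned_gzip(src_csv, out_dir, partition_column):
    """Split a CSV file by `partition_column` into gzip-compressed files.

    Each distinct value is written to
    out_dir/<partition_column>=<value>/part-0.csv.gz.
    Rows are grouped in memory, so this sketch suits small files only.
    The partition column is kept in the output files.
    """
    groups = defaultdict(list)
    with open(src_csv, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        for row in reader:
            groups[row[partition_column]].append(row)

    written = []
    for value, rows in groups.items():
        part_dir = Path(out_dir) / f"{partition_column}={value}"
        part_dir.mkdir(parents=True, exist_ok=True)
        target = part_dir / "part-0.csv.gz"
        # Text mode ("wt") lets the csv module write directly into gzip.
        with gzip.open(target, "wt", newline="") as gz:
            writer = csv.DictWriter(gz, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        written.append(target)
    return written
```

A query that filters on the partition column then needs to read only the matching directory, and each file in it is smaller than the uncompressed original.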

Billing examples

You store a CSV file and a JSON file in Object Storage Service (OSS) and store a table in an ApsaraDB RDS database. Each file and the table is 1 TB in size. You want to perform association analysis on the data in OSS and the table data in the ApsaraDB RDS database. The total size of the data that needs to be scanned is 3 TB. Each TB of data that is scanned costs USD 4.27. In this case, the fee you need to pay is USD 12.81 (4.27 + 4.27 + 4.27).

For the preceding billing example, you can perform the following operations to reduce the costs:
  • Compress the 1 TB CSV file into a GZIP file. The size of the GZIP file is 0.4 TB. Store the data of the GZIP file in different partitions and store the data that you want to scan in the same partition. This way, DLA scans data only in this partition. The size of the data that is scanned is reduced to 0.2 TB.
  • Convert the 1 TB JSON file into the Apache ORC format. This way, DLA scans only 10% of the data by column. The size of the data that is scanned is reduced to 0.1 TB.

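After you convert, compress, and partition the data, DLA scans 0.2 TB of the CSV data, 0.1 TB of the JSON data, and the full 1 TB table in the ApsaraDB RDS database. The total fee that you need to pay is approximately USD 5.55 (4.27 × 0.2 + 4.27 × 0.1 + 4.27 = 5.551). The cost is reduced by approximately USD 7.26.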
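The arithmetic in this example can be checked with a short script. The sketch below is only a worked calculation using the per-TB price and the scanned sizes given above. The names `total_fee_usd`, `before`, and `after` are hypothetical, and the results are unrounded, so the amount on an actual bill depends on how DLA rounds, which this topic does not specify.

```python
from decimal import Decimal

PRICE_PER_TB_USD = Decimal("4.27")  # price per TB from the billing rules

def total_fee_usd(scanned_tb_by_source):
    """Sum the unrounded fee in USD over the scanned size of each source."""
    return sum(
        (Decimal(str(tb)) * PRICE_PER_TB_USD
         for tb in scanned_tb_by_source.values()),
        Decimal(0),
    )

# Scanned sizes in TB before and after optimizing the OSS data.
before = {"csv": 1, "json": 1, "rds_table": 1}
after = {"csv_gzip_partitioned": "0.2", "json_as_orc": "0.1", "rds_table": 1}

fee_before = total_fee_usd(before)
fee_after = total_fee_usd(after)

print(fee_before)              # 12.81
print(fee_after)               # 5.551
print(fee_before - fee_after)  # 7.259
```

The table in the ApsaraDB RDS database is unchanged in both cases, so it accounts for most of the remaining fee.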