Pay-per-byte is a pay-as-you-go billing method in which you are charged only for the number of bytes that are scanned. If you use Data Lake Analytics (DLA) to perform association analysis on data from local or third-party data sources, you are charged based on the number of bytes that DLA scans. This topic describes the billing rules, billing examples, and preferential policies of the pay-per-byte billing method.
- Every TB of data that is scanned costs USD 4.27.
- Every GB of data that is scanned costs USD 0.0043.
The minimum billable size of scanned data is 32 MB, so a scan of less than 32 MB is billed as 32 MB. DLA generates bills on an hourly basis and deducts the fees from the balance of your Alibaba Cloud account. To view bills, you can log on to the DLA console and choose .
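The billing rule above can be sketched in Python as follows. This is a minimal illustration, not DLA's billing implementation. It assumes decimal units (1 MB = 1000^2 bytes, 1 TB = 1000^4 bytes), which match the listed per-GB price of USD 0.0043 more closely than binary units. It also assumes that the 32 MB minimum applies to each scan and that the fee is not rounded to cents. The name `scan_fee_usd` is hypothetical.

```python
# Minimal sketch of the pay-per-byte rule (hypothetical helper, not a DLA API).
# Assumptions: decimal units, the 32 MB minimum applies to each scan,
# and the fee is not rounded to cents.

PRICE_PER_TB_USD = 4.27
BYTES_PER_MB = 1000 ** 2
BYTES_PER_GB = 1000 ** 3
BYTES_PER_TB = 1000 ** 4
MIN_BILLED_BYTES = 32 * BYTES_PER_MB  # scans smaller than this are billed as 32 MB


def scan_fee_usd(scanned_bytes: int) -> float:
    """Return the fee in USD for one scan of `scanned_bytes` bytes."""
    if scanned_bytes < 0:
        raise ValueError("scanned_bytes must be non-negative")
    billed_bytes = max(scanned_bytes, MIN_BILLED_BYTES)
    return billed_bytes / BYTES_PER_TB * PRICE_PER_TB_USD


# Usage: a 3 TB scan, a 1 GB scan, and a tiny scan below the 32 MB minimum.
print(f"3 TB scan: USD {scan_fee_usd(3 * BYTES_PER_TB):.2f}")
print(f"1 GB scan: USD {scan_fee_usd(BYTES_PER_GB):.4f}")
print(f"1 MB scan: USD {scan_fee_usd(BYTES_PER_MB):.6f}")
```

Under these assumptions, any scan smaller than 32 MB costs the same as a 32 MB scan, so many very small queries can cost more than their byte counts suggest.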
Methods to reduce costs
- Format conversion: convert the raw data into a high-performance data format. DLA supports multiple high-performance data formats, such as Apache ORC, Apache Parquet, and Apache Avro. With a columnar format such as ORC or Parquet, DLA scans only the columns that a query requires.
- Data compression: compress the raw data to reduce its size. We recommend that you store the data in the Apache Parquet or Apache ORC format, both of which support compression, and then use DLA to scan the compressed files.
- Data partitioning: store the raw data in separate partitions. Then, DLA needs to scan only the partitions that a query requires.
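To reason about how the three methods combine, the scanned size can be estimated with a simple multiplicative model. This is a hypothetical back-of-the-envelope model, not how DLA computes scanned bytes. It assumes that the effects of compression, partitioning, and column pruning are independent, and the ratios are values you would measure on your own data. The function name `estimated_scanned_tb` is illustrative.

```python
def estimated_scanned_tb(raw_tb: float,
                         compression_ratio: float = 1.0,
                         partition_fraction: float = 1.0,
                         column_fraction: float = 1.0) -> float:
    """Estimate the TB that a query scans after the cost-reduction methods.

    raw_tb:             size of the raw data, in TB
    compression_ratio:  compressed size / raw size (1.0 means no compression)
    partition_fraction: share of the data stored in the partitions the query reads
    column_fraction:    share of the data stored in the columns the query reads
                        (meaningful for columnar formats such as ORC and Parquet)
    """
    if raw_tb < 0:
        raise ValueError("raw_tb must be non-negative")
    for name, value in (("compression_ratio", compression_ratio),
                        ("partition_fraction", partition_fraction),
                        ("column_fraction", column_fraction)):
        if not 0.0 < value <= 1.0:
            raise ValueError(f"{name} must be in (0, 1]")
    # Assumption: the three effects are independent, so their ratios multiply.
    return raw_tb * compression_ratio * partition_fraction * column_fraction


# Usage: compress 1 TB to 40% and read half of the partitions,
# or read only the columns that hold 10% of the data.
print(estimated_scanned_tb(1.0, compression_ratio=0.4, partition_fraction=0.5))
print(estimated_scanned_tb(1.0, column_fraction=0.1))
```

Real savings depend on the data and the queries; for example, a query that filters on a non-partition column still reads every partition.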
For example, you store a CSV file and a JSON file in Object Storage Service (OSS) and store a table in an ApsaraDB RDS database. The files and the table are each 1 TB in size. You want to perform association analysis on the data in OSS and the table data in the ApsaraDB RDS database, so DLA scans a total of 3 TB of data. At USD 4.27 per TB, you pay USD 12.81 (4.27 + 4.27 + 4.27). To reduce this fee, you optimize the data in OSS as follows:
- Compress the 1 TB CSV file with GZIP, which reduces it to 0.4 TB. Then partition the compressed data so that the data you want to query is stored in the same partition. DLA scans only this partition, which reduces the scanned data to 0.2 TB.
- Convert the 1 TB JSON file into the Apache ORC format. Because the query reads only the columns that hold 10% of the data, DLA scans only 0.1 TB.
After you convert, compress, and partition the data, DLA scans 1.3 TB in total (0.2 TB + 0.1 TB + 1 TB), and the total fee is USD 5.551 (4.27 × 0.2 + 4.27 × 0.1 + 4.27), or about USD 5.55. The cost is reduced by about USD 7.26.
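The arithmetic of this example can be reproduced with a short script. The per-source sizes come from the example; the dictionary keys and the `fee_usd` helper are hypothetical, and the script does not model how DLA rounds amounts on a bill.

```python
PRICE_PER_TB_USD = 4.27

# TB scanned per data source in the example, before optimization.
before_tb = {"csv_in_oss": 1.0, "json_in_oss": 1.0, "rds_table": 1.0}

# TB scanned per data source after optimization.
after_tb = {
    "csv_in_oss": 0.4 * 0.5,   # GZIP shrinks 1 TB to 0.4 TB; the queried partition holds 0.2 TB
    "json_in_oss": 1.0 * 0.1,  # ORC: the query reads only the columns holding 10% of the data
    "rds_table": 1.0,          # the ApsaraDB RDS table is still scanned in full
}


def fee_usd(scanned_tb_by_source: dict[str, float]) -> float:
    """Total pay-per-byte fee in USD, without rounding to cents."""
    return sum(scanned_tb_by_source.values()) * PRICE_PER_TB_USD


fee_before = fee_usd(before_tb)
fee_after = fee_usd(after_tb)
print(f"Before: USD {fee_before:.3f}")
print(f"After:  USD {fee_after:.3f}")
print(f"Saved:  USD {fee_before - fee_after:.3f}")
```

Keeping the scanned size per source makes it easy to see that the unoptimized ApsaraDB RDS table accounts for most of the remaining cost.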