For Data Lake Analytics (DLA), you pay only for the data scanned by each query. There are no upfront infrastructure costs or maintenance costs.
You pay $4 for every TB of data scanned, and you are billed on an hourly basis.
You can scan up to 100 GB for free within 30 days of activating the DLA service. After the first 100 GB, or after the 30-day window ends, you pay the regular price.
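The free-tier rule above can be sketched as a small Python helper. This is an illustration only: the function name and the way the allowance is tracked are assumptions based on the figures in this document, not part of any DLA SDK.

```python
# Illustrative sketch of DLA's scan-based billing with the one-time
# 100 GB free tier. PRICE_PER_TB, FREE_TB, and the function itself are
# assumptions drawn from this document, not an official API.

PRICE_PER_TB = 4.0  # USD per TB scanned
FREE_TB = 0.1       # 100 GB free allowance

def cost_with_free_tier(scanned_tb, free_remaining_tb, within_30_days):
    """Return (cost_usd, free_tb_remaining_after) for one query."""
    # The free allowance only applies within the first 30 days.
    free_used = min(scanned_tb, free_remaining_tb) if within_30_days else 0.0
    billable_tb = scanned_tb - free_used
    return PRICE_PER_TB * billable_tb, free_remaining_tb - free_used

# A 150 GB (0.15 TB) scan with the full allowance still available:
cost, remaining = cost_with_free_tier(0.15, FREE_TB, within_30_days=True)
print(round(cost, 2), remaining)  # 0.2 0.0
```

Only the 50 GB beyond the free allowance is billed, at $4/TB.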
For example, if you want to perform association analysis on a CSV file (1 TB) and a JSON file (1 TB) stored in OSS, and on another table (1 TB) stored in RDS, the cost of this operation is as follows:
$12 = $4/TB x 1 TB (CSV) + $4/TB x 1 TB (JSON) + $4/TB x 1 TB (RDS)
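The arithmetic above can be expressed as a short Python sketch. The helper is purely illustrative and is not part of any DLA SDK; the $4/TB rate comes from this document.

```python
# Sketch of DLA's pay-per-TB-scanned pricing for the example above.
# PRICE_PER_TB and query_cost are assumptions for illustration.

PRICE_PER_TB = 4.0  # USD per TB scanned

def query_cost(scanned_tb):
    """Return the cost in USD for a query that scans `scanned_tb` TB."""
    return PRICE_PER_TB * scanned_tb

# Association analysis over three 1 TB sources: CSV, JSON, and RDS.
total = sum(query_cost(tb) for tb in [1.0, 1.0, 1.0])
print(total)  # 12.0
```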
You can save more and optimize performance by compressing raw data, converting data formats, or partitioning data.
Compressing: This allows DLA to scan less data, thereby reducing overall costs.
Converting data format: DLA supports Apache ORC, Apache Parquet, and Avro. With these formats, you can use filters to scan only the required portions of target files, tables, or objects.
Partitioning: You can partition data to limit the amount of data DLA scans, and avoid incurring costs from full scans.
Assume you compress the CSV file to gzip format, which reduces the file size to 0.4 TB. You can then partition this gzip file and scan only 50% of it (0.2 TB). Additionally, you can convert the JSON data to ORC and scan only 10% of the file (0.1 TB).
The cost to perform the preceding operation is as follows:
$5.20 = $4/TB x 0.2 TB (partitioned gzip) + $4/TB x 0.1 TB (ORC) + $4/TB x 1 TB (RDS)
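The optimized scenario can be sketched the same way. The source names, sizes, and scan fractions below come from the example in this document; the code structure itself is illustrative.

```python
# Sketch of the optimized cost: each source is billed only for the
# fraction of its data that DLA actually scans.

PRICE_PER_TB = 4.0  # USD per TB scanned

# (size_tb, fraction_scanned) per source after optimization
sources = {
    "csv_gzip_partitioned": (0.4, 0.5),  # gzip-compressed, 50% of partitions scanned
    "json_as_orc":          (1.0, 0.1),  # converted to ORC, 10% scanned
    "rds_table":            (1.0, 1.0),  # full scan, unchanged
}

total = sum(PRICE_PER_TB * size_tb * fraction
            for size_tb, fraction in sources.values())
print(round(total, 2))  # 5.2
```

Compared with the unoptimized $12 query, compression, format conversion, and partitioning together cut the cost by more than half.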