Last Updated: Apr 26, 2019


For Data Lake Analytics (DLA), you only pay for the data that is scanned by each query. There are no upfront infrastructure costs or maintenance costs.

You pay $4 for every TB of data scanned, and you are billed on an hourly basis.

You can scan up to 100 GB for free within 30 days of the DLA service activation date. After you have scanned the first 100 GB, or after the 30 days have passed, you pay the regular price.

For example: If you want to perform association analysis on a CSV file (1 TB) and a JSON file (1 TB) stored in OSS, and on a table (1 TB) stored in RDS, the cost of this operation is as follows:

$12 = $4/TB x 1 TB (CSV) + $4/TB x 1 TB (JSON) + $4/TB x 1 TB (RDS)
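
The same arithmetic can be sketched in Python. This is only an illustration of the formula above, not a DLA API: the names `PRICE_PER_TB_USD` and `scan_cost` are hypothetical, and the sketch assumes the free quota does not apply.

```python
# Illustrative only: these names are not part of any DLA SDK.
PRICE_PER_TB_USD = 4.0  # per-TB scan price stated on this page


def scan_cost(scanned_tb_by_source):
    """Return the cost in USD for a mapping of source name -> TB scanned."""
    return PRICE_PER_TB_USD * sum(scanned_tb_by_source.values())


# The three 1 TB sources from the example above, each scanned in full.
sources = {"CSV (OSS)": 1.0, "JSON (OSS)": 1.0, "table (RDS)": 1.0}
total_cost = scan_cost(sources)
print(f"${total_cost:.2f}")  # prints $12.00
```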

How to save more

You can save more and optimize performance by compressing raw data, converting data formats, or partitioning data.

Compressing: Compressed files are smaller, so DLA scans less data, which reduces overall costs.

Converting data formats: DLA supports Apache ORC, Apache Parquet, and Avro. Based on your business needs, you can use filters to scan only part of the target files, tables, or objects.

Partitioning: You can partition data to limit the amount of data DLA scans, and avoid incurring costs from full scans.


Assume that you compress the CSV file to gzip format, which reduces the file size to 0.4 TB. You can then partition this gzip file and scan only 50% of it, or 0.2 TB. Additionally, you can convert the JSON data to ORC and then scan only 10% of the entire file, or 0.1 TB.

The cost to perform the preceding operation is as follows:

$5.2 = $4/TB x 0.2 TB (partitioned gzip) + $4/TB x 0.1 TB (ORC) + $4/TB x 1 TB (RDS)
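
As a rough sketch, the optimized figure can be reproduced in Python. The helper `scanned_tb` and the variable names are hypothetical, and the sketch assumes that the 10% ORC scan is taken against the original 1 TB, as in the example above.

```python
# Illustrative only: these names are not part of any DLA SDK.
PRICE_PER_TB_USD = 4.0  # per-TB scan price stated on this page


def scanned_tb(size_tb, fraction_scanned=1.0):
    """Return the TB actually scanned when only part of the data is read."""
    return size_tb * fraction_scanned


csv_gzip = scanned_tb(0.4, 0.5)  # gzip file of 0.4 TB, half scanned via partitions
json_orc = scanned_tb(1.0, 0.1)  # assumption: 10% of the original 1 TB is scanned
rds_table = scanned_tb(1.0)      # RDS table scanned in full

optimized_cost = PRICE_PER_TB_USD * (csv_gzip + json_orc + rds_table)
print(f"${optimized_cost:.2f}")  # prints $5.20
```

Compared with the unoptimized $12, the difference comes entirely from the two OSS sources; the RDS table is still scanned in full.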