This topic describes the differences between the pay-per-byte and pay-per-CU billing methods used by Data Lake Analytics (DLA). Only the serverless Presto engine of DLA supports the pay-per-byte billing method.

After you activate DLA, you can choose the pay-per-byte or pay-per-CU billing method based on your query workload. The serverless Presto engine of DLA supports both billing methods. The serverless Spark engine of DLA supports only the pay-per-CU billing method.

  • Pay-per-byte

    By default, DLA uses the pay-per-byte billing method. This billing method is suitable for scenarios in which data is not frequently queried and the amount of data queried is small. If no data queries occur, you are not charged. For more information, see Pay-per-byte.

  • Pay-per-CU

    This billing method is suitable for scenarios in which data is frequently queried and the amount of data queried is large. Because fees depend on the number of CUs that you purchase rather than on the number of bytes scanned, this billing method also makes your costs easier to budget.

    If you use the pay-per-CU billing method, you can specify the MIN and MAX parameters. The MIN parameter specifies the quota that you require over the long term. This quota can be charged on a subscription or pay-as-you-go basis. CUs that are used beyond this quota are charged on a pay-as-you-go basis. The MAX parameter specifies the maximum number of CUs that you can purchase. To keep costs down, you can set MIN to a small value and MAX to a value within an appropriate range. This way, you pay only for the quota specified by MIN and for the CUs that you scale out during peak hours. For more information, see Pay-per-CU (for DLA CU Edition only).
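
The relationship between MIN, MAX, and the CUs that are billed can be sketched in a few lines of Python. This is only an illustration of the description above, not DLA's actual billing logic: the function name `split_cus`, the `CuSplit` type, and the sample values (MIN = 16, MAX = 64) are hypothetical, and no real prices are used.

```python
from typing import NamedTuple


class CuSplit(NamedTuple):
    reserved: int  # CUs covered by the MIN quota (subscription or pay-as-you-go)
    elastic: int   # CUs above MIN, charged on a pay-as-you-go basis


def split_cus(requested_cus: int, min_cus: int, max_cus: int) -> CuSplit:
    """Split a CU demand into the MIN quota and the scaled-out excess.

    Assumption: the MIN quota is always paid for, and demand above MAX
    is not served (and therefore not billed).
    """
    if min_cus < 0 or max_cus < min_cus:
        raise ValueError("expected 0 <= MIN <= MAX")
    # Clamp the demand into the [MIN, MAX] range.
    in_use = max(min_cus, min(requested_cus, max_cus))
    return CuSplit(reserved=min_cus, elastic=in_use - min_cus)


# Hypothetical peak-hour demand of 40 CUs with MIN = 16 and MAX = 64.
print(split_cus(40, 16, 64))
```

With these sample values, 16 CUs fall under the MIN quota and the remaining 24 CUs are the scaled-out part. A demand below MIN still pays for the full MIN quota, which is why a small MIN value keeps the baseline cost low.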

The following table describes the differences between the pay-per-byte and pay-per-CU billing methods used by the serverless Presto engine of DLA.

| Item | Pay-per-byte | Pay-per-CU |
| --- | --- | --- |
| Billing basis | You are charged based on the number of bytes that are scanned. Fewer queries incur lower fees. Pay-per-byte is suitable for scenarios in which data is not frequently queried. | You are charged based on the number of CUs that you purchase. Fees are not related to the number of bytes that are scanned. Pay-per-CU is suitable for scenarios in which data is frequently queried. |
| Supported data sources | Data sources provided by Alibaba Cloud: Object Storage Service (OSS), Tablestore (OTS), AnalyticDB for MySQL, ApsaraDB RDS for SQL Server, PolarDB for PostgreSQL, ApsaraDB for MongoDB, and ApsaraDB for Redis. Open source data sources: MySQL, SQL Server, PostgreSQL, MongoDB, and Redis. | All data sources that are supported with pay-per-byte, plus the following. Data sources provided by Alibaba Cloud: Elasticsearch, MaxCompute, and ApsaraDB for Cassandra. Open source data sources: self-managed HDFS, Oracle, Kudu, Druid, Elasticsearch, and Cassandra. |
| Number of SQL statements that can be concurrently executed | 10 | 100 |
| Maximum SQL runtime | 30 minutes | 12 hours |
| Support for access to a self-managed Hive metastore | No | Yes |
| Support for a built-in cache | No | Yes |
| Support for partition mapping | No | Yes |
| Support for UDFs | No | Not yet. Support is planned. |
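
The limits in the table can also be read as a checklist. The following minimal sketch encodes the concurrency limit, the runtime limit, and the feature flags from the table and reports which billing methods fit a given workload. The names `Limits`, `LIMITS`, and `eligible_methods` are hypothetical helpers for illustration and are not part of any DLA API. The sketch ignores data source support and cost.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Limits:
    max_concurrent_sql: int
    max_runtime_minutes: int
    hive_metastore: bool      # access to a self-managed Hive metastore
    builtin_cache: bool
    partition_mapping: bool


# Values copied from the comparison table for the serverless Presto engine.
LIMITS = {
    "pay-per-byte": Limits(10, 30, False, False, False),
    "pay-per-CU": Limits(100, 12 * 60, True, True, True),
}


def eligible_methods(
    concurrent_sql: int,
    runtime_minutes: int,
    needs_hive_metastore: bool = False,
    needs_cache: bool = False,
    needs_partition_mapping: bool = False,
) -> List[str]:
    """Return the billing methods whose limits cover the workload."""
    result = []
    for name, lim in LIMITS.items():
        if concurrent_sql > lim.max_concurrent_sql:
            continue
        if runtime_minutes > lim.max_runtime_minutes:
            continue
        if needs_hive_metastore and not lim.hive_metastore:
            continue
        if needs_cache and not lim.builtin_cache:
            continue
        if needs_partition_mapping and not lim.partition_mapping:
            continue
        result.append(name)
    return result


# A workload with 20 concurrent statements exceeds the pay-per-byte limit of 10.
print(eligible_methods(concurrent_sql=20, runtime_minutes=10))
```

When both methods are eligible, the choice comes down to how often data is queried and how much data is scanned, as described in the billing basis row.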