Serverless Spark is a data analytics and computing service that is built on a cloud-native, serverless architecture and designed for Data Lake Analytics (DLA) scenarios. After you activate DLA, you can submit a Spark job with only a few simple configurations. While a Spark job is running, computing resources are dynamically allocated based on the task load. After the Spark job is completed, you are charged only for the resources that the job consumed. Serverless Spark removes the need for resource planning and cluster configuration, and is more cost-effective than the traditional mode.

Terms

  • virtual cluster

    Serverless Spark uses the multi-tenant mode. The Spark process runs in an isolated and secure environment. A virtual cluster is a secure unit for resource isolation.

    A virtual cluster does not have fixed computing resources. You only need to allocate a resource quota based on your business requirements and configure the network environment in which the data that you want to access resides. You do not need to configure or maintain computing nodes. You can also configure default parameters for the Spark jobs of a virtual cluster, which makes it easier to manage these jobs in a unified manner.

  • compute unit
    Compute unit (CU) is the unit of measurement for Serverless Spark resources. One CU equals one vCPU and 4 GB of memory. Serverless Spark uses the pay-as-you-go billing method: after a Spark job is completed, you are charged based on the number of CUs that the job consumed and the duration for which the job ran. Billing is measured in computing hours. If you use one CU for one hour, one computing hour is consumed. The computing hours consumed by a job are the sum of the computing hours consumed by each of its compute units. The fee for the computing resources of a Spark job is calculated by using the following formula:
    Fee required for computing resources of a Spark job = Total computing hours of the Spark job × Unit price per computing hour (USD 0.05)
    Use the following job configuration as an example:
    • Driver: medium (2 CUs), runtime of 50s
    • Executor: medium (2 CUs) × 5, runtime of 40s
    The total fee is calculated as follows: (2 CUs (driver) × 50s + 5 × 2 CUs (executor) × 40s)/3600s × USD 0.05 ≈ USD 0.007.
  • resource specification

    To simplify configuration, the Serverless Spark feature of DLA packages CPU and memory into predefined resource specifications. You only need to select a resource specification, such as small, medium, or large, in the DLA console.

    Specification   CPU and memory             CUs
    small           1 core, 4 GB of memory     1 CU
    medium          2 cores, 8 GB of memory    2 CUs
    large           4 cores, 16 GB of memory   4 CUs
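
The fee formula and the driver/executor example above can be checked with a short script. The following is a minimal sketch in Python, not part of DLA or any DLA SDK: the names Component, computing_hours, and job_fee_usd are hypothetical and exist only for this illustration. The CU counts come from the resource specifications listed above, and the unit price of USD 0.05 per computing hour is the value quoted in this topic, so treat it as an assumption that may not match current pricing.

```python
from dataclasses import dataclass

# CUs per resource specification, as listed in this topic.
CU_PER_SPEC = {"small": 1, "medium": 2, "large": 4}

# Unit price quoted in this topic (assumption: actual pricing may differ).
PRICE_PER_COMPUTING_HOUR_USD = 0.05


@dataclass
class Component:
    """One part of a Spark job (hypothetical helper type for this sketch)."""
    spec: str               # "small", "medium", or "large"
    count: int              # number of instances with this specification
    runtime_seconds: float  # how long each instance ran


def computing_hours(components):
    """Sum CU-seconds over all components and convert to computing hours."""
    cu_seconds = sum(
        CU_PER_SPEC[c.spec] * c.count * c.runtime_seconds for c in components
    )
    return cu_seconds / 3600


def job_fee_usd(components, unit_price=PRICE_PER_COMPUTING_HOUR_USD):
    """Fee = total computing hours x unit price per computing hour."""
    return computing_hours(components) * unit_price


# The example from this topic: one medium driver (50s), five medium executors (40s).
job = [Component("medium", 1, 50), Component("medium", 5, 40)]
fee = job_fee_usd(job)
print(f"{computing_hours(job):.4f} computing hours, USD {fee:.4f}")
# prints "0.1389 computing hours, USD 0.0069"
```

The result rounds to USD 0.007, which matches the worked example. To estimate a different job, change the specification, instance count, and runtime of each component.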

Limits

The Serverless Spark feature of DLA has the following limits:

  • The Serverless Spark feature supports only three resource specifications: small, medium, and large.
  • A maximum of 10 virtual clusters can be created under an Alibaba Cloud account.

Use Serverless Spark

  1. Manage virtual clusters.
  2. t1916440.html#concept_2420165.