
Overview

Last Updated: May 21, 2020

Serverless Spark is a data analytics and computing service of Data Lake Analytics that is built on a cloud-native architecture. After you activate Data Lake Analytics, you can submit Spark jobs on demand. During job execution, Data Lake Analytics dynamically allocates computing resources based on the workload, and you are charged only for the resources that each Spark job consumes. Serverless Spark therefore eliminates the need for resource planning and configuration.

Architecture


Terms

  • Virtual cluster

Serverless Spark uses a multi-tenant model. Each Spark process runs in an isolated, secure environment, and a virtual cluster serves as the unit of resource and security isolation.

Unlike physical clusters, virtual clusters do not have fixed computing resources. You only need to configure resource quotas and the network environment in which the data you want to access resides; you do not need to configure or maintain compute nodes. You can also set cluster-level parameters for Spark jobs, which facilitates unified job management.

  • Compute unit

A compute unit (CU) is the measurement and billing unit of Serverless Spark. One CU equals 1 vCPU and 4 GB of memory. Bills are generated based on the number of CUs that a Spark job uses and the duration for which the job runs.

The Serverless Spark feature is in the beta phase, and you are welcome to try it for free.

  • Resource specification

Serverless Spark runs on Alibaba Cloud Elastic Container Instance. To simplify configuration, Data Lake Analytics provides three resource specifications, small, medium, and large, instead of exposing the detailed Elastic Container Instance specifications. If you do not specify a resource specification, Data Lake Analytics preferentially uses high-performance resources.

    Specification | Computing resources  | CUs
    ------------- | -------------------- | ---
    small         | 1 vCPU, 4 GB memory  | 1
    medium        | 2 vCPU, 8 GB memory  | 2
    large         | 4 vCPU, 16 GB memory | 4
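The CU and resource specification definitions above can be sketched in code. This is a minimal illustration, not the actual billing logic of Data Lake Analytics: the function name `estimate_cu_hours` and the assumption that consumption is simply CUs multiplied by running time are ours.

```python
# Resource specifications from the table above; one CU = 1 vCPU and 4 GB of memory.
SPEC_CUS = {"small": 1, "medium": 2, "large": 4}

def estimate_cu_hours(spec: str, instances: int, hours: float) -> float:
    """Illustrative estimate of CU-hours consumed by `instances` containers
    of the given specification running for `hours` hours."""
    return SPEC_CUS[spec] * instances * hours

# Example: one small driver plus four medium executors running for half an hour.
total = estimate_cu_hours("small", 1, 0.5) + estimate_cu_hours("medium", 4, 0.5)
print(total)  # 4.5 CU-hours
```

Because billing scales with both the specification and the runtime, a job that finishes twice as fast on larger containers can cost the same as a slower job on smaller ones.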

Limits

The Serverless Spark feature of Data Lake Analytics has the following limits:

  • The Serverless Spark feature is available only in the China (Zhangjiakou-Beijing Winter Olympics) region.

  • Serverless Spark can access only data stored in Object Storage Service (OSS).

  • The Serverless Spark feature supports only three resource specifications: small, medium, and large.

  • A maximum of 10 virtual clusters can be created under an Alibaba Cloud account.

How to use Serverless Spark

  1. Create a virtual cluster

  2. Create and execute a Spark job
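To make the second step concrete, the sketch below builds a job description as a Python dictionary. The field names (`name`, `file`, `className`, `conf`) and the OSS path are illustrative assumptions modeled on common Spark job conventions; consult the Data Lake Analytics console for the exact schema it accepts.

```python
import json

# Hypothetical Serverless Spark job description. The resource specifications
# referenced in "conf" are the small/medium/large tiers described above, and
# the job file must reside in OSS, per the limits listed in this document.
job = {
    "name": "SparkPi",
    "file": "oss://my-bucket/jars/spark-examples.jar",  # OSS is the only supported data source
    "className": "org.apache.spark.examples.SparkPi",
    "conf": {
        "spark.driver.resourceSpec": "small",    # one of: small, medium, large
        "spark.executor.resourceSpec": "medium",
        "spark.executor.instances": "2",
    },
}

print(json.dumps(job, indent=2))
```

Once a virtual cluster exists, a description like this is what you submit to run the job; the chosen resource specifications determine how many CUs the job consumes.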