This topic describes how to deploy Apache Spark in a cluster by creating a stack in the Resource Orchestration Service (ROS) console.
Background information
Apache Spark is a general-purpose computing engine designed for large-scale data processing. Apache Spark is written in Scala and uses Resilient Distributed Datasets (RDDs) to keep data in memory between operations. This makes Apache Spark well suited to interactive queries and to iterative algorithms that repeatedly process the same data.
You can use the Spark Cluster Edition (Existing VPC) sample template to create multiple Elastic Compute Service (ECS) instances in existing resources: a virtual private cloud (VPC), a vSwitch, and a security group. One of the ECS instances is associated with an elastic IP address (EIP) and serves as the management node. The other ECS instances are managed by Auto Scaling. The sample template uses the following software versions:
JDK 1.8.0: the Java Development Kit (JDK)
Hadoop 2.7.7: the framework for distributed systems
Scala 2.12.1: the programming language
Apache Spark 2.1.0: the computing engine
After the stack is created from the sample template, you can obtain the value of SparkWebSiteURL and log on to the Apache Spark management console. To access the URL specified by SparkWebSiteURL over the Internet, you must add an inbound rule to the security group that allows traffic on port 8080. For more information, see Add a security group rule.
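Before you open SparkWebSiteURL from the Internet, you may want to confirm that an inbound rule actually covers port 8080 for your client address. The following Python sketch assumes that the security group rules have been exported as dictionaries with the keys Direction, IpProtocol, PortRange (in the start/end form, for example 8080/8080), SourceCidrIp, and Policy. The export format, the helper name `allows_port`, and the sample addresses are assumptions for illustration, and the sketch does not evaluate rule priorities or Drop rules.

```python
import ipaddress


def allows_port(rules, client_ip, port=8080):
    """Return True if an inbound Accept rule lets client_ip reach TCP `port`.

    Hypothetical helper: each rule is assumed to be a dict with the keys
    Direction, IpProtocol, PortRange ("start/end"), SourceCidrIp, and Policy.
    Simplification: rule priorities and Drop rules are not evaluated.
    """
    addr = ipaddress.ip_address(client_ip)
    for rule in rules:
        if rule.get("Direction", "ingress").lower() != "ingress":
            continue  # only inbound rules matter for reaching the web UI
        if rule.get("Policy", "Accept").lower() != "accept":
            continue
        if rule.get("IpProtocol", "").lower() not in ("tcp", "all"):
            continue
        cidr_text = rule.get("SourceCidrIp")
        if not cidr_text:
            continue  # for example, rules that reference another security group
        start, end = (int(p) for p in rule["PortRange"].split("/"))
        # "-1/-1" stands for all ports (used together with the ALL protocol).
        port_ok = (start, end) == (-1, -1) or start <= port <= end
        if port_ok and addr in ipaddress.ip_network(cidr_text, strict=False):
            return True
    return False


# Hypothetical rule list; 203.0.113.10 is a documentation address.
rules = [
    {"Direction": "ingress", "IpProtocol": "TCP", "PortRange": "22/22",
     "SourceCidrIp": "0.0.0.0/0", "Policy": "Accept"},
    {"Direction": "ingress", "IpProtocol": "TCP", "PortRange": "8080/8080",
     "SourceCidrIp": "0.0.0.0/0", "Policy": "Accept"},
]
print(allows_port(rules, "203.0.113.10"))  # True
```

Opening 8080/8080 to 0.0.0.0/0 exposes the Spark web UI to everyone; a narrower SourceCidrIp, such as your own public address range, is the more conservative choice.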
Step 1: Create a stack
Log on to the ROS console.
In the left-side navigation pane, choose Templates > Public Templates.
Search for the Spark Cluster Edition (Existing VPC) sample template.
Click Create Stack.
In the Configure Parameters step, configure the Stack Name parameter and the following parameters.
| Parameter | Description | Example |
| --- | --- | --- |
| Existing VPC Instance ID | The ID of the VPC. For more information about how to create and query a VPC, see Create and manage a VPC. | vpc-bp1m6fww66xbntjyc**** |
| VSwitch Zone ID | The zone ID of the vSwitch that resides in the VPC. | Hangzhou Zone K |
| VSwitch ID | The ID of the vSwitch that resides in the VPC. For more information about how to create and query a vSwitch, see Create and manage a vSwitch. | vsw-bp183p93qs667muql**** |
| Business Security Group ID | The ID of the ECS security group. For more information about how to query the ID of a security group, see Search for security groups. | sg-bp15ed6xe1yxeycg7o**** |
| Instance Type | The instance type of the ECS instances. Select a valid instance type. For more information, see Overview of instance families. | ecs.c5.large |
| Instance Password | The password of the ECS instances. | Test_12**** |
| Disk Type | The disk category. Valid values: cloud_efficiency (ultra disk) and cloud_ssd (standard SSD). For more information, see Disks. | cloud_efficiency |
| System Disk Space | The system disk size of each ECS instance. Valid values: 20 to 500. Unit: GB. | 40 |
| Instance Amount | The number of ECS instances in the cluster in which you want to deploy Apache Spark. Valid values: 3 to 10. | 3 |
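If you prepare parameter values in a script, the constraints in the table can be checked before you submit them. The following Python sketch encodes only the valid values and ranges listed above, plus the ID prefixes (vpc-, vsw-, sg-) that appear in the example values. The dictionary keys are the parameter labels shown in the console; the parameter names used inside the template may differ, and the helper name `validate_parameters` is hypothetical.

```python
VALID_DISK_TYPES = {"cloud_efficiency", "cloud_ssd"}


def validate_parameters(params):
    """Return a list of error messages for the Spark stack parameters.

    Hypothetical helper: it checks only the constraints listed in the
    parameter table, not every constraint that ROS itself enforces.
    """
    errors = []
    if params.get("Disk Type") not in VALID_DISK_TYPES:
        errors.append("Disk Type must be cloud_efficiency or cloud_ssd")
    disk_size = params.get("System Disk Space")
    if not isinstance(disk_size, int) or not 20 <= disk_size <= 500:
        errors.append("System Disk Space must be an integer from 20 to 500 (GB)")
    amount = params.get("Instance Amount")
    if not isinstance(amount, int) or not 3 <= amount <= 10:
        errors.append("Instance Amount must be an integer from 3 to 10")
    # ID prefixes taken from the example values in the table.
    for key, prefix in (("Existing VPC Instance ID", "vpc-"),
                        ("VSwitch ID", "vsw-"),
                        ("Business Security Group ID", "sg-")):
        if not str(params.get(key, "")).startswith(prefix):
            errors.append(f"{key} should start with {prefix!r}")
    return errors


example = {
    "Existing VPC Instance ID": "vpc-bp1m6fww66xbntjyc****",
    "VSwitch ID": "vsw-bp183p93qs667muql****",
    "Business Security Group ID": "sg-bp15ed6xe1yxeycg7o****",
    "Disk Type": "cloud_efficiency",
    "System Disk Space": 40,
    "Instance Amount": 3,
}
print(validate_parameters(example))  # []
```

Returning a list of messages rather than raising on the first problem lets you report every invalid value at once.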
Click Next:Check and Confirm. Then, click Create.
On the Stack Information tab, check the stack status and wait until the stack is created. Then, click the Outputs tab to obtain the value of SparkWebSiteURL.
Access the URL specified by SparkWebSiteURL and log on to the Apache Spark management console.
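If you retrieve the stack details programmatically instead of from the Outputs tab, the same value can be read from the response. The following Python sketch assumes a response shaped like a ROS GetStack result: a Status field and an Outputs list whose items carry OutputKey and OutputValue. The helper name `spark_web_url` and the sample URL (a documentation address on port 8080) are hypothetical.

```python
import json
from urllib.parse import urlsplit


def spark_web_url(get_stack_response):
    """Return (url, host, port) for the SparkWebSiteURL stack output.

    Assumes the response has a "Status" field and an "Outputs" list whose
    items carry "OutputKey" and "OutputValue", as in a ROS GetStack response.
    """
    status = get_stack_response.get("Status")
    if status != "CREATE_COMPLETE":
        raise RuntimeError(f"stack is not ready yet: {status}")
    for output in get_stack_response.get("Outputs", []):
        if output.get("OutputKey") == "SparkWebSiteURL":
            url = output["OutputValue"]
            parts = urlsplit(url)
            # Fall back to the scheme's default port if the URL has none.
            port = parts.port or (443 if parts.scheme == "https" else 80)
            return url, parts.hostname, port
    raise KeyError("SparkWebSiteURL not found in stack outputs")


# Hypothetical response fragment.
response = json.loads("""
{"Status": "CREATE_COMPLETE",
 "Outputs": [{"OutputKey": "SparkWebSiteURL",
              "OutputValue": "http://203.0.113.10:8080"}]}
""")
print(spark_web_url(response))  # ('http://203.0.113.10:8080', '203.0.113.10', 8080)
```

Returning the port separately makes it easy to compare against the inbound rules of the security group.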
Step 2: View resources
In the left-side navigation pane, choose Deployment > Stacks.
On the Stacks page, click the ID of the desired stack.
Click the Resources tab to view information about the resources in the stack.
The following table describes the resources in this example.
| Resource | Quantity | Description | Specifications |
| --- | --- | --- | --- |
| ALIYUN::ECS::Instance | 1 | Creates an ECS instance to deploy the Apache Spark primary service. | An ECS instance with the following specifications is created: instance type ecs.c5.large; disk category ultra disk; system disk size 40 GB; a public IP address is allocated. |
| ALIYUN::ESS::ScalingGroup | 2 | Creates two scaling groups to deploy the Apache Spark secondary services. Scaling groups automatically scale elastic computing resources based on the scaling rules that you configure to meet your business requirements. | Two ECS instances are created. Each ECS instance has the following specifications: instance type ecs.c5.large; disk category ultra disk; system disk size 40 GB; a public IP address is allocated. |
| ALIYUN::RAM::Role | 1 | Creates a Resource Access Management (RAM) role that can be assumed to obtain a short-lived Security Token Service (STS) token. This way, access permissions are granted in a secure manner. | None. |
| ALIYUN::VPC::EIP | 1 | Creates an EIP and associates it with an ECS instance so that the ECS instance can be accessed over the Internet. | None. |
| ALIYUN::OOS::Template | 2 | Creates two CloudOps Orchestration Service (OOS) templates that are used to create lifecycle hooks. For more information, see Lifecycle hooks. | None. |
Note: For more information about the pricing of the resources, go to the relevant console or see the pricing documentation of each resource.
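The resource quantities in this example follow from an Instance Amount value of 3: one ECS instance is the management node that is associated with the EIP, and the remaining two instances run in the scaling groups, each with a 40 GB system disk. The following Python sketch makes this arithmetic explicit. The helper name `cluster_layout` is hypothetical, and the sketch assumes that the same split (one management node, all other instances managed by Auto Scaling) applies to other Instance Amount values.

```python
def cluster_layout(instance_amount, system_disk_gb=40):
    """Split Instance Amount into the management node and Auto Scaling nodes.

    Hypothetical helper: assumes exactly one management node (the instance
    associated with the EIP) and that all other instances are Auto Scaling nodes.
    """
    if not 3 <= instance_amount <= 10:
        raise ValueError("Instance Amount must be from 3 to 10")
    if not 20 <= system_disk_gb <= 500:
        raise ValueError("System Disk Space must be from 20 to 500 GB")
    management_nodes = 1
    auto_scaling_nodes = instance_amount - management_nodes
    return {
        "management_nodes": management_nodes,
        "auto_scaling_nodes": auto_scaling_nodes,
        "total_system_disk_gb": instance_amount * system_disk_gb,
    }


print(cluster_layout(3))
# {'management_nodes': 1, 'auto_scaling_nodes': 2, 'total_system_disk_gb': 120}
```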