This topic describes how to deploy Apache Spark on a single Elastic Compute Service (ECS) instance by creating a stack in the Resource Orchestration Service (ROS) console.
Background information
Apache Spark is a general-purpose computing engine designed for large-scale data processing. Spark is written in Scala and uses resilient distributed datasets (RDDs) to perform in-memory computing. It supports interactive queries and runs iterative algorithms efficiently by keeping intermediate data in memory.
The Deploy Spark on a Single Instance in an Existing VPC sample template creates an ECS instance based on existing resources, such as a virtual private cloud (VPC), a vSwitch, and a security group, and associates an elastic IP address (EIP) with the instance. The following software versions are used in the sample template:
- JDK: 1.8.0
- Hadoop: 2.7.7
- Scala: 2.12.1
- Kafka: 2.1.0
After the stack is created from the sample template, you can obtain the URL of the Spark web interface and log on to the Spark management console. To access the Spark web interface over the Internet, add inbound rules to the security group that allow traffic on ports 8088 and 8080. For more information, see Add security group rules.
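As an illustrative sketch, the inbound rules for ports 8088 and 8080 can also be declared in an ROS template by using `ALIYUN::ECS::SecurityGroupIngress` resources instead of adding them manually in the console. The resource names and the `SecurityGroupId` parameter reference below are assumptions for illustration; adjust them to match the parameters defined in your own template.

```yaml
# Hypothetical ROS template fragment: opens ports 8088 and 8080 to the Internet.
# "SecurityGroupId" is assumed to be a parameter of your template that holds the
# ID of the existing security group; rename it to match your template.
Resources:
  SparkWebUiIngress8088:
    Type: ALIYUN::ECS::SecurityGroupIngress
    Properties:
      SecurityGroupId:
        Ref: SecurityGroupId
      IpProtocol: tcp
      PortRange: 8088/8088
      NicType: intranet        # VPC-type security groups use intranet rules
      SourceCidrIp: 0.0.0.0/0  # allows access from any IP address; narrow this for production
  SparkWebUiIngress8080:
    Type: ALIYUN::ECS::SecurityGroupIngress
    Properties:
      SecurityGroupId:
        Ref: SecurityGroupId
      IpProtocol: tcp
      PortRange: 8080/8080
      NicType: intranet
      SourceCidrIp: 0.0.0.0/0
```

Allowing `0.0.0.0/0` exposes the Spark web interface to the entire Internet; in practice, restrict `SourceCidrIp` to the CIDR blocks that actually need access.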