The Data Science service is developed on top of E-MapReduce (EMR). Data Science integrates some artificial intelligence (AI) components provided by Alibaba Cloud Machine Learning Platform for Artificial Intelligence (PAI). The AI components include Alink (machine learning algorithm platform), Faiss (vector calculation engine), and TensorFlow or PyTorch (deep learning framework).
- Supports distributed model training based on TensorFlow and PAI-TF and provides a built-in EasyRec algorithm package.
- Supports Alink-based stream processing, batch processing, and machine learning algorithms.
- Supports the JupyterHub and Zeppelin services.
- Supports Spark.
- Provides an AutoML package to support machine learning.
- Provides various tools, such as EASCMD, Redis, Hive CLI, and Faiss-Server.
- Users of open source big data systems
- Users of intelligent recommendation and risk management solutions powered by the AI technology of Alibaba Cloud
Create a cluster
- In the Software Settings step, select a zone, an EMR version, and optional services.
- Zone: Only some zones are supported. You can view the supported zones of each region on the buy page.
- EMR Version: The latest EMR version is displayed by default.
EMR V3.29.1 is used as an example in this topic.
- Optional Services: Select optional services, such as TensorFlow, based on your business requirements.
In the Hardware Settings step, specify a VPC, a vSwitch, and a security group. If you need a new security group, create it in the ECS console.
You must enable port 8443 to allow access to web UIs of related components. For more information, see Configure security group rules.
In the Basic Settings step, add a Knox account. It is used to log on to the Knox service.
The added Knox account is a RAM user within your Alibaba Cloud account. For more information about how to add a Knox account, see Manage user accounts.
View cluster details
Enable port 8443
Enable port 8443 of the created cluster to access the web UIs of services, such as YARN and HDFS. For more information about how to enable the port, see Access the web UIs of open source components.
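If you prefer the command line to the console, the security group rule can also be added with the Alibaba Cloud CLI. The following is a minimal sketch, not the procedure from the linked topic: it assumes that the aliyun CLI is installed and configured, and the function name print_open_8443_cmd and all three arguments are hypothetical placeholders. The function only prints the command so that you can review it before you run it.

```shell
# Print (do not run) an aliyun CLI call that opens TCP port 8443
# to a single CIDR block. All three arguments are placeholders
# that the caller must supply.
print_open_8443_cmd() {
    region_id="$1"          # region of the cluster, for example cn-hangzhou
    security_group_id="$2"  # ID of the security group attached to the cluster
    source_cidr="$3"        # client address range allowed to reach port 8443
    printf 'aliyun ecs AuthorizeSecurityGroup --RegionId %s --SecurityGroupId %s --IpProtocol tcp --PortRange 8443/8443 --SourceCidrIp %s\n' \
        "$region_id" "$security_group_id" "$source_cidr"
}
```

Restricting the source CIDR block to the address range of your own clients, rather than 0.0.0.0/0, keeps the web UIs from being exposed to the whole Internet.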
View log data
You can access the web UI of a service to view the service log data. Example:
- Access the web UI of YARN.
For more information, see Access the web UIs of open source components.
- Click History in the row where your application is located.
- Click Logs.
You can view log data.
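As an alternative to the web UI, the logs of an application can usually be read from a cluster node with the yarn logs command. The following sketch assumes that YARN log aggregation is enabled on the cluster and that the yarn client is on the PATH. The function names is_yarn_app_id and show_app_logs are hypothetical helpers.

```shell
# Return success if the argument looks like a YARN application ID,
# for example application_1600000000000_0001.
is_yarn_app_id() {
    printf '%s\n' "$1" | grep -Eq '^application_[0-9]+_[0-9]+$'
}

# Print the aggregated logs of one application.
# Malformed IDs are rejected before the yarn client is called.
show_app_logs() {
    if ! is_yarn_app_id "$1"; then
        echo "not a YARN application ID: $1" >&2
        return 1
    fi
    yarn logs -applicationId "$1"
}
```

For example, show_app_logs followed by the application ID shown on the YARN web UI prints the logs of that application to the terminal.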
Resize a disk
You can resize the data disks of a cluster if the disk space is insufficient. For more information, see Expand disks.
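Before you resize a disk, you may want to confirm which mount points are actually running out of space. The following sketch filters the output of df -P by usage percentage. The function name full_mounts and the threshold are assumptions chosen for illustration, and mount points that contain spaces are not handled.

```shell
# Read `df -P` output on stdin and print every mount point whose
# usage is at or above the given percentage threshold.
full_mounts() {
    threshold="$1"
    awk -v limit="$threshold" 'NR > 1 {
        use = $5              # Capacity column, for example "90%"
        sub(/%/, "", use)     # drop the percent sign
        if (use + 0 >= limit + 0) print $6
    }'
}
```

For example, df -P | full_mounts 80 lists the mount points that are at least 80% full.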
Log on to a worker node
- Log on to your cluster in SSH mode.
For more information, see Connect to the master node of an EMR cluster in SSH mode.
- Switch to the hadoop user.
- Obtain the IP address of the worker node.
cat /etc/hosts | grep worker
Information similar to the following output is returned:
192.168.**.** emr-worker-2.cluster-20**** emr-worker-2 emr-header-3.cluster-20**** emr-header-3 iZbp19nv7e19wx1ub0t****
192.168.**.** emr-worker-1.cluster-20**** emr-worker-1 emr-header-2 emr-header-2.cluster-20**** iZbp19nv7e19wx1ub0t****
Note: 192.168.**.** indicates the IP address of the worker node.
- Log on to the worker node without a password.
ssh <yourWorkIp>
Note: yourWorkIp indicates the IP address that you obtained.
- Use the sudo command on the worker node to run commands as a root user.
sudo pip3.7 install xxx
Note: xxx indicates the package that you want to install.
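When the same package is needed on every worker node, the steps above can be combined into a small loop. The following is a sketch under stated assumptions: it is run as the hadoop user on the master node, passwordless SSH to the worker nodes works as described above, and the function names worker_ips and run_on_workers are hypothetical helpers.

```shell
# Print the IP address (first field) of every hosts-file line that
# names a worker node. The path defaults to /etc/hosts; pass another
# path to try the function against a copy of the file.
worker_ips() {
    hosts_file="${1:-/etc/hosts}"
    awk '$1 !~ /^#/ && /worker/ { print $1 }' "$hosts_file" | sort -u
}

# Run one command as root on each worker node over passwordless SSH.
# The arguments are joined with spaces and passed to sudo on the worker.
run_on_workers() {
    for ip in $(worker_ips); do
        ssh "$ip" "sudo $*"
    done
}
```

For example, run_on_workers pip3.7 install followed by a package name installs that package on each worker node in turn.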
The EasyRec algorithm library includes mainstream algorithms, such as DeepFM, DIN, and MultiTower. An EMR Data Science cluster provides a built-in EasyRec algorithm library for you to use. For more information about EasyRec, see EasyRec.
Assign a public IP address to a worker node
- On the Instances page of the EMR console, click the ID of the worker node in the ECS ID/Instance Name column.
- On the Instances page of the ECS console, click Bind EIP.
- In the Bind EIP dialog box, select an existing elastic IP address (EIP) or click Create EIP to create an EIP.
For more information about how to create an EIP, see Apply for new EIPs.
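An existing EIP can also be bound from the command line. The following sketch assumes that the aliyun CLI is installed and configured and that the EIP has already been created. The function name print_bind_eip_cmd and its arguments are hypothetical placeholders, and the function only prints the command for review.

```shell
# Print (do not run) an aliyun CLI call that binds an existing EIP
# to the ECS instance of a worker node.
print_bind_eip_cmd() {
    region_id="$1"      # region of the cluster, for example cn-hangzhou
    allocation_id="$2"  # ID of the EIP to bind
    instance_id="$3"    # ECS instance ID of the worker node
    printf 'aliyun vpc AssociateEipAddress --RegionId %s --AllocationId %s --InstanceId %s --InstanceType EcsInstance\n' \
        "$region_id" "$allocation_id" "$instance_id"
}
```

The instance ID is the value shown in the ECS ID/Instance Name column of the EMR console.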