This topic describes how to use Arena in multi-tenant scenarios. Five tasks are used as examples in this topic.
Prerequisites
- A Container Service for Kubernetes (ACK) cluster is created. For more information, see Create an ACK managed cluster.
- An Elastic Compute Service (ECS) instance that runs Linux is created in the virtual private cloud (VPC) where the ACK cluster is deployed. For more information, see Create an instance by using the wizard.
In this example, the ECS instance serves as a client. The client is used as an Arena workstation to submit jobs to the ACK cluster.
- The latest version of the Arena client is installed. For more information, see Install Arena.
Background information
In some scenarios, multiple developers in a company or team want to use Arena to submit jobs. To improve the efficiency of job management, you can divide the developers into several user groups and grant each user group different permissions. This allows you to allocate and isolate resources, and manage permissions by user group.
Hostname | Role | IP address | Number of GPUs | Number of vCores | MEM |
---|---|---|---|---|---|
client01 | Client | 10.0.0.97 (private) 39.98.xxx.xxx (public) | 0 | 2 | 8 GiB |
master01 | Master | 10.0.0.91 (private) | 0 | 4 | 8 GiB |
master02 | Master | 10.0.0.92 (private) | 0 | 4 | 8 GiB |
master03 | Master | 10.0.0.93 (private) | 0 | 4 | 8 GiB |
worker01 | Worker | 10.0.0.94 (private) | 1 | 4 | 30 GiB |
worker02 | Worker | 10.0.0.95 (private) | 1 | 4 | 30 GiB |
worker03 | Worker | 10.0.0.96 (private) | 1 | 4 | 30 GiB |
Tasks
- Task 1: Create two user groups named dev1 and dev2 for the ACK cluster, and add users Bob and Tom to User Groups dev1 and dev2, respectively.
- Task 2: Allow Bob and Tom to log on to the client only with their own accounts. Both of them must have a separate environment where they can run Arena.
- Task 3: Grant Bob and Tom the permissions to view and manage only their own jobs.
- Task 4: Allocate the GPU, CPU, and memory resources of worker nodes to different user groups. Arena jobs can consume the computing resources of only worker nodes.
- Task 5: Create volumes that can be shared only within a user group and create volumes that can be shared across all user groups.
User group | User | GPU | CPU | MEM | Shared volume |
---|---|---|---|---|---|
dev1 | bob | 1 | Unlimited | Unlimited | dev1-public and department1-public-dev1 |
dev2 | tom | 2 | 8 | 60 GiB | dev2-public and department1-public-dev2 |
Step 1: Create and manage users and user groups for the ACK cluster
For security purposes, we recommend that you do not install Arena, run Arena, or manage the ACK cluster on a master node. You can create an ECS instance in the VPC where the ACK cluster is deployed and install Arena on the ECS instance. In this example, the ECS instance functions as a client. You can create a kubeconfig file to enable the ECS instance to access the ACK cluster.
- Create users and user groups on the client.
- Create service accounts and namespaces for the ACK cluster.
After you submit jobs, the jobs run in the ACK cluster. Each user on the client corresponds to a service account in the ACK cluster, and each user group corresponds to a namespace. Therefore, you must create service accounts and namespaces, and map service accounts to users and namespaces to user groups.
Log on to the client as the root user. Make sure that the root user has permissions to manage the ACK cluster. For more information, see Use kubectl to connect to an ACK cluster. Run the following commands:

```shell
# Create a namespace for User Group dev1.
kubectl create namespace dev1
# Create a namespace for User Group dev2.
kubectl create namespace dev2
# Create a service account for Bob.
kubectl create serviceaccount bob -n dev1
# Create a service account for Tom.
kubectl create serviceaccount tom -n dev2
# Query the created service accounts and namespaces.
kubectl get serviceaccount -n dev1
kubectl get serviceaccount -n dev2
```
Expected output:
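Equivalently, the namespaces and service accounts can be declared in a single manifest and created with `kubectl apply -f`. The following sketch reproduces the commands above in declarative form:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: dev1
---
apiVersion: v1
kind: Namespace
metadata:
  name: dev2
---
# Service account for Bob in User Group dev1.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: bob
  namespace: dev1
---
# Service account for Tom in User Group dev2.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: tom
  namespace: dev2
```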
Step 2: Configure Arena for the users
- Install Arena on the client. Log on to the client as the root user and download the latest installation package of Arena. Then, decompress the package and run the install.sh script. For more information, see Install Arena.
Note: You need to install Arena only once on each Linux client. To allow each user to use Arena in a separate environment, you must create a kubeconfig file for each user.
- Create a kubeconfig file for each user. To allow each user to use Arena in a separate environment, you must create a kubeconfig file for each user (service account). You can also grant the users different permissions on the ACK cluster. This ensures data security. Log on to the client as the root user, and run the createKubeConfig.sh script to create a kubeconfig file for each user. For more information, see Appendix. Run the following commands:
```shell
# Grant execute permissions on the createKubeConfig.sh script.
chmod +x createKubeConfig.sh
# Create a kubeconfig file for Bob in User Group dev1.
./createKubeConfig.sh bob -n dev1
# Create a kubeconfig file for Tom in User Group dev2.
./createKubeConfig.sh tom -n dev2
# Place the kubeconfig file of each user in their home directory.
mkdir -p /home/bob/.kube/ && cp bob.config /home/bob/.kube/config
mkdir -p /home/tom/.kube/ && cp tom.config /home/tom/.kube/config
```
Expected output:
Step 3: Grant the users different permissions on Arena
- Create required roles in the namespaces of the ACK cluster.
You can create roles that define different permissions in the namespaces of the ACK cluster. Each role in a namespace contains a set of permission rules. For more information about how to create a role, see Using RBAC Authorization.
- Grant the users permissions on the ACK cluster.
After you create roles, you must bind the roles to the users to grant them permissions. To complete this task, bind the roles in the namespaces to the service accounts.
You can bind one or more roles in different namespaces to each user. This allows a user to access resources that belong to different namespaces. Role-based access control (RBAC) allows you to manage permissions that the users have on namespaces. You can create role bindings based on business requirements.
You can use two Kubernetes objects, RoleBinding and ClusterRoleBinding, to create role bindings that grant users permissions on namespaces or the ACK cluster. For more information about how to use RoleBinding and ClusterRoleBinding to describe role bindings, see Using RBAC Authorization.
To complete Task 3, bind the roles that you created to Bob and Tom. To grant permissions to the users, perform the following operations:
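The role and role binding for Bob in the dev1 namespace might look like the following sketch. The role name, the rule list, and the resource types are assumptions chosen to cover typical Arena job resources; adjust them to your requirements:

```yaml
# A role that allows a user to manage jobs only in the dev1 namespace.
# The name "arena-user" and the rule list are assumptions for illustration.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: arena-user
  namespace: dev1
rules:
- apiGroups: ["", "apps", "batch"]
  resources: ["pods", "pods/log", "services", "deployments", "statefulsets", "jobs"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: ["kubeflow.org"]
  resources: ["tfjobs", "pytorchjobs"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
# Bind the role to Bob's service account so that Bob can manage jobs only in dev1.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: arena-user-bob
  namespace: dev1
subjects:
- kind: ServiceAccount
  name: bob
  namespace: dev1
roleRef:
  kind: Role
  name: arena-user
  apiGroup: rbac.authorization.k8s.io
```

A similar role and role binding in the dev2 namespace grant Tom the same scope of permissions on his own jobs.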
Step 4: Configure resource quotas for the user groups
ACK provides a console where you can manage all resources of an ACK cluster. To ensure the security and efficiency of resource usage, you can set resource quotas for the user groups. You can submit jobs to a namespace on which you have permissions. When you submit a job, ACK automatically checks the available resources of the namespace. If the amount of resources requested by a job exceeds the quota, ACK rejects the job.
In Kubernetes, a ResourceQuota object describes constraints that limit aggregate resource consumption per namespace. Each namespace corresponds to a user group on the client where Arena is installed. You can set quotas for various resources, such as CPU, memory, and extended resources. Extended resources include NVIDIA GPUs. A ResourceQuota object also limits the resource usage of containers and other Kubernetes objects in a namespace. For more information, see Resource quotas.
To complete Task 4, set GPU, CPU, and memory quotas for each user group. For more information, see Allocate resources. In this example, one GPU is allocated to dev1, and the CPU and memory resources of dev1 are unlimited. This means that Bob in dev1 can use all CPU and memory resources of the cluster. The following resources are allocated to dev2: 2 GPUs, 8 vCores, and 60 GiB of memory. To configure resource quotas for the user groups, perform the following operations:
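The dev1_quota.yaml and dev2_quota.yaml files that are applied below might look like the following sketch. The ResourceQuota names match the names queried later in this step; the exact quota keys are assumptions based on the allocations described above:

```yaml
# dev1_quota.yaml: dev1 is limited to 1 GPU. CPU and memory are not restricted.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev1-compute-resources
  namespace: dev1
spec:
  hard:
    requests.nvidia.com/gpu: 1
---
# dev2_quota.yaml: dev2 is limited to 2 GPUs, 8 vCores, and 60 GiB of memory.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev2-compute-resources
  namespace: dev2
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 60Gi
    limits.cpu: "8"
    limits.memory: 60Gi
    requests.nvidia.com/gpu: 2
```

Save the two ResourceQuota objects as separate files so that each can be applied to its own namespace.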
```shell
# Apply the resource quota for User Group dev1.
kubectl apply -f dev1_quota.yaml
# Apply the resource quota for User Group dev2.
kubectl apply -f dev2_quota.yaml
# Query the resource quota in dev1.
kubectl get resourcequotas -n dev1
# Query the resource quota in dev2.
kubectl get resourcequotas -n dev2
# Query the resource usage in dev1.
kubectl describe resourcequotas dev1-compute-resources -n dev1
# Query the resource usage in dev2.
kubectl describe resourcequotas dev2-compute-resources -n dev2
```
Expected output:
In this case, computing resources are allocated to the user groups, and Task 4 is completed.
Step 5: Create NAS volumes to enforce multi-level access control
To meet the requirements of multi-level access control, you must create volumes that are accessible to different users and user groups. This ensures the security of data sharing.
To complete Task 5, you must create two types of shared volumes: one type stores data that can be accessed by users in both user groups, and the other type stores data that can be accessed only by users in a specific user group. For more information about shared volumes, see Allocate resources.

In this example, four volumes are created: dev1-public, dev2-public, department1-public-dev1, and department1-public-dev2. department1-public-dev1 and department1-public-dev2 are mounted to the same directory of the NAS file system, so the volume data can be accessed by users in both dev1 and dev2. dev1-public and dev2-public are mounted to different directories of the NAS file system. Data stored on dev1-public can be accessed only by Bob in dev1, and data stored on dev2-public can be accessed only by Tom in dev2. To create NAS volumes for data sharing, perform the following operations:
- Create a NAS file system. Log on to the Apsara File Storage NAS console .
- Create persistent volumes (PVs) and persistent volume claims (PVCs) for the ACK cluster.
- Verify the configurations of the volumes. Log on to the client as the root user, and run the following commands to query the volumes that are used by User Groups dev1 and dev2:
```shell
# Query the volumes that are used by User Group dev1.
arena data list -n dev1
# Query the volumes that are used by User Group dev2.
arena data list -n dev2
```
Expected output:
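The PV and PVC for one of the volumes, dev1-public, might be declared as follows. This is a minimal sketch that assumes the ACK CSI NAS plugin; the NAS file system address, the capacity, and the label key are assumptions that you must replace with your own values:

```yaml
# PV backed by a NAS directory that is dedicated to User Group dev1.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: dev1-public
  labels:
    alicloud-pvname: dev1-public
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteMany
  csi:
    driver: nasplugin.csi.alibabacloud.com
    volumeHandle: dev1-public
    volumeAttributes:
      # Replace with the mount target of your NAS file system.
      server: "xxxxxx.cn-beijing.nas.aliyuncs.com"
      path: "/dev1"
---
# PVC in the dev1 namespace that binds to the PV by label.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dev1-public
  namespace: dev1
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
  selector:
    matchLabels:
      alicloud-pvname: dev1-public
```

The other three volumes follow the same pattern: department1-public-dev1 and department1-public-dev2 point to the same NAS path, and each PVC is created in the namespace of its user group.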
After you complete the preceding operations, all five tasks are completed. The following examples describe how to log on to the client where Arena is installed with the accounts of Bob and Tom.
Step 6: Run Arena with different user accounts
Use the account of Bob
- Run the following commands to log on to the client and query available shared volumes:
```shell
# Log on to the client with the account of Bob.
ssh bob@39.98.xxx.xx
# Run the arena data list command to query the shared volumes that are available to Bob.
arena data list
```
Expected output:
- Run the following command to submit a training job that requires one GPU:
```shell
arena submit tf \
    --name=tf-git-bob-01 \
    --gpus=1 \
    --image=tensorflow/tensorflow:1.5.0-devel-gpu \
    --sync-mode=git \
    --sync-source=https://code.aliyun.com/xiaozhou/tensorflow-sample-code.git \
    "python code/tensorflow-sample-code/tfjob/docker/mnist/main.py --max_steps 10000 --data_dir=code/tensorflow-sample-code/data"
```
- Run the following command to list all the jobs that are submitted by Bob:
arena list
Expected output:
- Run the following command to submit another training job that requires one GPU:
```shell
arena submit tf \
    --name=tf-git-bob-02 \
    --gpus=1 \
    --image=tensorflow/tensorflow:1.5.0-devel-gpu \
    --sync-mode=git \
    --sync-source=https://code.aliyun.com/xiaozhou/tensorflow-sample-code.git \
    "python code/tensorflow-sample-code/tfjob/docker/mnist/main.py --max_steps 10000 --data_dir=code/tensorflow-sample-code/data"
```
In this example, only one GPU is allocated to User Group dev1. Therefore, ACK is expected to reject the second job.
The output shows that the ACK cluster still has sufficient resources. However, the GPU that is allocated to the user group to which Bob belongs is already occupied by the first job. As a result, the second job is suspended.
Use the account of Tom
- Run the following commands to log on to the client and query available shared volumes:
```shell
# Log on to the client with the account of Tom.
ssh tom@39.98.xx.xx
# Run the arena data list command to query the shared volumes that are available to Tom.
arena data list
```
Expected output:
- Run the following command to check whether Tom can view the jobs that are submitted by Bob:
arena list
Tom cannot view the jobs that are submitted by Bob.
- Run the following command to submit a training job that requires one GPU:
```shell
arena submit tf \
    --name=tf-git-tom-01 \
    --gpus=1 \
    --chief-cpu=2 \
    --chief-memory=10Gi \
    --image=tensorflow/tensorflow:1.5.0-devel-gpu \
    --sync-mode=git \
    --sync-source=https://code.aliyun.com/xiaozhou/tensorflow-sample-code.git \
    "python code/tensorflow-sample-code/tfjob/docker/mnist/main.py --max_steps 10000 --data_dir=code/tensorflow-sample-code/data"
```
Note: In this example, GPU, CPU, and memory resources are allocated to User Group dev2. Therefore, Tom must specify the amount of resources that are requested by the job to be submitted.
- Run the following command to submit another job that requires one GPU:
```shell
arena submit tf \
    --name=tf-git-tom-02 \
    --gpus=1 \
    --chief-cpu=2 \
    --chief-memory=10Gi \
    --image=tensorflow/tensorflow:1.5.0-devel-gpu \
    --sync-mode=git \
    --sync-source=https://code.aliyun.com/xiaozhou/tensorflow-sample-code.git \
    "python code/tensorflow-sample-code/tfjob/docker/mnist/main.py --max_steps 10000 --data_dir=code/tensorflow-sample-code/data"
```
- Run the following command to list all the jobs that are submitted by Tom:
arena list
Expected output:
Result
The preceding results indicate that you can log on to the client and run Arena in a separate environment with the accounts of Bob and Tom. You can also query and use compute and storage resources that are allocated to each user group, and manage jobs that are submitted by Bob and Tom.
Appendix
Run the following command to download the package that contains the createKubeConfig.sh script and the other files that are used in this topic:

```shell
wget https://aliacs-k8s-cn-hongkong.oss-cn-hongkong.aliyuncs.com/arena/arena-multi-users-demo.tar.gz
```