This topic describes how to configure Arena for multi-user scenarios by completing five objectives.

Prerequisites

Make sure that the following operations are performed:
  • A Container Service for Kubernetes (ACK) cluster is created. For more information, see Create a cluster of ACK Managed Edition.
  • A Linux-based Elastic Compute Service (ECS) instance is deployed in the virtual private cloud (VPC) where the ACK cluster resides. For more information, see Create an instance by using the wizard.

    In this example, the ECS instance is referred to as a client. After you install Arena on the client, you can run Arena commands on the client to submit jobs to the ACK cluster.

  • Arena of the latest version is installed. For more information, see Install the latest version of Arena.

Background information

Assume that multiple users in a company or a team use Arena to submit jobs. For efficient management, you can divide the users into groups and grant the users in each group different permissions. This allows you to allocate and isolate resources and manage permissions based on user groups.

You can allocate resources to each group based on your needs, such as GPU, CPU, and memory resources. You can also grant the users in each group different permissions and provide each user with an independent environment in which to run Arena commands. Users can be granted the following types of permissions: permissions to view and manage jobs, and permissions to read and write specific data.
Figure 1. Configure Arena for multi-user scenarios
The following table lists the client where Arena is installed and the nodes in the ACK cluster.
Node      Role    IP address                                    GPUs  CPU cores  Memory
client01  Client  10.0.0.97 (private), 39.98.xxx.xxx (public)   0     2          8 GiB
master01  Master  10.0.0.91 (private)                           0     4          8 GiB
master02  Master  10.0.0.92 (private)                           0     4          8 GiB
master03  Master  10.0.0.93 (private)                           0     4          8 GiB
worker01  Worker  10.0.0.94 (private)                           1     4          30 GiB
worker02  Worker  10.0.0.95 (private)                           1     4          30 GiB
worker03  Worker  10.0.0.96 (private)                           1     4          30 GiB
Note Unless otherwise specified, an administrator performs the following operations on the client.

Objectives

To configure Arena for multi-user scenarios, complete the following five objectives:
  • Objective 1: Create two groups dev1 and dev2 for the ACK cluster, and add user bob to dev1 and user tom to dev2.
  • Objective 2: Log on to the client as users bob and tom, and use Arena in an independent environment.
  • Objective 3: Grant users bob and tom the minimum permissions, and allow them to view and manage only the jobs that they submit.
  • Objective 4: Allocate the GPU, CPU, and memory resources of worker nodes based on groups. Arena jobs can consume only the computing resources of worker nodes.
  • Objective 5: Create volumes that can be accessed only by users within a group and volumes that can be accessed by all group users.
Table 1. Allocate resources
Group  User  GPUs  CPU cores  Memory     Shared volumes
dev1   bob   1     Unlimited  Unlimited  dev1-public, department1-public-dev1
dev2   tom   2     8          60 GiB     dev2-public, department1-public-dev2
Note department1-public-dev1 and department1-public-dev2 are mounted to the same directory of the NAS file system. This indicates that the volume data can be accessed by users in dev1 and dev2. dev1-public and dev2-public are mounted to different directories of the NAS file system. This indicates that data on dev1-public can be accessed only by user bob in dev1, and data on dev2-public can be accessed only by user tom in dev2.

Step 1: Create and manage users and groups for the ACK cluster

For security purposes, we recommend that you do not log on to a master node to install or use Arena, or manage the ACK cluster. Instead, you can create an ECS instance in the VPC where the ACK cluster resides and install Arena on the instance. In this example, the ECS instance is referred to as a client. You can configure a kubeconfig file to enable the ECS instance to access the ACK cluster.

  1. Create users and groups on the client.
    1. Use kubectl to connect to the ACK cluster.
      Before you use kubectl, install kubectl and configure a kubeconfig file that allows you to manage the ACK cluster. For more information, see Use kubectl to connect to an ACK cluster.
      Note The version of kubectl must be 1.10 or later.
    2. Run the following commands to create users and groups on the client.
      In this example, create users bob and tom and groups dev1 and dev2. Because the client runs Linux, you can configure passwords for users bob and tom. Users can then log on to the client and run Arena commands as bob or tom, as required by Objective 2.
      # Create the Linux groups and users.
      groupadd -g 10001 dev1
      groupadd -g 10002 dev2
      adduser -u 20001 -s /bin/bash -G dev1 -m bob
      adduser -u 20002 -s /bin/bash -G dev2 -m tom
      
      # Configure the password for user bob.
      passwd bob 
      # Configure the password for user tom.
      passwd tom
  2. Create service accounts and namespaces for the ACK cluster.

    After you submit jobs, they run in the ACK cluster. Each user and each group on the client corresponds to a service account and a namespace in the ACK cluster. Therefore, you must create the service accounts and namespaces in the ACK cluster.

    Log on to the client as user root, and make sure that user root has permissions to manage the cluster. For more information, see Use kubectl to connect to an ACK cluster. You can perform the following operations:
    # Create a namespace that corresponds to dev1.
    kubectl create namespace dev1
    
    # Create a namespace that corresponds to dev2.
    kubectl create namespace dev2
    
    # Create a service account that corresponds to user bob.
    kubectl create serviceaccount bob -n dev1
    
    # Create a service account that corresponds to user tom.
    kubectl create serviceaccount tom -n dev2
    
    # Query the created service accounts.
    kubectl get serviceaccount -n dev1
    kubectl get serviceaccount -n dev2

Step 2: Configure Arena for multi-user scenarios

  1. Install Arena.
    Install Arena on the client. Log on to the client as user root and download the latest installation package of Arena. Then, extract the package and run the install.sh script in the package. For more information, see Install the latest version of Arena.
    Note You do not need to install Arena multiple times on a client. You can configure a kubeconfig file for each user. This enables resource isolation and allows each user to use Arena in an independent environment.
  2. Create a kubeconfig file for each user.
    To allow each user to use Arena in an independent environment, create a kubeconfig file for each user. This also allows you to grant users different permissions on the ACK cluster and ensures data security.
    Log on to the client as user root, and run the createKubeConfig.sh script to create a kubeconfig file for each user. For more information, see Appendix. A sketch of the structure of the generated kubeconfig file is provided after this step. You can run the following commands:
    # Make the createKubeConfig.sh script executable.
    chmod +x createKubeConfig.sh
    
    # Create a kubeconfig file for user bob in dev1.
    ./createKubeConfig.sh bob -n dev1
    
    # Create a kubeconfig file for user tom in dev2.
    ./createKubeConfig.sh tom -n dev2
    
    # Copy the kubeconfig file of each user to the user's home directory.
    mkdir -p /home/bob/.kube/ && cp bob.config /home/bob/.kube/config
    mkdir -p /home/tom/.kube/ && cp tom.config /home/tom/.kube/config
    After you perform the preceding operations, you can log on to the client as user bob or tom. Then, you can run Arena commands in an independent environment. In this case, Objective 1 and Objective 2 are completed.
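
For reference, the kubeconfig file that is generated for each user typically has the following structure. This is only an illustrative sketch, not the exact output of the createKubeConfig.sh script in the Appendix: the server address, certificate data, token, and names are placeholders.

# Illustrative structure of /home/bob/.kube/config (placeholder values).
apiVersion: v1
kind: Config
clusters:
- name: kubernetes
  cluster:
    server: https://<API server address of the ACK cluster>
    certificate-authority-data: <Base64-encoded CA certificate>
users:
- name: bob
  user:
    token: <token of the service account bob in the dev1 namespace>
contexts:
- name: bob-context
  context:
    cluster: kubernetes
    user: bob
    namespace: dev1
current-context: bob-context

The context pins user bob to the dev1 namespace, so Arena commands run by bob operate only on resources in dev1.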

Step 3: Grant users different permissions to use Arena

  1. Create required roles in the namespaces of the ACK cluster.

    You can create roles that define different permissions in the namespaces of the ACK cluster. A role in a namespace contains rules that represent a set of permissions. For more information about how to create a role, see Use RBAC authorization.

    1. Create files that define permissions.

      Create roles in dev1 and dev2, and bind the roles to users bob and tom. In this example, grant users bob and tom the minimum permissions, and allow them to view and manage only the jobs that they submit.

      For more information about how to create a role in dev1, see dev1_roles.yaml in Appendix. For more information about how to create a role in dev2, see dev2_roles.yaml in Appendix.

    2. After you create dev1_roles.yaml and dev2_roles.yaml, run the following commands to deploy the files to the ACK cluster:
      kubectl apply -f dev1_roles.yaml
      
      kubectl apply -f dev2_roles.yaml
      Run the following commands to query roles in the namespaces:
      kubectl get role -n dev1
      kubectl get role -n dev2
      If the roles appear in the command output, the roles are created.
  2. Grant permissions to users.

    After you create roles, bind the roles to users. In this case, the users can access resources within the namespaces.

    You can bind roles in one or more namespaces to a user. This allows a user to access resources in one or more namespaces. In this case, you can dynamically manage the permissions of users in different namespaces by binding roles.

    Kubernetes provides RoleBinding and ClusterRoleBinding. This allows you to create roles that define permissions within a namespace or all namespaces. For more information about how to use RoleBinding and ClusterRoleBinding, see Use RBAC authorization.

    To complete Objective 3, bind the roles that you created in the previous substep to users bob and tom. You can perform the following operations:

    1. Log on to the client as user root. Run the following commands to bind the roles to users bob and tom. For more information about bob_rolebindings.yaml and tom_rolebindings.yaml, see Appendix. An illustrative sketch of the role and role binding files is provided after this procedure.
      kubectl apply -f bob_rolebindings.yaml
      kubectl apply -f tom_rolebindings.yaml
    2. Run the following commands to query the role binding status in dev1 and dev2:
      kubectl get rolebinding -n dev1
      kubectl get rolebinding -n dev2
      If the role bindings appear in the command output, the roles are bound to users bob and tom.

      In this case, the first three objectives are completed.
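
For reference, the following sketch shows what dev1_roles.yaml and bob_rolebindings.yaml can look like. It is not the exact content of the Appendix files: the role name arena-user, the listed resources, and the verbs are assumptions that illustrate the minimum-permission pattern, and the actual files may define a different set of rules. The dev2 files follow the same pattern for namespace dev2 and service account tom.

# Hypothetical dev1_roles.yaml: a namespaced Role with roughly the minimum
# permissions that Arena needs to submit, view, and delete jobs in dev1.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: arena-user          # assumed name
  namespace: dev1
rules:
- apiGroups: [""]
  resources: ["pods", "pods/log", "services", "configmaps", "events", "persistentvolumeclaims"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: ["batch"]
  resources: ["jobs"]
  verbs: ["get", "list", "watch", "create", "delete"]
- apiGroups: ["kubeflow.org"]
  resources: ["tfjobs", "pytorchjobs", "mpijobs"]
  verbs: ["get", "list", "watch", "create", "delete"]
---
# Hypothetical bob_rolebindings.yaml: binds the Role to the service account
# bob in the dev1 namespace, so bob can manage only jobs in dev1.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: bob-arena-user      # assumed name
  namespace: dev1
subjects:
- kind: ServiceAccount
  name: bob
  namespace: dev1
roleRef:
  kind: Role
  name: arena-user
  apiGroup: rbac.authorization.k8s.io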

Step 4: Configure resource quotas for groups

ACK provides a unified console that allows you to manage all resources of a cluster. To ensure the security and efficiency of resource usage, configure resource quotas for groups. You can submit jobs only to namespaces on which you have permissions. When you submit a job, ACK automatically checks the resources available in the namespace. If the resources requested by the job exceed the resource quota of the namespace, ACK rejects the job.

In Kubernetes, a resource quota provides constraints that limit aggregate resource consumption per namespace. A namespace corresponds to a group on the client where Arena is installed. You can configure quotas for various resources, such as CPU, memory, and NVIDIA GPU resources. A resource quota can also limit the resource usage of containers and other Kubernetes objects in a namespace. For more information, see Resource quotas.

To complete Objective 4, configure quotas for GPU, CPU, and memory resources. For more information, see Allocate resources. In this example, allocate one GPU to dev1. The CPU and memory resources are unlimited. This indicates that user bob in dev1 can consume all CPU and memory resources of the cluster. Then, allocate the following resources to dev2: two GPUs, eight CPU cores, and 60 GiB memory. You can perform the following operations:

Log on to the client as user root, and run the following commands to configure resource quotas. For more information about dev1_quota.yaml and dev2_quota.yaml, see Appendix.
kubectl apply -f dev1_quota.yaml
kubectl apply -f dev2_quota.yaml
After you configure resource quotas, run the following commands to check whether the configurations take effect. You can view the resource quotas and the resource usage.
# Query the resource quota in dev1.
kubectl get resourcequotas -n dev1

# Query the resource quota in dev2.
kubectl get resourcequotas -n dev2

# Query the resource usage in dev1.
kubectl describe resourcequotas dev1-compute-resources -n dev1

# Query the resource usage in dev2.
kubectl describe resourcequotas dev2-compute-resources -n dev2
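
For reference, the following sketch shows what dev1_quota.yaml and dev2_quota.yaml can look like for the allocations in Table 1. The quota keys are standard Kubernetes resource quota fields; requests.nvidia.com/gpu assumes that GPUs are exposed by the NVIDIA device plugin, and the exact content of the Appendix files may differ.

# Hypothetical dev1_quota.yaml: 1 GPU; CPU and memory are left unlimited,
# so no CPU or memory keys are set.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev1-compute-resources
  namespace: dev1
spec:
  hard:
    requests.nvidia.com/gpu: 1
---
# Hypothetical dev2_quota.yaml: 2 GPUs, 8 CPU cores, and 60 GiB of memory.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev2-compute-resources
  namespace: dev2
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 60Gi
    limits.cpu: "8"
    limits.memory: 60Gi
    requests.nvidia.com/gpu: 2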

In this case, computing resources are allocated based on groups and Objective 4 is completed.

Step 5: Create NAS volumes for data sharing

ACK supports access control for volumes. You can create volumes that are backed by a NAS file system and grant users different permissions on the volume data. This ensures the security of shared data.

To complete Objective 5, create two types of shared volumes. The first type is used to store data that can be accessed by all group users, and the other type is used to store data that can be accessed only by users within a group. For more information about the shared volumes, see Allocate resources. In this example, four volumes are created: dev1-public, dev2-public, department1-public-dev1, and department1-public-dev2. department1-public-dev1 and department1-public-dev2 are mounted to the same directory of the NAS file system, which means that the data on these volumes can be accessed by users in both dev1 and dev2. dev1-public and dev2-public are mounted to different directories of the NAS file system, which means that the data on dev1-public can be accessed only by user bob in dev1 and the data on dev2-public can be accessed only by user tom in dev2. You can perform the following operations:

  1. Create a NAS file system.
    Log on to the NAS File System console, create a NAS file system, and add a mount target. For more information, see Configure a shared NAS volume for training jobs.
  2. Create persistent volumes (PVs) and persistent volume claims (PVCs) for the ACK cluster.
    1. Create PVs.
      Create four PVs. For more information about how to create a PV, see Create a PV. The data on department1-public-dev1 and department1-public-dev2 can be accessed by users in both dev1 and dev2. The data on dev1-public can be accessed only by user bob in dev1, and the data on dev2-public can be accessed only by user tom in dev2. Configure the PV parameters accordingly; an illustrative PV and PVC definition is sketched after this procedure.
      Note Select the mount target that you added when you created the NAS file system.
    2. Create PVCs.
      Create PVCs for the newly created PVs. For more information about how to create a PVC, see Create a PVC.
      After PVCs are created, you can find that department1-public-dev1 and dev1-public exist in dev1, and department1-public-dev2 and dev2-public exist in dev2.
  3. Query the volumes.
    Log on to the client as user root, and run the following commands to query the volumes in dev1 and dev2:
    # Query the volumes in dev1.
    arena data list -n dev1
    
    # Query the volumes in dev2.
    arena data list -n dev2
    In this case, all objectives are completed. The following examples describe how to log on to the client where Arena is installed as user bob or tom and run Arena commands.
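
For reference, the following sketch shows how the dev1-public volume can be defined as a statically provisioned NAS PV and a matching PVC in the dev1 namespace. It assumes the ACK CSI plugin for NAS (driver name nasplugin.csi.alibabacloud.com); the mount-target address, capacity, and NAS subdirectory are placeholders. The department1-public-dev1 and department1-public-dev2 volumes follow the same pattern, except that both point to the same NAS subdirectory so that dev1 and dev2 share the data.

# Hypothetical definition of the dev1-public volume (adapt names, sizes, and paths).
apiVersion: v1
kind: PersistentVolume
metadata:
  name: dev1-public
  labels:
    alicloud-pvname: dev1-public
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: nasplugin.csi.alibabacloud.com      # assumes the ACK CSI NAS plugin
    volumeHandle: dev1-public
    volumeAttributes:
      server: "xxxx.cn-beijing.nas.aliyuncs.com"   # placeholder mount target
      path: "/dev1"                                # subdirectory used only by dev1
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dev1-public
  namespace: dev1
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
  selector:
    matchLabels:
      alicloud-pvname: dev1-public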

Step 6: Run Arena commands as users bob and tom

User bob

  1. Run the following commands to log on to the client and query available shared volumes:
    # Log on to the client as user bob.
    ssh bob@39.98.xxx.xx
    
    # Query the shared volumes that are available to user bob.
    arena data list
  2. Run the following commands to submit a training job that requires one GPU:
    arena submit tf \
            --name=tf-git-bob-01 \
            --gpus=1 \
            --image=tensorflow/tensorflow:1.5.0-devel-gpu \
            --sync-mode=git \
            --sync-source=https://code.aliyun.com/xiaozhou/tensorflow-sample-code.git \
            "python code/tensorflow-sample-code/tfjob/docker/mnist/main.py --max_steps 10000 --data_dir=code/tensorflow-sample-code/data"
  3. Run the following command to list all the jobs submitted by user bob:
    arena list
  4. Run the following commands to submit another training job that requires one GPU:
    arena submit tf \
            --name=tf-git-bob-02 \
            --gpus=1 \
            --image=tensorflow/tensorflow:1.5.0-devel-gpu \
            --sync-mode=git \
            --sync-source=https://code.aliyun.com/xiaozhou/tensorflow-sample-code.git \
            "python code/tensorflow-sample-code/tfjob/docker/mnist/main.py --max_steps 10000 --data_dir=code/tensorflow-sample-code/data"

    In the preceding configuration, only one GPU is allocated to dev1, and that GPU is already used by the first job. Therefore, the second job is expected to remain pending even if the cluster has sufficient resources.


    As expected, the second job submitted by user bob is suspended even though the cluster has sufficient resources, because the GPU quota of dev1 is exhausted.

User tom

  1. Run the following commands to log on to the client and query available shared volumes:
    # Log on to the client as user tom.
    ssh tom@39.98.xxx.xx
    
    # Query the shared volumes that are available to user tom.
    arena data list
  2. Run the following command to list all the jobs submitted by user tom:
    arena list
    User tom cannot view the jobs submitted by user bob.
  3. Run the following commands to submit a training job that requires one GPU:
     arena submit tf \
             --name=tf-git-tom-01 \
             --gpus=1 \
             --chief-cpu=2 \
             --chief-memory=10Gi \
             --image=tensorflow/tensorflow:1.5.0-devel-gpu \
             --sync-mode=git \
             --sync-source=https://code.aliyun.com/xiaozhou/tensorflow-sample-code.git \
             "python code/tensorflow-sample-code/tfjob/docker/mnist/main.py --max_steps 10000 --data_dir=code/tensorflow-sample-code/data"
    Note Resource quotas for GPU, CPU, and memory resources are configured for dev2. Therefore, user tom must specify the requested CPU and memory resources for the job to be submitted.
  4. Run the following commands to submit another training job that requires one GPU:
     arena submit tf \
             --name=tf-git-tom-02 \
             --gpus=1 \
             --chief-cpu=2 \
             --chief-memory=10Gi \
             --image=tensorflow/tensorflow:1.5.0-devel-gpu \
             --sync-mode=git \
             --sync-source=https://code.aliyun.com/xiaozhou/tensorflow-sample-code.git \
             "python code/tensorflow-sample-code/tfjob/docker/mnist/main.py --max_steps 10000 --data_dir=code/tensorflow-sample-code/data"
  5. Run the following command to list all the jobs submitted by user tom:
    arena list

Results

You can now log on to the client as user bob or tom, run Arena commands in an independent environment, use Arena to query and consume the storage and computing resources allocated to your group, and manage your own jobs.

Appendix

Run the following command to download tools.tar.gz:
wget https://lumo-package.oss-cn-beijing.aliyuncs.com/tools.tar.gz