This topic describes how to use ncluster to deploy artificial intelligence (AI) training and inference scripts to the cloud and run them there.

Configure environment variables

On your ECS instance, local computer, or Alibaba Cloud Shell, set the AccessKey pair of your Alibaba Cloud account, the default region, and the default zone as environment variables.

export ALIYUN_ACCESS_KEY_ID=L****         # Your AccessKey ID
export ALIYUN_ACCESS_KEY_SECRET=v****     # Your AccessKey secret
export ALIYUN_DEFAULT_REGION=cn-hangzhou  # The region in which to create resources
export ALIYUN_DEFAULT_ZONE=cn-hangzhou-i  # The zone to use within that region
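
Before you launch a job, it can help to confirm that the four variables above are visible to Python. The following is a minimal sketch that uses only the standard library. The helper name missing_aliyun_vars is hypothetical and is not part of ncluster.

```python
import os

# The four variables exported in the preceding step.
REQUIRED_VARS = (
    'ALIYUN_ACCESS_KEY_ID',
    'ALIYUN_ACCESS_KEY_SECRET',
    'ALIYUN_DEFAULT_REGION',
    'ALIYUN_DEFAULT_ZONE',
)


def missing_aliyun_vars(environ=None):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper for illustration; it is not part of ncluster.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling missing_aliyun_vars() with no arguments inspects the current process environment. An empty list means that all four variables are set.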

Call ncluster

ncluster is a set of Python libraries. To use its interface, import ncluster in your Python script:
import ncluster

Create resources

Call ncluster.make_job to create the resources required by the job or to reuse existing resources.

job = ncluster.make_job(name=args.name,
                        run_name=f"{args.name}-{args.machines}",
                        num_tasks=args.machines,
                        image_name=IMAGE_NAME,
                        instance_type=INSTANCE_TYPE)

The following table describes the parameters of ncluster.make_job.

Parameter | Description | Example
name | The name of the job. | 'perseus-bert'
run_name | The name of the runtime environment. In most cases, the name is based on the name of the job and the number of instances associated with the job. | f"perseus-bert-1"
num_tasks | The number of instances to create. | 1. The value 1 indicates that one instance is to be created. In the preceding example, the instance is named task0.perseus-bert, which corresponds to the task perseus-bert.tasks[0].
image_name | The image of the instance. Both public and custom images are supported. | 'ubuntu_18_04_64_20G_alibase_20190624.vhd'
instance_type | The instance type of the instance to create. | 'ecs.gn6v-c10g1.20xlarge'
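
The make_job call above refers to args, IMAGE_NAME, and INSTANCE_TYPE, which this topic does not define. The following sketch shows one way to define them, assuming that the job name and instance count come from the command line. The flag names --name and --machines and the helper names are assumptions chosen to match args.name and args.machines. The constant values are copied from the Example column of the table.

```python
import argparse

# Values copied from the Example column of the parameter table.
IMAGE_NAME = 'ubuntu_18_04_64_20G_alibase_20190624.vhd'
INSTANCE_TYPE = 'ecs.gn6v-c10g1.20xlarge'


def parse_args(argv=None):
    # Flag names are assumptions; they only need to yield args.name and args.machines.
    parser = argparse.ArgumentParser()
    parser.add_argument('--name', default='perseus-bert')
    parser.add_argument('--machines', type=int, default=1)
    return parser.parse_args(argv)


def make_job_kwargs(args):
    """Build the keyword arguments that the make_job call above passes."""
    return dict(
        name=args.name,
        run_name=f"{args.name}-{args.machines}",
        num_tasks=args.machines,
        image_name=IMAGE_NAME,
        instance_type=INSTANCE_TYPE,
    )


def launch(argv=None):
    # Imported here so that the helpers above can be checked without ncluster installed.
    import ncluster
    args = parse_args(argv)
    return ncluster.make_job(**make_job_kwargs(args))
```

Calling launch(['--name', 'perseus-bert', '--machines', '1']) would then issue the same make_job call as the preceding example. It requires ncluster to be installed and the environment variables to be configured.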

Run tasks

You can run operations at the job level or at the task level. A job is a group of tasks. The following diagram shows the relationship between a job and its tasks:

[Figure: relationship between a job and its tasks]
Note: A job and its tasks support the same API operations. An operation called on a job is applied to every task in the job, whereas an operation called on a task is applied only to that task.
Example:
  • Call an API operation on a job
    # Change to the perseus-bert directory on every instance in the job.
    job.run('cd perseus-bert')

    # Upload the perseus-bert folder from the current local directory to the /root directory of every instance in the job.
    job.upload('perseus-bert')
  • Call an API operation on a task
    # Change to the perseus-bert directory on the instance that runs task0.
    job.tasks[0].run('cd perseus-bert')

    # Upload the perseus-bert folder from the current local directory to the /root directory of the instance that runs task0.
    job.tasks[0].upload('perseus-bert')
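
The difference between job-level and task-level calls can be illustrated with a toy model. The sketch below does not use ncluster and is not its implementation. RecordingJob and RecordingTask are hypothetical stand-ins that record calls instead of contacting instances, and the 'ls' command is only a placeholder.

```python
class RecordingTask:
    """Hypothetical stand-in for a task; records calls instead of running them."""

    def __init__(self, name):
        self.name = name
        self.calls = []

    def run(self, cmd):
        self.calls.append(('run', cmd))

    def upload(self, path):
        self.calls.append(('upload', path))


class RecordingJob:
    """Hypothetical stand-in for a job; each call is forwarded to every task."""

    def __init__(self, name, num_tasks):
        self.tasks = [RecordingTask(f'task{i}.{name}') for i in range(num_tasks)]

    def run(self, cmd):
        for task in self.tasks:
            task.run(cmd)

    def upload(self, path):
        for task in self.tasks:
            task.upload(path)


job = RecordingJob('perseus-bert', num_tasks=2)
job.upload('perseus-bert')      # job level: recorded on both tasks
job.run('cd perseus-bert')      # job level: recorded on both tasks
job.tasks[0].run('ls')          # task level: recorded on task 0 only
```

In this model, job.tasks[0].calls holds three entries and job.tasks[1].calls holds two, which mirrors the rule stated in the note.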