
Platform for AI: Built-in environment variables

Last Updated: Apr 01, 2026

When you submit a training job in Deep Learning Containers (DLC) of Platform for AI (PAI), DLC automatically injects environment variables that your training code can read at runtime.

PyTorch environment variables

In a distributed PyTorch training job, every node coordinates with the master node, which serves as the rendezvous point for initializing the process group. DLC injects the following variables so that each node can locate the master and determine its position in the cluster.

| Variable | Description | Example |
|---|---|---|
| MASTER_ADDR | Service address of the master node | dlc18isgeayd****-master-0 |
| MASTER_PORT | Port of the master node | 23456 |
| WORLD_SIZE | Total number of nodes in the job | 2 (1 master + 1 worker) |
| RANK | Index of this node across the entire job | 0 for the master; 1 and 2 for worker-0 and worker-1 (1 master + 2 workers) |
| NPROC_PER_NODE | Number of GPUs on each worker node | 8 for a GU7E node with 8 GPUs |
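The following is a minimal sketch of how a launcher script might turn these variables into a process layout. It assumes the semantics in the table above: WORLD_SIZE counts nodes and RANK indexes nodes, so the total number of training processes is WORLD_SIZE × NPROC_PER_NODE. The names DlcTopology, read_dlc_topology, and the script name train.py are hypothetical, and the fallback of NPROC_PER_NODE to 1 when it is unset is an assumption for CPU-only runs. Keep in mind that torchrun sets its own per-process RANK and WORLD_SIZE in the processes it spawns, so read the DLC values in the launcher, not inside the spawned training script.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping


@dataclass(frozen=True)
class DlcTopology:
    """Hypothetical container for the node-level values DLC injects."""

    master_addr: str
    master_port: int
    num_nodes: int       # DLC's WORLD_SIZE: number of nodes in the job
    node_rank: int       # DLC's RANK: index of this node
    procs_per_node: int  # NPROC_PER_NODE: GPUs (processes) per node

    @property
    def total_processes(self) -> int:
        # One process per GPU on every node.
        return self.num_nodes * self.procs_per_node

    def global_rank(self, local_rank: int) -> int:
        # Global rank of the process using GPU `local_rank` on this node.
        if not 0 <= local_rank < self.procs_per_node:
            raise ValueError(f"local_rank {local_rank} is out of range")
        return self.node_rank * self.procs_per_node + local_rank

    def torchrun_args(self, script: str) -> List[str]:
        # Command line for launching `script` on this node with torchrun.
        return [
            "torchrun",
            f"--nnodes={self.num_nodes}",
            f"--nproc_per_node={self.procs_per_node}",
            f"--node_rank={self.node_rank}",
            f"--master_addr={self.master_addr}",
            f"--master_port={self.master_port}",
            script,
        ]


def read_dlc_topology(env: Mapping[str, str] = os.environ) -> DlcTopology:
    """Build a DlcTopology from DLC's environment variables."""
    return DlcTopology(
        master_addr=env["MASTER_ADDR"],
        master_port=int(env["MASTER_PORT"]),
        num_nodes=int(env["WORLD_SIZE"]),
        node_rank=int(env["RANK"]),
        # Assumption: treat a missing NPROC_PER_NODE as a single process.
        procs_per_node=int(env.get("NPROC_PER_NODE", "1")),
    )


if __name__ == "__main__":
    # Values modeled on the table: worker-0 in a 1 master + 1 worker job.
    sample = {
        "MASTER_ADDR": "dlc18isgeayd****-master-0",
        "MASTER_PORT": "23456",
        "WORLD_SIZE": "2",
        "RANK": "1",
        "NPROC_PER_NODE": "8",
    }
    topo = read_dlc_topology(sample)
    print(topo.total_processes)  # 16
    print(topo.global_rank(3))   # 11
    print(" ".join(topo.torchrun_args("train.py")))
```

Passing the environment as a mapping, rather than reading os.environ directly inside the function, lets you test the logic with sample values before submitting a job.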

TensorFlow environment variables

Distributed TensorFlow training uses TF_CONFIG to describe the full cluster topology and identify the current task. DLC sets this variable on every node automatically.

| Variable | Description |
|---|---|
| TF_CONFIG | JSON string describing the distributed network topology, including the cluster worker list and the task identity of the current node |

Example value (for worker-0 in a two-worker job):

{
  "cluster": {
    "worker": [
      "dlc1y3madghd****-worker-0.t1612285282502324.svc:2222",
      "dlc1y3madghd****-worker-1.t1612285282502324.svc:2222"
    ]
  },
  "task": {
    "type": "worker",
    "index": 0
  },
  "environment": "cloud"
}

The cluster.worker array lists all workers in the job. The task object identifies this node: type is its role and index is its zero-based position in the worker list.
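TensorFlow's tf.distribute strategies read TF_CONFIG on their own, so parsing it by hand is mainly useful for logging or for role-specific logic such as writing checkpoints from a single node. The sketch below is one way to do that, not a DLC API: the function name describe_tf_config is hypothetical, and the is_chief rule (a "chief" task if the cluster defines one, otherwise worker 0) follows the common TensorFlow multi-worker convention rather than anything stated on this page.

```python
import json
import os
from typing import Dict, Mapping

# The example value shown above (worker-0 in a two-worker job).
SAMPLE_TF_CONFIG = json.dumps({
    "cluster": {
        "worker": [
            "dlc1y3madghd****-worker-0.t1612285282502324.svc:2222",
            "dlc1y3madghd****-worker-1.t1612285282502324.svc:2222",
        ]
    },
    "task": {"type": "worker", "index": 0},
    "environment": "cloud",
})


def describe_tf_config(env: Mapping[str, str] = os.environ) -> Dict[str, object]:
    """Return this node's role, index, address, and chief status."""
    config = json.loads(env["TF_CONFIG"])
    cluster = config["cluster"]
    task_type = config["task"]["type"]
    index = int(config["task"]["index"])
    # Assumed convention: without a "chief" entry, worker 0 acts as chief.
    has_chief = "chief" in cluster
    is_chief = task_type == "chief" or (
        not has_chief and task_type == "worker" and index == 0
    )
    return {
        "type": task_type,
        "index": index,
        "address": cluster[task_type][index],  # zero-based lookup
        "num_workers": len(cluster.get("worker", [])),
        "is_chief": is_chief,
    }


if __name__ == "__main__":
    print(describe_tf_config({"TF_CONFIG": SAMPLE_TF_CONFIG}))
```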

Lingjun high-performance network variables

For the environment variables used with Lingjun AI Computing Service (Lingjun), see the "Configure high-performance network variables" section of the "RDMA: high-performance networks for distributed training" topic.