Ray is a distributed computing framework for scaling AI and Python workloads on Container Service for Kubernetes (ACK). This topic walks you through submitting a job from inside the head node of an ACK-hosted Ray cluster to run distributed tasks such as model training, data processing, and model evaluation.
How it works
Submitting a job in a local Ray cluster follows these steps:
1. Connect to the head node Pod using kubectl exec.
2. Place your Python script on the head node.
3. Run the script. Ray distributes the work across the cluster automatically.
This approach runs the job from inside the cluster. For remote submission from your local machine, see Ray Client and the Ray Jobs CLI quickstart.
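As a sketch of the remote alternative, you can forward the Ray Dashboard port and use the Ray Jobs CLI from your local machine. The service name myfirst-ray-cluster-head-svc below is an assumption based on KubeRay's default naming; verify it against your own cluster.

```shell
# Forward the Ray Dashboard port (8265 by default) to your machine.
# The service name myfirst-ray-cluster-head-svc is an assumption; list
# services with: kubectl get svc -n ${RAY_CLUSTER_NS}
kubectl port-forward -n ${RAY_CLUSTER_NS} svc/myfirst-ray-cluster-head-svc 8265:8265 &

# Submit my_script.py from the current directory as a Ray job.
ray job submit --address http://127.0.0.1:8265 --working-dir . -- python my_script.py
```

Unlike kubectl exec, this keeps your script on your local machine; Ray uploads the working directory to the cluster for you.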
Prerequisites
Before you begin, ensure that you have:
- A Ray cluster created on ACK. See Create a Ray cluster on ACK.
- kubectl configured and connected to your ACK cluster.
- The ${RAY_CLUSTER_NS} environment variable set to the namespace where your Ray cluster is deployed.
Submit a Ray job
Step 1: Find the head node Pod
Run the following command to list the Pods in your Ray cluster namespace:
kubectl get pod -n ${RAY_CLUSTER_NS}
Expected output:
NAME                             READY   STATUS    RESTARTS   AGE
myfirst-ray-cluster-head-v7pbw   2/2     Running   0          39m
Note the head node Pod name. You need it in the next step.
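If you prefer not to copy the name by hand, a sketch that assumes a KubeRay-managed cluster (KubeRay labels the head Pod with ray.io/node-type=head) and a single Ray cluster in the namespace:

```shell
# Select the head Pod by its KubeRay label and store its name.
# Assumes one Ray cluster in ${RAY_CLUSTER_NS}.
HEAD_POD=$(kubectl get pod -n ${RAY_CLUSTER_NS} \
  -l ray.io/node-type=head \
  -o jsonpath='{.items[0].metadata.name}')
echo "${HEAD_POD}"
```

You can then use ${HEAD_POD} in place of the literal Pod name in the next step.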
Step 2: Connect to the head node Pod
Open a Bash shell inside the head node Pod. Replace myfirst-ray-cluster-head-v7pbw with your actual Pod name.
kubectl exec -it -n ${RAY_CLUSTER_NS} myfirst-ray-cluster-head-v7pbw -- bash
Step 3: Create the job script
Inside the Pod, use echo or cat to save the following script as my_script.py:
import ray
import os

# Connect to a local or remote Ray cluster
ray.init()

# Define a remote actor that runs on 1 CPU
@ray.remote(num_cpus=1)
class Counter:
    def __init__(self):
        self.name = "test_counter"
        self.counter = 0

    def increment(self):
        self.counter += 1

    def get_counter(self):
        return "{} got {}".format(self.name, self.counter)

counter = Counter.remote()

# Run 10,000 increments across the cluster, printing the
# counter value at each step
for _ in range(10000):
    print(ray.get(counter.get_counter.remote()))
    counter.increment.remote()
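For example, a sketch using a heredoc from the shell you opened in step 2; quoting 'EOF' keeps the shell from expanding anything inside the script:

```shell
cat <<'EOF' > my_script.py
import ray
import os

# Connect to a local or remote Ray cluster
ray.init()

# Define a remote actor that runs on 1 CPU
@ray.remote(num_cpus=1)
class Counter:
    def __init__(self):
        self.name = "test_counter"
        self.counter = 0

    def increment(self):
        self.counter += 1

    def get_counter(self):
        return "{} got {}".format(self.name, self.counter)

counter = Counter.remote()

# Run 10,000 increments across the cluster, printing the
# counter value at each step
for _ in range(10000):
    print(ray.get(counter.get_counter.remote()))
    counter.increment.remote()
EOF
```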
Step 4: Run the job
Run the script from inside the head node Pod:
python my_script.py
Expected output:
2024-01-24 04:25:27,286 INFO worker.py:1329 -- Using address 127.0.0.1:6379 set in the environment variable RAY_ADDRESS
2024-01-24 04:25:27,286 INFO worker.py:1458 -- Connecting to existing Ray cluster at address: 172.16.0.236:6379...
2024-01-24 04:25:27,295 INFO worker.py:1633 -- Connected to Ray cluster. View the dashboard at http://172.16.0.236:8265
test_counter got 0
test_counter got 1
test_counter got 2
test_counter got 3
...
What's next
- Monitor the job: Access Ray Dashboard to view job status, resource utilization, and logs. For access from your local machine, see Access Ray Dashboard from the local network.
- Scale the cluster automatically: Use the Ray autoscaler with the ACK autoscaler to add or remove Elastic Compute Service (ECS) nodes based on workload. See Elastic scaling based on the Ray autoscaler and ACK autoscaler.
- Scale Elastic Container Instance (ECI) nodes: See Elastic scaling of Elastic Container Instance nodes based on the Ray autoscaler.