Container Service for Kubernetes: Submit a job in a Ray cluster

Last Updated: Feb 29, 2024

Ray is designed for scaling AI and Python applications and is widely used in machine learning. You can create a Ray cluster in a Container Service for Kubernetes (ACK) cluster and submit jobs to the Ray cluster to run distributed tasks. Ray clusters suit scenarios such as model training, data processing, and model evaluation. This topic describes how to submit a Ray job to a local Ray cluster by running a script in the head pod.

Prerequisites

A Ray cluster is created based on ACK.

A Ray cluster offers several methods for running jobs. For more information, see "How do you use the Ray Client?" and "Quickstart using the Ray Jobs CLI".
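
As a rough illustration of the Ray Jobs CLI method, the following Python sketch assembles a `ray job submit` command and prints it instead of running it. It assumes that the Ray dashboard is reachable at http://127.0.0.1:8265 (for example, through a port forward that this topic does not set up) and that my_script.py is in the current directory. The helper names build_submit_command and submit are hypothetical and are not part of Ray.

```python
import shlex
import subprocess


def build_submit_command(script, address="http://127.0.0.1:8265", working_dir="."):
    """Return the argument list for submitting `script` with the Ray Jobs CLI."""
    return [
        "ray", "job", "submit",
        "--address", address,          # assumed dashboard address, adjust as needed
        "--working-dir", working_dir,  # directory uploaded together with the job
        "--", "python", script,
    ]


def submit(script, dry_run=True):
    """Print the command (dry run) or run it with subprocess."""
    cmd = build_submit_command(script)
    if dry_run:
        print(shlex.join(cmd))
    else:
        # Requires the ray CLI on PATH and a reachable dashboard.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    submit("my_script.py")
```

The procedure below takes a different route and runs the script directly in the head pod.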

  1. Run the following command to query the pod information of the Ray cluster:

    kubectl get pod -n ${RAY_CLUSTER_NS}

    Expected output:

    NAME                                           READY   STATUS    RESTARTS   AGE
    myfirst-ray-cluster-head-v7pbw                 2/2     Running   0          39m
  2. Run the following command from your local terminal to open a Bash shell in the head pod:

    Replace myfirst-ray-cluster-head-v7pbw with the actual name of your head pod.

    kubectl exec -it -n ${RAY_CLUSTER_NS} myfirst-ray-cluster-head-v7pbw -- bash
  3. In the head pod, use the echo or cat command to save the following code as my_script.py:

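    import ray
    
    # Connect to the Ray cluster. In the head pod, ray.init() uses the address
    # set in the RAY_ADDRESS environment variable.
    ray.init()
    
    # An actor is a stateful worker process; this one reserves one CPU.
    @ray.remote(num_cpus=1)
    class Counter:
        def __init__(self):
            self.name = "test_counter"
            self.counter = 0
    
        def increment(self):
            self.counter += 1
    
        def get_counter(self):
            return "{} got {}".format(self.name, self.counter)
    
    # Start the actor on the cluster.
    counter = Counter.remote()
    
    for _ in range(10000):
        # Increment asynchronously, then fetch and print the current value.
        counter.increment.remote()
        print(ray.get(counter.get_counter.remote()))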
    
  4. Run the my_script.py script to execute the distributed task.

    python my_script.py
    # Expected output:
    2024-01-24 04:25:27,286	INFO worker.py:1329 -- Using address 127.0.0.1:6379 set in the environment variable RAY_ADDRESS
    2024-01-24 04:25:27,286	INFO worker.py:1458 -- Connecting to existing Ray cluster at address: 172.16.0.236:6379...
    2024-01-24 04:25:27,295	INFO worker.py:1633 -- Connected to Ray cluster. View the dashboard at http://172.16.0.236:8265
    test_counter got 1
    test_counter got 2
    test_counter got 3
    test_counter got 4
    
    ...
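
Steps 1 and 2 can also be scripted. The following Python sketch parses the tabular output of `kubectl get pod` to find the head pod and builds the matching `kubectl exec` command. It assumes that the head pod name contains "-head-", as in the sample output above. The function names find_head_pod and exec_command and the fallback namespace "default" are hypothetical.

```python
import os


def find_head_pod(kubectl_output):
    """Return the first Running pod whose name contains '-head-', else None."""
    lines = kubectl_output.strip().splitlines()
    for line in lines[1:]:  # skip the NAME/READY/STATUS header row
        fields = line.split()
        # Assumption: head pods are named <cluster>-head-<suffix>.
        if len(fields) >= 3 and "-head-" in fields[0] and fields[2] == "Running":
            return fields[0]
    return None


def exec_command(namespace, pod):
    """Build the kubectl exec command from step 2 for the given pod."""
    return ["kubectl", "exec", "-it", "-n", namespace, pod, "--", "bash"]


# Sample taken from the expected output in step 1.
SAMPLE = """\
NAME                                           READY   STATUS    RESTARTS   AGE
myfirst-ray-cluster-head-v7pbw                 2/2     Running   0          39m
"""

if __name__ == "__main__":
    namespace = os.environ.get("RAY_CLUSTER_NS", "default")
    pod = find_head_pod(SAMPLE)
    print(" ".join(exec_command(namespace, pod)))
```

In practice, the sample string would be replaced with the captured output of the command in step 1.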

What to do next