AnalyticDB:Managed Ray service

Last Updated: Mar 28, 2026

AnalyticDB Ray is a fully managed Ray service built into AnalyticDB for MySQL. It enhances open source Ray with production-grade kernel optimizations and simplified cluster operations, so you can run large-scale distributed AI workloads — multimodal processing, search recommendations, financial risk control, and embodied intelligence — directly on your data lakehouse without managing infrastructure.

Prerequisites

Before you begin, ensure that you have:

  • An AnalyticDB for MySQL Enterprise Edition, Basic Edition, or Data Lakehouse Edition cluster

What AnalyticDB Ray adds over open source Ray

Open source Ray is a distributed computing framework for AI and high-performance computing. It lets you scale single-node Python code to thousand-node clusters with minimal changes, and its built-in modules — Ray Tune, Ray Train, and Ray Serve — integrate seamlessly with TensorFlow and PyTorch.

Running open source Ray in production, however, introduces challenges: distributed job optimization, fine-grained resource scheduling, complex cluster operations, and ensuring system stability at scale.

AnalyticDB Ray addresses these challenges with the following enhancements:

| Enhancement | Description |
| --- | --- |
| Kernel performance optimization | Improved scheduling and execution efficiency for large-scale distributed workloads. |
| Simplified cluster operations | Managed head node and worker node lifecycle, so you focus on jobs rather than infrastructure. |
| Disaster recovery | A Redis-based mechanism that automatically recovers Ray clusters, actors, and tasks when the head node restarts. |
| Automatic scaling | Dynamic worker node scaling based on logical resource requirements, with multi-worker-group support to prevent overloading or underutilizing individual groups. |
| Data lakehouse integration | Native integration with AnalyticDB for MySQL, enabling integrated Data + AI architectures. |

Billing

When you create a Ray cluster resource group, you are charged for the following:

  • Storage consumed by the Worker Disk Storage setting.

  • If Worker Resource Type is set to CPU: AnalyticDB Compute Unit (ACU) elastic resources actually used.

  • If Worker Resource Type is set to GPU: the GPU specifications and quantity provisioned.

Usage notes

Worker node changes

Modify worker configurations during off-peak hours to minimize impact. Deleting or restarting worker nodes has these effects:

  • Drivers, actors, and tasks running on the affected nodes fail. Ray automatically redeploys actors and tasks.

  • Data in Ray's distributed object storage is lost. Tasks that depend on data from the restarted node also fail.

Resource group changes

| Operation | Impact |
| --- | --- |
| Delete a resource group | Running tasks in the resource group are interrupted. |
| Delete a worker group | The worker group's nodes are deleted. See the impact of worker node deletion above. |
| Lower Maximum Workers below the previous Minimum Workers value | Worker nodes are deleted. See the impact of worker node deletion above. |
| Change any parameter other than Minimum Workers or Maximum Workers (for example, head resource specifications or worker resource types) | The head node or worker nodes restart. See the impact of worker node restart above. |

Automatic scaling

Ray clusters scale based on logical resource requirements, not physical resource utilization. Scaling may trigger even when physical utilization is low.

Some third-party applications create as many tasks as possible to maximize resource usage. When automatic scaling is enabled, this can cause Ray clusters to scale up rapidly to the maximum size. Understand the task-creation logic of any third-party program before enabling automatic scaling to avoid unexpected resource consumption.
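The distinction between logical and physical resources can be sketched as follows. This is an illustrative model of the scaling decision, not AnalyticDB Ray's actual autoscaler code; the function name and parameters are invented for the example:

```python
# Illustrative sketch: the scaling decision counts declared (logical)
# CPU requests, not measured CPU utilization.

def workers_needed(pending_cpu_requests, cpus_per_worker, max_workers):
    """Return how many workers are needed to satisfy the queued logical
    CPU requests, capped at the group's Maximum Workers setting."""
    total = sum(pending_cpu_requests)
    needed = -(-total // cpus_per_worker)  # ceiling division
    return min(max(needed, 1), max_workers)

# 100 queued tasks each declaring num_cpus=1 demand 100 logical CPUs,
# so a group of 8-CPU workers scales to its cap of 8 nodes even if the
# physical CPUs are nearly idle.
print(workers_needed([1] * 100, cpus_per_worker=8, max_workers=8))  # 8
```

This is why a third-party program that eagerly creates tasks can drive the cluster straight to Maximum Workers: every declared request counts toward demand, regardless of how busy the nodes actually are.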

Create a Ray cluster resource group

  1. Log on to the AnalyticDB for MySQL console. In the upper-left corner of the console, select a region. In the left-side navigation pane, click Clusters. Find and click your cluster ID.

  2. In the left-side navigation pane, choose Cluster Management > Resource Management. Click the Resource Groups tab. In the upper-right corner, click Create Resource Group.

  3. In the Create Resource Group panel, enter a resource group name, set Job Type to AI, and configure the following parameters.

    | Parameter | Description |
    | --- | --- |
    | Deployment Mode | Select RayCluster. |
    | Head Resource Specifications | The head node manages Ray metadata, runs the Global Control Store (GCS) service, and schedules tasks; it does not execute tasks itself. The available specifications (small, m.xlarge, or m.2xlarge) define the number of CPU cores, which matches the Spark resource specifications. Choose based on the overall scale of your Ray cluster. |
    | Worker Group Name | A name for this worker group. One AI resource group can have multiple worker groups with different names. |
    | Worker Resource Type | CPU: for daily computing tasks, multitasking, and complex logical operations. GPU: for large-scale data parallel processing, machine learning, and deep learning training. |
    | Worker Resource Specifications | For CPU: small, m.xlarge, or m.2xlarge; CPU core counts match the Spark resource specifications. For GPU: submit a ticket for available specifications, as these depend on GPU models and inventory. |
    | Worker Disk Storage | Disk used for Ray logs, temporary data, and overflow from distributed object storage. Unit: GB. Valid range: 30 to 2000. Default: 100. Disks are for temporary storage only; do not use them for long-term data. |
    | Minimum Workers | Minimum number of worker nodes in this group (minimum value: 1). |
    | Maximum Workers | Maximum number of worker nodes in this group (maximum value: 8). When the minimum and maximum differ, AnalyticDB for MySQL scales the group dynamically based on task demand. With multiple worker groups, the system distributes load to prevent over- or under-utilization. |
    | Distribution Unit | GPUs allocated to each worker node (for example, 1/3). Required only when Worker Resource Type is GPU. |
  4. Click OK.

Connect to your Ray cluster and submit jobs

Step 1: Get the cluster URLs

  1. In the left-side navigation pane, choose Cluster Management > Resource Management. Click the Resource Groups tab.

  2. Find your AI resource group and choose More > Details in the Actions column. The following URLs are available:

    | URL | Description |
    | --- | --- |
    | Ray Cluster Endpoint | Internal URL for connecting to the cluster. |
    | Ray Dashboard | Public URL for the Ray visualization page. Click it to view the statuses of your cluster, worker groups, and jobs. |
    | Ray Grafana | URL for the Grafana metrics dashboard. |


Step 2: Choose a job submission method

Two methods are available. Use CTL for most scenarios.

| Method | How it works | When to use |
| --- | --- | --- |
| CTL (cloud task launcher) (recommended) | Packages your script files and uploads them to the Ray cluster for execution. The entry program runs in the cluster and consumes cluster resources. | Default choice for most workloads. No local Ray version management required. |
| ray.init | Connects a locally running entry program to the remote Ray cluster. The local program does not consume cluster resources. | When you need the entry program to run locally. Local Ray and Python versions must match the cluster version; update the local environment if the cluster version changes. |

Submit jobs with CTL

Prerequisites: Python 3.7 or later.

  1. Install Ray with the default extras (quote the package name so your shell does not interpret the brackets):

    ```bash
    pip3 install "ray[default]"
    ```
  2. (Optional) Set the cluster URL as an environment variable so that you do not need to specify it on each submission:

    ```bash
    export RAY_ADDRESS="<ray-dashboard-url>"
    ```

    Replace <ray-dashboard-url> with the Ray Dashboard URL from Step 1 (port 8265).

  3. Submit a job, replacing the placeholders described below.

    • If you set RAY_ADDRESS in the previous step:

    ```bash
    ray job submit --working-dir <your-working-directory> -- python <your-script.py>
    ```

    • If you did not set RAY_ADDRESS:

    ```bash
    ray job submit --address <ray-dashboard-url> --working-dir <your-working-directory> -- python <your-script.py>
    ```

    Important

    CTL packages and uploads all files in the directory specified by --working-dir to the Ray head node. Keep this directory minimal, because large files can cause upload failures. Store all dependency script files in this directory; missing dependencies cause execution failures.

    | Placeholder | Description | Example |
    | --- | --- | --- |
    | <ray-dashboard-url> | The Ray Dashboard URL from Step 1 | http://amv-uf64gwe14****-rayo.ads.aliyuncs.com:8265 |
    | <your-working-directory> | Path to the directory containing your script and dependencies | /root/Ray |
    | <your-script.py> | Your Python entry script | scripts.py |

    Example with all placeholders replaced:

    ```bash
    ray job submit --address http://amv-uf64gwe14****-rayo.ads.aliyuncs.com:8265 --working-dir /root/Ray -- python scripts.py
    ```
  4. Check the job status using either of these methods:

    • Run the following command:

    ```bash
    ray job list
    ```

    • On the Resource Groups tab, find your AI resource group, choose More > Details in the Actions column, and click the Ray Dashboard URL.

Submit jobs with ray.init

Prerequisites: Python 3.7 or later.

  1. Install Ray:

    ```bash
    pip3 install ray
    ```
  2. Prepare the URL for ray.init. The Ray Dashboard URL from Step 1 uses port 8265 and the http:// protocol. For ray.init, change the port to 10001 and the protocol to ray://.

    | URL type | Format |
    | --- | --- |
    | Ray Dashboard URL | http://amv-uf64gwe14****-rayo.ads.aliyuncs.com:8265 |
    | ray.init URL | ray://amv-uf64gwe14****-rayo.ads.aliyuncs.com:10001 |
  3. Run the program.

    • If you set RAY_ADDRESS to the ray.init URL format:

    ```bash
    export RAY_ADDRESS="ray://<cluster-hostname>:10001"
    python scripts.py
    ```

    • If you specify the URL in the script, call the following in your code:

    ```python
    ray.init(address="ray://<cluster-hostname>:10001")
    ```

    Then run:

    ```bash
    python scripts.py
    ```

    Important

    If the Ray URL is incorrect, ray.init starts a local Ray cluster instead of connecting to the remote cluster. Check the output logs to confirm the connection target.
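The URL conversion from Step 2 and a guard against the local-cluster fallback can be sketched in a few lines. Both helper names here are hypothetical, invented for illustration; they are not Ray APIs:

```python
# Hypothetical helpers (not part of the Ray SDK), shown only to
# illustrate the dashboard-to-client URL conversion and to fail fast
# on a malformed address.

def to_ray_init_url(dashboard_url: str) -> str:
    """Derive the ray.init client URL from the Ray Dashboard URL by
    swapping the protocol to ray:// and the port to 10001."""
    host = dashboard_url.removeprefix("http://").rsplit(":", 1)[0]
    return f"ray://{host}:10001"

def require_remote_address(address: str) -> str:
    """Raise instead of letting ray.init() silently start a local
    cluster when the address is not a ray:// client URL."""
    if not address.startswith("ray://"):
        raise ValueError(f"expected a ray:// URL, got {address!r}")
    return address

print(to_ray_init_url("http://amv-uf64gwe14****-rayo.ads.aliyuncs.com:8265"))
# ray://amv-uf64gwe14****-rayo.ads.aliyuncs.com:10001
```

You could then call ray.init(address=require_remote_address(url)) so that a misconfigured URL raises immediately rather than being discovered later in the logs.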

What's next