AnalyticDB Ray is a fully managed Ray service built into AnalyticDB for MySQL. It enhances open source Ray with production-grade kernel optimizations and simplified cluster operations, so you can run large-scale distributed AI workloads — multimodal processing, search recommendations, financial risk control, and embodied intelligence — directly on your data lakehouse without managing infrastructure.
Prerequisites
Before you begin, ensure that you have:
An AnalyticDB for MySQL Enterprise Edition, Basic Edition, or Data Lakehouse Edition cluster
What AnalyticDB Ray adds over open source Ray
Open source Ray is a distributed computing framework for AI and high-performance computing. It lets you scale single-node Python code to thousand-node clusters with minimal changes, and its built-in modules — Ray Tune, Ray Train, and Ray Serve — integrate seamlessly with TensorFlow and PyTorch.
Running open source Ray in production, however, introduces challenges: distributed job optimization, fine-grained resource scheduling, complex cluster operations, and ensuring system stability at scale.
AnalyticDB Ray addresses these challenges with the following enhancements:
| Enhancement | Description |
|---|---|
| Kernel performance optimization | Improved scheduling and execution efficiency for large-scale distributed workloads. |
| Simplified cluster operations | Managed head node and worker node lifecycle, so you focus on jobs rather than infrastructure. |
| Disaster recovery | A Redis-based mechanism that automatically recovers Ray clusters, actors, and tasks when the head node restarts. |
| Automatic scaling | Dynamic worker node scaling based on logical resource requirements, with multi-worker-group support to prevent overloading or underutilizing individual groups. |
| Data lakehouse integration | Native integration with AnalyticDB for MySQL, enabling integrated Data + AI architectures. |
Billing
When you create a Ray cluster resource group, you are charged for the following:
Storage consumed by the Worker Disk Storage setting.
If Worker Resource Type is set to CPU: AnalyticDB Compute Unit (ACU) elastic resources actually used.
If Worker Resource Type is set to GPU: the GPU specifications and quantity provisioned.
Usage notes
Worker node changes
Modify worker configurations during off-peak hours to minimize impact. Deleting or restarting worker nodes has these effects:
Drivers, actors, and tasks running on the affected nodes fail. Ray automatically redeploys actors and tasks.
Data in Ray's distributed object storage is lost. Tasks that depend on data from the restarted node also fail.
Resource group changes
| Operation | Impact |
|---|---|
| Delete a resource group | Running tasks in the resource group are interrupted. |
| Delete a worker group | The worker group's nodes are deleted. See the impact of worker node deletion above. |
| Set a new Maximum Workers value that is lower than the previous Minimum Workers value | Worker nodes are deleted. See the impact of worker node deletion above. |
| Change any parameter other than Minimum Workers or Maximum Workers (for example, head resource specifications or worker resource types) | Head node or worker nodes restart. See the impact of worker node restart above. |
Automatic scaling
Ray clusters scale based on logical resource requirements, not physical resource utilization. Scaling may trigger even when physical utilization is low.
Some third-party applications create as many tasks as possible to maximize resource usage. When automatic scaling is enabled, this can cause Ray clusters to scale up rapidly to the maximum size. Understand the task-creation logic of any third-party program before enabling automatic scaling to avoid unexpected resource consumption.
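The effect of logical-resource-based scaling can be sketched with simple arithmetic. The following is an illustrative model only, not the actual AnalyticDB for MySQL autoscaler: it assumes the cluster scales to cover the sum of declared CPU requests of pending tasks, clamped to the worker group's Minimum Workers and Maximum Workers, regardless of how busy existing nodes are.

```python
import math

def workers_needed(pending_cpu_requests, cpus_per_worker, min_workers, max_workers):
    """Illustrative sketch of logical-resource-based scaling (not the real autoscaler).

    Scaling is driven by the sum of declared CPU requests of pending tasks,
    not by the physical utilization of the existing nodes.
    """
    total_requested = sum(pending_cpu_requests)           # logical demand, not utilization
    needed = math.ceil(total_requested / cpus_per_worker)
    return max(min_workers, min(needed, max_workers))     # clamp to the worker group's bounds

# 10 pending tasks each requesting 2 logical CPUs, 8-CPU workers, group bounded 1..8:
print(workers_needed([2] * 10, 8, 1, 8))  # 20 CPUs requested -> 3 workers
```

This also shows why a third-party program that floods the cluster with tasks drives the group straight to Maximum Workers: the clamp is the only brake.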
Create a Ray cluster resource group
Log on to the AnalyticDB for MySQL console. In the upper-left corner of the console, select a region. In the left-side navigation pane, click Clusters. Find and click your cluster ID.
In the left-side navigation pane, choose Cluster Management > Resource Management. Click the Resource Groups tab. In the upper-right corner, click Create Resource Group.
In the Create Resource Group panel, enter a resource group name, set Job Type to AI, and configure the following parameters.
| Parameter | Description |
|---|---|
| Deployment Mode | Select RayCluster. |
| Head Resource Specifications | The head node manages Ray metadata, runs the Global Control Store (GCS) service, and schedules tasks; it does not execute tasks. The available specifications (small, m.xlarge, or m.2xlarge) define the number of CPU cores, which matches Spark resource specifications. Choose based on the overall scale of your Ray cluster. |
| Worker Group Name | A name for this worker group. One AI resource group can contain multiple worker groups with different names. |
| Worker Resource Type | CPU: for daily computing tasks, multitasking, and complex logical operations. GPU: for large-scale data parallel processing, machine learning, and deep learning training. |
| Worker Resource Specifications | For CPU: small, m.xlarge, or m.2xlarge. CPU core counts match Spark resource specifications. For GPU: submit a ticket for available specifications, as these depend on GPU models and inventory. |
| Worker Disk Storage | Disk used for Ray logs, temporary data, and overflow from distributed object storage. Unit: GB. Valid range: 30 to 2000. Default: 100. Disks are for temporary storage only; do not use them for long-term data. |
| Minimum Workers | Minimum number of worker nodes in this group. Minimum value: 1. |
| Maximum Workers | Maximum number of worker nodes in this group. Maximum value: 8. When the minimum and maximum differ, AnalyticDB for MySQL scales the group dynamically based on task demand. With multiple worker groups, the system distributes load to prevent over- or under-utilization of individual groups. |
| Distribution Unit | GPUs allocated to each worker node (for example, 1/3). Required only when Worker Resource Type is GPU. |

Click OK.
Connect to your Ray cluster and submit jobs
Step 1: Get the cluster URLs
In the left-side navigation pane, choose Cluster Management > Resource Management. Click the Resource Groups tab.
Find your AI resource group and choose More > Details in the Actions column. The following URLs are available:
| URL | Description |
|---|---|
| Ray Cluster Endpoint | Internal URL for connecting to the cluster. |
| Ray Dashboard | Public URL for the Ray visualization page. Click it to view the statuses of your cluster, worker groups, and jobs. |
| Ray Grafana | URL for the Grafana metrics dashboard. |
Step 2: Choose a job submission method
Two methods are available. Use CTL for most scenarios.
| Method | How it works | When to use |
|---|---|---|
| CTL (cloud task launcher) (Recommended) | Packages your script files and uploads them to the Ray cluster for execution. The entry program runs in the cluster and consumes cluster resources. | Default choice for most workloads. No local Ray version management required. |
| ray.init | Connects a locally running entry program to the remote Ray cluster. The local program does not consume cluster resources. | When you need the entry program to run locally. Local Ray and Python versions must match the cluster version; update the local environment if the cluster version changes. |
Submit jobs with CTL
Prerequisites: Python 3.7 or later.
Install Ray with the default extras:

```bash
pip3 install "ray[default]"
```

(Optional) Set the cluster URL as an environment variable so you don't need to specify it on each submission. Replace `<ray-dashboard-url>` with the Ray Dashboard URL from Step 1 (port 8265):

```bash
export RAY_ADDRESS="<ray-dashboard-url>"
```

Submit a job, replacing the following placeholders:

| Placeholder | Description | Example |
|---|---|---|
| `<ray-dashboard-url>` | The Ray Dashboard URL from Step 1 | http://amv-uf64gwe14****-rayo.ads.aliyuncs.com:8265 |
| `<your-working-directory>` | Path to the directory containing your script and dependencies | /root/Ray |
| `<your-script.py>` | Your Python entry script | scripts.py |

If you set `RAY_ADDRESS` in the previous step:

```bash
ray job submit --working-dir <your-working-directory> -- python <your-script.py>
```

If you did not set `RAY_ADDRESS`:

```bash
ray job submit --address <ray-dashboard-url> --working-dir <your-working-directory> -- python <your-script.py>
```

Example with the address specified:

```bash
ray job submit --address http://amv-uf64gwe14****-rayo.ads.aliyuncs.com:8265 --working-dir /root/Ray -- python scripts.py
```

Important: CTL packages and uploads all files in the directory specified by `--working-dir` to the Ray head node. Keep this directory minimal, because large files can cause upload failures. Store all dependency script files in this directory; missing dependencies cause execution failures.

Check the job status using either of these methods:

Run the following command:

```bash
ray job list
```

On the Resource Groups tab, find your AI resource group, choose More > Details in the Actions column, and click the Ray Dashboard URL.
Submit jobs with ray.init
Prerequisites: Python 3.7 or later.
Install Ray:

```bash
pip3 install ray
```

Prepare the Ray URL for ray.init. The Ray Dashboard URL from Step 1 uses port 8265 and the `http://` protocol. For ray.init, change the port to 10001 and the protocol to `ray://`.

| URL type | Format |
|---|---|
| Ray Dashboard URL | http://amv-uf64gwe14****-rayo.ads.aliyuncs.com:8265 |
| ray.init URL | ray://amv-uf64gwe14****-rayo.ads.aliyuncs.com:10001 |

Run the program.

If you set `RAY_ADDRESS` to the ray.init URL format:

```bash
export RAY_ADDRESS="ray://<cluster-hostname>:10001"
python scripts.py
```

If you specify the URL in the script:

```python
ray.init(address="ray://<cluster-hostname>:10001")
```

Then run:

```bash
python scripts.py
```
Important: If the Ray URL is incorrect, ray.init starts a local Ray cluster instead of connecting to the remote cluster. Check the output logs to confirm the connection target.
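One way to avoid passing a wrong address is to derive the ray.init URL from the Dashboard URL mechanically, following the port/protocol rule above. This helper is an illustrative sketch in plain Python, using the masked example hostname from this page:

```python
from urllib.parse import urlparse

def dashboard_to_client_url(dashboard_url):
    """Convert a Ray Dashboard URL (http://host:8265) to a ray.init client URL (ray://host:10001)."""
    parsed = urlparse(dashboard_url)
    if parsed.scheme != "http" or parsed.port != 8265:
        # Refuse anything that is not a Dashboard URL, rather than let
        # ray.init silently fall back to starting a local cluster.
        raise ValueError(f"not a Ray Dashboard URL: {dashboard_url}")
    return f"ray://{parsed.hostname}:10001"

print(dashboard_to_client_url("http://amv-uf64gwe14****-rayo.ads.aliyuncs.com:8265"))
# ray://amv-uf64gwe14****-rayo.ads.aliyuncs.com:10001
```

Failing fast on a malformed URL is safer than letting ray.init fall back to a local cluster and discovering the mistake in the logs later.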