This topic describes how to run machine learning tasks with SQL in AnalyticDB for MySQL. You will deploy a Behavior Sequence Transformer (BST) model, covering data transformation, model creation, training, evaluation, and prediction.
Scenarios
AnalyticDB for MySQL lets you train and run machine learning models entirely through SQL, without setting up separate ML infrastructure. SQL handles the full pipeline: data preprocessing, training, and inference.
AnalyticDB for MySQL supports BST classification models. BST models are designed for scenarios where you need to understand user behavior patterns, such as:
Analyzing user preferences over time.
Predicting future user actions.
Delivering personalized recommendations.
BST excels at capturing long-term dependencies in sequential behavior data, making it a strong fit for gaming, e-commerce, and similar industries where user interaction sequences drive business decisions.
Prerequisites
Before you begin, make sure your environment meets the following requirements:
The cluster runs Enterprise Edition, Basic Edition, or Data Lakehouse Edition.
The cluster kernel version is 3.2.4.0 or later.
NoteTo view and update the minor version, go to the Configuration Information section on the Cluster Information page in the AnalyticDB for MySQL console.
The AI resource group feature is enabled. This feature is in public preview. To enable it, contact technical support.
Step 1: Create an AI resource group and attach it to a regular resource group
Running machine learning tasks through SQL requires two types of resource groups:
AI resource group: Manages the GPU resources required for compute-intensive operations such as model training and prediction.
Regular resource group: Processes standard SQL queries, including those that generate training data or call prediction functions.
When you submit a SQL statement to the regular resource group, the system detects whether the statement requires AI computation. If it does, the system automatically forwards the task to the attached AI resource group for execution.
Create an AI resource group first, then attach it to a regular resource group.
Log on to the AnalyticDB for MySQL console. In the upper-left corner of the console, select a region. In the left-side navigation pane, click Clusters. Find the cluster that you want to manage and click the cluster ID.
In the navigation pane on the left, choose Cluster Management > Resource Management, and then click the Resource Groups tab.
In the upper-right corner of the resource group list, click Create Resource Group to create an AI resource group.
Parameter
Description
Resource Group Name
A custom name for the resource group. The name must be 2 to 30 characters in length, start with a letter, and can contain only letters, digits, and underscores.
Job Type
From the drop-down list, select AI.
ImportantIf the AI option is not available, the AI resource group feature is not enabled for your instance. Contact technical support.
Specifications
A combination of GPU, CPU, and memory resources. You can select ADB.MLLarge.24 or ADB.MLLarge.2.
Minimum Resources
The minimum number of resource units.
Maximum Resources
The maximum number of resource units.
Click OK to create the resource group.
Find the target regular resource group. In the Actions column, click Modify. Use the ML Job Resubmission Rules to attach the regular resource group to the newly created AI resource group.
Step 2: Data transformation
Data transformation converts your raw data into the format that the BST model expects for training. Whether you need this step depends on the format and location of your raw data.
If your raw data already meets the format requirements described below, skip this step and proceed to Step 3.
Table schema requirements
The training table must have two columns:
| Column | Type | Description |
|---|---|---|
| Input feature column | String | Comma-separated integers representing the behavior event ID sequence for one sample. |
| Target label column | Integer | A binary value (0 or 1) representing the class label. |
Example: ('1,2,3',0), ('3,2,1',1)
How to transform your data
If your raw data does not match the required format, transform it as follows:
Prepare a JAR package: Package your Spark data transformation program into a JAR file and upload it to an OSS bucket.
Configure Spark job parameters: When you submit the Spark job, configure the required parameters. For more information, see Spark Application Configuration Parameters.
Step 3: Model creation and training
After your training data is ready, create and train the BST model using SQL.
In the navigation pane on the left, choose .
In the SQL Console, run the following statements to create and train the model:
-- Create a model -- Specify the resource group /*+resource_group=itrain*/ CREATE MODEL bstdemo.bst-- Model name OPTIONS ( model_type='bst_classification', -- Model type feature_cols=(event_list), -- Feature column target_cols=(target), -- Target column to predict hyperparameters = ( -- Other model hyperparameters use_best_ckpt = 'False', early_stopping_patience='0' ) ) AS SELECT event_list, target FROM bstdemo.adb; -- Source of model data. The result of this query is used to train the model with the specified variables. -- Check the model training status. The training is complete when the status changes to READY. SHOW MODEL bstdemo.bst;After you submit the
CREATE MODELstatement, training begins. RunSHOW MODEL bstdemo.bst;to check progress. Training is complete when the status changes to READY.
Step 4: Model evaluation
After training completes, evaluate the model's performance to validate its classification quality.
-- The statement format is similar to the one for creating a model.
/*resource_group=rg1*/
EVALUATE MODEL bstdemo.bst
OPTIONS (
feature_cols=(event_list),
target_cols=(target),
)
AS SELECT event_list, target FROM bstdemo.adb01;Review the returned evaluation metrics to determine whether the model meets your requirements before using it for prediction.
Step 5: Model prediction
After you confirm that the evaluation results are satisfactory, use the ML_PREDICT function to generate predictions on new data. The first parameter of the function is the model name. Subsequent parameters are the model's input columns. The function returns a classification result of 0 or 1 for each row.
-- Use the trained model for prediction.
-- The first parameter of the ml_predict function is the model name. Subsequent parameters are the model's input columns.
SELECT ML_PREDICT('bstdemo.bst', event_list) FROM bstdemo.adb02;