Use SQL to implement machine learning prediction - AnalyticDB

Run a complete machine learning pipeline — data preparation, model training, evaluation, and prediction — directly in AnalyticDB for MySQL using SQL, without leaving your database environment.

This guide walks through deploying a behavior sequence transformer (BST) model to classify user behavior sequences. The BST model accepts a sequence of behavior event IDs as input and returns a binary classification result (0 or 1).

How it works

All ML jobs in AnalyticDB for MySQL use two types of resource groups:

AI resource group: manages GPU resources for compute-intensive operations such as model training and inference.
General resource group: handles regular SQL queries, such as generating training data and running prediction functions.

When you submit an ML-related SQL statement, it is first processed by the general resource group. If the statement requires AI compute, it is automatically forwarded to the associated AI resource group.

The full workflow spans five steps:

-- Step 1: Set up resource groups (one-time console setup)

-- Step 2: (Optional) Transform raw data into the required format

-- Step 3: Create and train a model
/*+resource_group=itrain*/
CREATE MODEL bstdemo.bst
OPTIONS (
  model_type='bst_classification',
  feature_cols=(event_list),
  target_cols=(target),
  hyperparameters = (
    use_best_ckpt = 'False',
    early_stopping_patience='0'
  )
)
AS SELECT event_list, target FROM bstdemo.adb;

-- Step 4: Evaluate the model
/*+resource_group=rg1*/
EVALUATE MODEL bstdemo.bst
OPTIONS (
  feature_cols=(event_list),
  target_cols=(target)
)
AS SELECT event_list, target FROM bstdemo.adb01;

-- Step 5: Run predictions
SELECT ML_PREDICT('bstdemo.bst', event_list) FROM bstdemo.adb02;

Use cases

BST models are suited for scenarios where you need to analyze sequential user behavior patterns to predict outcomes or provide personalized recommendations:

Gaming: Capture long-term dependencies between player actions (login, accept task, fight, recharge) to predict behavior categories such as churn risk or purchase likelihood.
E-commerce: Analyze browsing and purchase sequences to recommend products or predict conversion.

About the BST model

The BST model processes behavior sequence data. It accepts a sequence of behavior event IDs in string format and returns 0 or 1 as the classification result.

For example, a player's in-game activity might produce this behavior sequence: log on, receive logon rewards, accept tasks, fight, fight, fight, complete tasks, recharge, fight, and log out. This sequence maps to the following event ID string passed to the model: 0,1,2,3,3,3,4,5,3,6. The model analyzes the sequence and returns a classification result indicating whether the behavior matches a predefined category.
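The mapping from events to IDs can be sketched in a few lines of Python. This is only an illustration: the `EVENT_IDS` dictionary and the `encode_sequence` helper are hypothetical names, and the ID assigned to each event is inferred from the example sequence above rather than defined by AnalyticDB for MySQL.

```python
# Hypothetical event-name-to-ID mapping, inferred from the example sequence above.
EVENT_IDS = {
    "log on": 0,
    "receive logon rewards": 1,
    "accept tasks": 2,
    "fight": 3,
    "complete tasks": 4,
    "recharge": 5,
    "log out": 6,
}


def encode_sequence(events):
    """Return the comma-separated event ID string for a list of event names."""
    return ",".join(str(EVENT_IDS[name]) for name in events)


session = [
    "log on", "receive logon rewards", "accept tasks",
    "fight", "fight", "fight",
    "complete tasks", "recharge", "fight", "log out",
]
print(encode_sequence(session))  # 0,1,2,3,3,3,4,5,3,6
```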

Prerequisites

Before you begin, make sure you have:

An AnalyticDB for MySQL cluster running Enterprise Edition, Basic Edition, or Data Lakehouse Edition, with minor version 3.2.4.0 or later
To view and update the minor version, log in to the AnalyticDB for MySQL console and go to the Configuration Information section of the Cluster Information page.
The AI resource group feature enabled
Note: The AI resource group feature is in public preview. To enable it, contact technical support.

Limitations

The BST model supports binary classification only — it returns 0 or 1.
Input feature values must be a comma-separated string of integers (for example, '1,2,3').
The result label column must contain binary values (0 or 1).
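Rows can be checked against these limitations before they are loaded. The following Python sketch is one possible check, not part of AnalyticDB for MySQL: `is_valid_row` is a hypothetical helper, and it assumes that event IDs are non-negative integers written without spaces.

```python
import re

# One or more non-negative integers separated by commas, with no whitespace.
# This strictness is an assumption; the source only says "comma-separated integers".
_FEATURE_RE = re.compile(r"[0-9]+(,[0-9]+)*")


def is_valid_row(event_list, target):
    """Check one (feature, label) pair against the limitations listed above."""
    return (
        isinstance(event_list, str)
        and _FEATURE_RE.fullmatch(event_list) is not None
        and target in (0, 1)
    )


print(is_valid_row("1,2,3", 0))  # True
print(is_valid_row("1,,3", 1))   # False: empty element
print(is_valid_row("1,2,3", 2))  # False: label is not binary
```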

Step 1: Set up resource groups

To run ML jobs, you need an AI resource group for GPU compute and a general resource group linked to it.

Log in to the AnalyticDB for MySQL console. In the upper-left corner, select a region. In the left navigation pane, click Clusters, then click the cluster ID.
In the left navigation pane, choose Cluster Management > Resource Management. On the Resource Management page, click the Resource Groups tab.

In the upper-right corner, click Create Resource Group. Configure the following parameters:

Parameter	Description
Resource group name	2–30 characters; letters, digits, and underscores (_); must start with a letter
Job type	Select AI from the drop-down list. If no AI option appears, contact technical support to enable the AI resource group feature.
Specifications	Select ADB.MLLarge.24, ADB.MLLarge.2, or ADB.MLAdvanced.6
Minimum resources	The minimum number of resources
Maximum resources	The maximum number of resources

Click OK. The AI resource group is created.
Find the general resource group to associate and click Modify in the Actions column. In the Modify Resource Group panel, go to the ML Job Resubmission Rules section and associate the general resource group with the AI resource group you created.
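If resource group names are generated by a script, the naming rule from the parameter table can be checked up front. This Python sketch uses a hypothetical helper, `is_valid_resource_group_name`, and assumes that "letters" means ASCII letters; the console remains the authority on what it accepts.

```python
import re

# A letter first, then 1 to 29 letters, digits, or underscores (2-30 characters total).
_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]{1,29}")


def is_valid_resource_group_name(name):
    """Return True if name follows the rule in the parameter table above."""
    return _NAME_RE.fullmatch(name) is not None


print(is_valid_resource_group_name("itrain"))  # True
print(is_valid_resource_group_name("1rg"))     # False: starts with a digit
```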

Step 2: Prepare your training data

Model training requires data in a specific table schema:

Feature column: a string of comma-separated integers, where each value is a behavior event ID (for example, '1,2,3')
Label column: a binary integer — 0 or 1 — indicating the classification category

Example rows: ('1,2,3', 0), ('3,2,1', 1).

If your raw data is already in this format, skip to Step 3.

If your raw data needs transformation:

Upload the JAR package containing your Spark data processing program to an Object Storage Service (OSS) bucket.
Submit a Spark job with the required parameters. For details, see Spark application configuration parameters.
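The transformation itself usually amounts to grouping raw event records by user, ordering them by time, and joining the event IDs into one string. The plain-Python sketch below illustrates that logic only; it is not a Spark program, and the `(user_id, timestamp, event_id)` record layout and the `build_sequences` helper are assumptions made for this example.

```python
from collections import defaultdict


def build_sequences(events):
    """Group (user_id, timestamp, event_id) records into one ID string per user."""
    by_user = defaultdict(list)
    for user_id, timestamp, event_id in events:
        by_user[user_id].append((timestamp, event_id))
    return {
        # Sort by timestamp only; records with equal timestamps keep input order.
        user_id: ",".join(
            str(event_id)
            for _, event_id in sorted(items, key=lambda item: item[0])
        )
        for user_id, items in by_user.items()
    }


raw = [("u1", 3, 3), ("u1", 1, 0), ("u2", 1, 0), ("u1", 2, 2), ("u2", 2, 6)]
print(build_sequences(raw))  # {'u1': '0,2,3', 'u2': '0,6'}
```

Each resulting string still needs its 0 or 1 label attached before the rows match the schema described above.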

Step 3: Create and train a model

In the left navigation pane, choose Job Development > SQL Development.

On the SQLConsole tab, run the following statements. The /*+resource_group=itrain*/ hint routes the job to the AI resource group named itrain.

-- Create and train a BST model
/*+resource_group=itrain*/
CREATE MODEL bstdemo.bst
OPTIONS (
  model_type='bst_classification',  -- Model type
  feature_cols=(event_list),        -- Input feature column
  target_cols=(target),             -- Result label column
  hyperparameters = (
    use_best_ckpt = 'False',        -- Use the last checkpoint rather than the best
    early_stopping_patience='0'     -- Disable early stopping
  )
)
AS SELECT event_list, target FROM bstdemo.adb;  -- Training data source

Check training status. Training is complete when the status is READY.
```
SHOW MODEL bstdemo.bst;
```
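When training is started from a script, the status check can be repeated until the model is ready. The sketch below is deliberately generic: `wait_until_ready` and its `fetch_status` argument are hypothetical, and `fetch_status` stands for whatever function in your own client code runs SHOW MODEL and returns the status string. The only status value taken from this guide is READY.

```python
import time


def wait_until_ready(fetch_status, interval_seconds=30.0, max_checks=120):
    """Call fetch_status() until it returns 'READY' or max_checks is reached.

    fetch_status is a caller-supplied function (hypothetical here) that runs
    SHOW MODEL and returns the current status as a string.
    Returns True if READY was seen, False if the check limit was reached.
    """
    for attempt in range(max_checks):
        if fetch_status() == "READY":
            return True
        if attempt < max_checks - 1:
            time.sleep(interval_seconds)  # wait before the next check
    return False
```

Capping the number of checks keeps the script from waiting indefinitely if training fails or stalls.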

Step 4: Evaluate the model

Run EVALUATE MODEL against a held-out evaluation dataset to verify model accuracy. The /*+resource_group=rg1*/ hint routes the job to the resource group named rg1.

/*+resource_group=rg1*/
EVALUATE MODEL bstdemo.bst
OPTIONS (
    feature_cols=(event_list),
    target_cols=(target)
)
AS SELECT event_list, target FROM bstdemo.adb01;

The query returns evaluation metrics that indicate how well the model classifies behavior sequences on data it has not seen during training. Use these metrics to decide whether the model is ready for production use.
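This guide does not list the metrics that EVALUATE MODEL returns. As a point of reference, the Python sketch below computes three metrics that are commonly used for binary classification (accuracy, precision, and recall) from true labels and predicted labels. The `binary_metrics` helper is hypothetical and is not the implementation used by AnalyticDB for MySQL.

```python
def binary_metrics(labels, predictions):
    """Compute accuracy, precision, and recall for 0/1 labels and predictions."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)  # true positives
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)  # false positives
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)  # false negatives
    correct = sum(1 for y, p in pairs if y == p)
    return {
        "accuracy": correct / len(pairs) if pairs else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


metrics = binary_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
print(metrics["accuracy"])  # 0.6
```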

Step 5: Run predictions

Pass feature columns from any table to ML_PREDICT() to classify each row. The first argument is the model name; the second is the input feature column.

SELECT ML_PREDICT('bstdemo.bst', event_list) FROM bstdemo.adb02;

The function returns 0 or 1 for each row, indicating the predicted classification category.

What's next

Spark application configuration parameters — configure Spark jobs for data preprocessing
View and update the minor version of a cluster — upgrade your cluster to meet the version requirement