This topic describes how to use SQL in AnalyticDB for MySQL to perform machine learning tasks by deploying a Behavior Sequence Transformer (BST) model. The process covers the entire workflow, from data transformation and model creation to training, evaluation, and prediction.
Scenarios
You can use SQL queries to quickly perform model training and inference. SQL simplifies the entire process, including data pre-processing, training, and inference.
AnalyticDB for MySQL supports BST models. You can use these models for scenarios that require understanding user behavior patterns, analyzing user preferences, predicting future trends, or providing personalized recommendations. For example, in the gaming or e-commerce industries, you can capture long-term dependencies in user behavior. This helps you better understand and predict user actions and preferences to deliver personalized services and recommendations.
Prerequisites
The cluster is an Enterprise Edition, Basic Edition, or Data Lakehouse Edition cluster.
The cluster kernel version must be 3.2.4.0 or later.
NoteTo view and update the minor version of an AnalyticDB for MySQL cluster, log on to the AnalyticDB for MySQL console and go to the Configuration Information section of the Cluster Information page.
The AI resource group feature must be enabled. This feature is in public preview. To enable it, contact technical support.
Step 1: Create an AI resource group and attach it to a regular resource group
To run machine learning tasks using SQL, you need two types of resource groups:
AI resource group: This group is designed to support machine learning tasks. It manages the GPU resources required for these tasks and supports compute-intensive operations in model training and prediction.
Regular resource group: This group is used to process standard SQL queries, such as generating training data or running prediction functions.
SQL statements are first submitted to the regular resource group. If the system detects that a SQL statement requires AI computation, the task is automatically forwarded to the attached AI resource group for execution.
Therefore, you must first create an AI resource group and attach it to a regular resource group.
Log on to the AnalyticDB for MySQL console. In the upper-left corner of the console, select a region. In the left-side navigation pane, click Clusters. Find the cluster that you want to manage and click the cluster ID.
In the navigation pane on the left, choose Cluster Management > Resource Management, and then click the Resource Groups tab.
In the upper-right corner of the resource group list, click Create Resource Group to create an AI resource group.
Parameter
Description
Resource Group Name
A custom name for the resource group. The name must be 2 to 30 characters in length, start with a letter, and can contain only letters, digits, and underscores.
Job Type
From the drop-down list, select AI.
ImportantIf the AI option is not available, the AI resource group feature is not enabled for your instance. Contact technical support.
Specifications
A combination of GPU, CPU, and memory resources. You can select ADB.MLLarge.24, ADB.MLLarge.2, or ADB.MLAdvavced.6.
Minimum Resources
The minimum number of resource units.
Maximum Resources
The maximum number of resource units.
Click OK to create the resource group.
Find the target regular resource group. In the Actions column, click Modify. Use the ML Job Resubmission Rules to attach the regular resource group to the newly created AI resource group.
Step 2: Data transformation
Data transformation is the process of converting raw data into a format suitable for model training. This process depends on three key factors: the format of the raw data, its storage location, and the input data format required by the model.
Table schema requirements for model training data
Input feature column: This column must be a string that contains a series of comma-separated integers. Each entry represents the feature vector of a sample.
Target label column: This column contains two possible values, such as 0 or 1, to represent the class labels in a classification task.
Example table schema: ('1,2,3',0), ('3,2,1',1).
Data transformation method
If your raw data format differs from the format required for model training, use the following method to transform the data:
Prepare a JAR package: Package your Spark data transformation program into a JAR file and upload it to an OSS bucket.
Configure Spark job parameters: When you submit the Spark job, configure the required parameters. For more information about the parameters, see Spark Application Configuration Parameters.
If your raw data already meets the format requirements for model training, you can skip this step.
Step 3: Model creation and training
Create a machine learning model, define its parameters, specify the training data, and check the model status.
In the navigation pane on the left, choose .
In the SQL Console, use model statements to create and train the model.
-- Create a model
-- Specify the resource group
/*+resource_group=itrain*/
CREATE MODEL bstdemo.bst-- Model name
OPTIONS (
model_type='bst_classification', -- Model type
feature_cols=(event_list), -- Feature column
target_cols=(target), -- Target column to predict
hyperparameters = ( -- Other model hyperparameters
use_best_ckpt = 'False',
early_stopping_patience='0'
)
)
AS SELECT event_list, target FROM bstdemo.adb; -- Source of model data. The result of this query is used to train the model with the specified variables.
-- Check the model training status. The training is complete when the status changes to READY.
SHOW MODEL bstdemo.bst;Step 4: Model evaluation
Evaluate the created machine learning model to validate its performance.
-- The statement format is similar to the one for creating a model.
/*resource_group=rg1*/
EVALUATE MODEL bstdemo.bst
OPTIONS (
feature_cols=(event_list),
target_cols=(target),
)
AS SELECT event_list, target FROM bstdemo.adb01;Step 5: Model prediction
Select feature columns from a table, pass the data to the trained model, and obtain prediction results based on this input data.
-- Use the trained model for prediction.
-- The first parameter of the ml_predict function is the model name. Subsequent parameters are the model's input columns.
SELECT ML_PREDICT('bstdemo.bst', event_list) FROM bstdemo.adb02;