This topic describes how to create, upload, and evaluate a model. This topic also describes how to use a model for inference.
Models of PolarDB for AI
Use built-in models
PolarDB for AI supports two built-in models: NL2SQL and Tongyi Qianwen. You can use the PolarDB for AI feature provided by these models without the need for additional model configurations.
For more information, see the following topics:
Build a custom model
In actual business scenarios, you may need to optimize the algorithms and adjust the structures of models. In this case, these models may not meet actual business requirements. Therefore, PolarDB for AI allows you to build a custom model and run the model in confidential containers. This further ensures the data security of your model.
To build and use a custom model, perform the following steps:
Load an external model
You can also directly upload and deploy trained external models to PolarDB by using the MLOps capabilities of PolarDB for AI and use external models such as LightGBM and GBDT based on the Sklearn framework or deep learning algorithms based on the TensorFlow or PyTorch framework.
To load an external model, perform the following steps:
Create a model
You can create a model. The model is trained in an asynchronous manner. You can check whether the model has been trained by querying its status.
Syntax
CREATE MODEL model_name WITH ( model_class = '', x_cols = '', y_cols='',model_parameter=()) AS (SELECT select_expr [, select_expr] ... FROM table_reference)
Parameters
Parameter | Description |
model_name | The model name. |
model_class | The type of the model. Valid values:
|
x_cols | The input columns for model training. Note Separate the names of multiple columns with commas (,). |
y_cols | The output columns for model training. |
model_parameter | The parameters for model training.
|
select_expr | The column name. |
table_reference | The table name. |
Examples
Create a model named airlines_gbm
.
/*polar4ai*/CREATE MODEL airlines_gbm WITH (model_class='lightgbm', x_cols ='Airline,Flight,AirportFrom,AirportTo,DayOfWeek,Time,Length', y_cols='Delay',model_parameter=(boosting_type='gbdt', n_estimators=100, max_depth=8, num_leaves=256)) as (SELECT * FROM db4ai.airlines_train)
The following output is returned:
Query OK, 0 rows affected (0.79 sec)
Upload a model
PolarDB for AI allows you to upload custom PolarDB for AI models. You can upload models that are trained offline to PolarDB for AI. Then, you can use the platform to manage the uploaded models.
Syntax
UPLOAD MODEL model_name WITH (model_location = '', req_location = '')
Parameters
Parameter | Description |
model_name | The model name. |
model_location | The file path of the model. |
req_location | The file path on which the model depends. |
Examples
Upload the my_model
model to the PolarDB for AI platform.
/*polar4ai*/UPLOAD MODEL my_model WITH (model_location='https://xxxx.oss-cn-hangzhou.aliyuncs.com/xxxx/model.pkl?Expires=xxxx&OSSAccessKeyId=xxxx&Signature=xxxx', req_location='https://xxxx.oss-cn-hangzhou.aliyuncs.com/xxxx/requirements.txt?Expires=xxxx&OSSAccessKeyId=xxxx&Signature=xxxx');
The following output is returned:
Query OK, 0 rows affected (0.29 sec)
Execute the following statement to view the model status:
/*polar4ai*/ SHOW MODEL my_model;
The following output is returned:
+-------------+-----------------------------------------------------------------------------------------------------------------------------+
| modelStatus | modelPath |
+-------------+-----------------------------------------------------------------------------------------------------------------------------+
| saved | http://db4ai-collie-cn-hangzhou.oss-cn-hangzhou.aliyuncs.com/xxxxx.pkl?OSSAccessKeyId=xxxxxx&Expires=xxxx&Signature=xxxxxx |
+-------------+-----------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.23 sec)
If modelStatus
is displayed as saved
, the model is uploaded.
Deploy a model
You can deploy an uploaded model on the PolarDB for AI platform. After the deployment is complete, the model is in the online state. When you call the model for subsequent inference, the inference speed is faster.
Syntax
DEPLOY MODEL model_name
Parameters
Parameter | Description |
model_name | The model name. |
Examples
Deploy the my_model
model on the PolarDB for AI platform.
/*polar4ai*/ DEPLOY MODEL my_model;
The following output is returned:
Query OK, 0 rows affected (0.29 sec)
Execute the following statement to view the model status:
/*polar4ai*/ SHOW MODEL my_model;
The following output is returned:
+-------------+-----------------------------------------------------------------------------------------------------------------------------+
| modelStatus | modelPath |
+-------------+-----------------------------------------------------------------------------------------------------------------------------+
| serving | http://db4ai-collie-cn-hangzhou.oss-cn-hangzhou.aliyuncs.com/xxxxx.pkl?OSSAccessKeyId=xxxxxx&Expires=xxxx&Signature=xxxxxx |
+-------------+-----------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.23 sec)
If modelStatus
is displayed as serving
, the model is deployed.
View model information
View the model list
You can view all models for the current cluster.
Syntax
SHOW MODELS
Examples
/*polar4ai*/SHOW MODELS
The following output is returned:
+-----------------------+-----------------+--------------+
| model_name | model_class | model_status |
+-----------------------+-----------------+--------------+
| airline_rfr | randomforestreg | saved |
| gbrt1 | gbrt | saved |
| airline_deepfm | deepfm | saved |
| airlines_gbm | lightgbm | saved |
| lgbm1 | lightgbm | saved |
| blackfriday_linearreg | linearreg | saved |
+-----------------------+-----------------+--------------+
6 rows in set (0.24 sec)
View the model status
You can view the current status of a model. Use the following syntax to check whether a model is trained after you create the model. A model may be in one of the following states:
training: The model is being trained.
loading_data: Data is being loaded to the model.
trained: The model is trained.
saved: The model is saved.
serving: The model is in service.
deleted: The model is deleted.
Syntax
SHOW MODEL model_name
Parameters
Parameter | Description |
model_name | The model name. |
Examples
View the current status of the airlines_gbm
model.
/*polar4ai*/SHOW MODEL airlines_gbm;
The following output is returned:
+-------------+-----------------------------------------------------------------------------------------------------------------------------+
| modelStatus | modelPath |
+-------------+-----------------------------------------------------------------------------------------------------------------------------+
| saved | http://db4ai-collie-cn-hangzhou.oss-cn-hangzhou.aliyuncs.com/xxxxx.pkl?OSSAccessKeyId=xxxxxx&Expires=xxxx&Signature=xxxxxx |
+-------------+-----------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.23 sec)
The validity period of modelPath
is 100 minutes. You can view the value of the Expires
parameter in modelPath
to check whether the link is still valid. You can access the link within its validity period.
View the model details
Syntax
DESCRIBE MODEL model_name
Parameters
Parameter | Description |
model_name | The model name. |
Examples
View the details of the airlines_gbm
model.
/*polar4ai*/DESCRIBE MODEL airlines_gbm;
The following output is returned:
+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| model_name | model_description |
+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| airlines_gbm | basic information:model_name:airlines_gbm model_class:lightgbm feature important:features,imp_gain,imp_splitAirline,0.3327,0.0376 AirportFrom,0.2178,0.1842 Time,0.1893,0.1999 AirportTo,0.1668,0.187 DayOfWeek,0.0384,0.1236 Length,0.0307,0.1269 Flight,0.0242,0.1408 |
+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.65 sec)
Evaluate a model
You can evaluate the training results of a model.
Syntax
SELECT select_expr [, select_expr] ... FROM EVALUATE (MODEL model_name, SELECT select_expr_for_prediction [, select_expr_for_prediction] ... FROM table_reference ) WITH (x_cols = '', y_cols='', metrics='')
Parameters
Parameter | Description |
select_expr | The name of the column displayed in the model evaluation result. |
model_name | The model name. |
select_expr_for_prediction | The name of the column used for model evaluation. |
table_reference | The table name. |
row_count | The predicted number of records. |
x_cols | The input columns for model training. Separate the names of multiple columns with commas (,). |
y_cols | The output columns for model training. |
metrics | The metrics for model evaluation. Valid values:
|
Examples
Evaluate the
airlines_gbm
model that is trained./*polar4ai*/SELECT Delay FROM evaluate(MODEL airlines_gbm, SELECT * FROM db4ai.airlines_test) WITH (x_cols = 'Airline,Flight,AirportFrom,AirportTo,DayOfWeek,Time,Length', y_cols='Delay', metrics='acc');
The following output is returned:
+--------------------------------------+ | task_id | +--------------------------------------+ | df05244e-21f7-11ed-be66-xxxxxxxxxxxx | +--------------------------------------+ 1 row in set (0.95 sec)
Execute the following statement to view the task status:
/*polar4ai*/SHOW TASK `df05244e-21f7-11ed-be66-xxxxxxxxxxxx`;
The following output is returned:
+------------+----------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------------------+----------------------------+ | taskStatus | filePath | results | startTime | endTime | +------------+----------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------------------+----------------------------+ | finish | http://db4ai-collie-cn-hangzhou.oss-cn-hangzhou.aliyuncs.com/pc-xxxxxxx/airlines_gbm/xxxxx.csv?OSSAccessKeyId=xxxxxx&Expires=xxxx&Signature=xxxxxx | {"acc": 0.6694} | 2022-08-22 17:22:21.122101 | 2022-08-22 17:22:39.428811 | +------------+----------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------------------+----------------------------+ 1 row in set (0.24 sec)
NoteThe validity period of
filePath
is 100 minutes. You can view the value of theExpires
parameter infilePath
to check whether the link is still valid. You can access the link within its validity period.
Use a model for inference
You can use a trained model for online or offline inference.
Syntax
SELECT select_expr [, select_expr] ... FROM PREDICT (MODEL model_name, SELECT select_expr_for_prediction [, select_expr_for_prediction] ... FROM table_reference LIMIT row_count) WITH (x_cols= '')
Parameters
Parameter | Description |
select_expr | The name of the column displayed in the model inference result. |
model_name | The model name. |
select_expr_for_prediction | The name of the column used for model inference. |
table_reference | The table name. |
mode | The inference mode. async indicates offline inference. If you leave this parameter empty, the used mode is online inference. |
row_count | The number of samples for model inference. |
x_cols | The input columns for model training. Separate the names of multiple columns with commas (,). |
Examples
Online inference
Use the
airlines_gbm
model for online inference./*polar4ai*/SELECT Delay FROM PREDICT(MODEL airlines_gbm, SELECT * FROM db4ai.airlines_test LIMIT 10) WITH (x_cols = 'Airline,Flight,AirportFrom,AirportTo,DayOfWeek,Time,Length', y_cols='Delay');
The following output is returned:
+-------+-------------------+ | Delay | predicted_results | +-------+-------------------+ | 1 | 0 | | 0 | 0 | | 0 | 0 | | 0 | 0 | | 0 | 0 | | 0 | 0 | | 1 | 0 | | 0 | 0 | | 0 | 0 | | 1 | 0 | +-------+-------------------+ 10 rows in set (0.74 sec)
Offline inference
If the number of inference samples is greater than 1,000, PolarDB for AI prompts you to use an offline inference task.
ERROR 9050 (HY000): Please limit the SQL selected data length to less than '1000' or convert to offline prediction
Create the following offline task:
/*polar4ai*/SELECT Delay FROM predict(MODEL airlines_gbm, SELECT * FROM db4ai.airlines_test) WITH (x_cols = 'Airline,Flight,AirportFrom,AirportTo,DayOfWeek,Time,Length', y_cols='Delay', mode='async');
The following output is returned:
+--------------------------------------+ | task_id | +--------------------------------------+ | bd0c1722-21e7-11ed-94a8-xxxxxxxxxxxx | +--------------------------------------+ 1 row in set (0.75 sec)
The preceding result returns the task ID of the task. You can query the status and the download URL of the task results by using the task ID.
/*polar4ai*/SHOW TASK `bd0c1722-21e7-11ed-94a8-xxxxxxxxxxxx`
The following output is returned:
+------------+----------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------------------+----------------------------+ | taskStatus | filePath | results | startTime | endTime | +------------+----------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------------------+----------------------------+ | finish | http://db4ai-collie-cn-hangzhou.oss-cn-hangzhou.aliyuncs.com/pc-xxxxxxx/airlines_gbm/xxxxx.csv?OSSAccessKeyId=xxxxxx&Expires=xxxx&Signature=xxxxxx | | 2022-08-22 14:57:51.355240 | 2022-08-22 14:58:18.316818 | +------------+----------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------------------+----------------------------+ 1 row in set (0.24 sec)
NoteThe validity period of
filePath
is 100 minutes. You can view the value of theExpires
parameter infilePath
to check whether the link is still valid. You can access the link within its validity period.
Manage offline evaluation and inference tasks
View the offline task list
You can view all offline tasks for the current cluster.
Offline tasks are batch processing tasks that run for a long period of time and do not produce results in real time. Offline tasks can be offline evaluation and offline inference.
Syntax
SHOW TASKS
Examples
Execute the following statement to view all offline tasks for the current cluster:
/*polar4ai*/SHOW TASKS;
The following output is returned:
+--------------------------------------+------------+-------------+----------------------------+----------------------------+
| task_id | task_type | task_status | start_timestr | end_timestr |
+--------------------------------------+------------+-------------+----------------------------+----------------------------+
| 2cba0c74-1f8f-11ed-934a-xxxxxxxxxxxx | prediction | finish | 2022-08-19 15:18:51.206829 | |
| 77b3a186-1f94-11ed-8eaa-xxxxxxxxxxxx | evaluation | finish | 2022-08-19 15:56:44.465594 | |
| 972547a4-1fa3-11ed-9c6b-xxxxxxxxxxxx | evaluation | finish | 2022-08-19 17:44:59.790353 | 2022-08-19 17:45:23.750100 |
| bd0c1722-21e7-11ed-94a8-xxxxxxxxxxxx | prediction | finish | 2022-08-22 14:57:51.355240 | 2022-08-22 14:58:18.316818 |
| df05244e-21f7-11ed-be66-xxxxxxxxxxxx | evaluation | finish | 2022-08-22 16:53:20.381577 | 2022-08-22 16:53:37.189953 |
| ec956db8-21fb-11ed-8400-xxxxxxxxxxxx | evaluation | finish | 2022-08-22 17:22:21.122101 | 2022-08-22 17:22:39.428811 |
+--------------------------------------+------------+-------------+----------------------------+----------------------------+
9 rows in set (0.18 sec)
View the offline task status
You can view the current status of an offline task. A task may be in one of the following states:
init: The task is being initialized.
running: The task is being executed.
finish: The task is complete.
fail: The task failed.
Syntax
SHOW TASK `task_id`
Parameters
Parameter | Description |
task_id | The task ID. |
Examples
View the status of the task whose task ID is df05244e-21f7-11ed-be66-xxxxxxxxxxxx
.
/*polar4ai*/SHOW TASK `df05244e-21f7-11ed-be66-xxxxxxxxxxxx`
The following output is returned:
+------------+----------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------------------+----------------------------+
| taskStatus | filePath | results | startTime | endTime |
+------------+----------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------------------+----------------------------+
| finish | http://db4ai-collie-cn-hangzhou.oss-cn-hangzhou.aliyuncs.com/pc-xxxxxxx/airlines_gbm/xxxxx.csv?OSSAccessKeyId=xxxxxx&Expires=xxxx&Signature=xxxxxx | {"acc": 0.6694} | 2022-08-22 17:22:21.122101 | 2022-08-22 17:22:39.428811 |
+------------+----------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------------------+----------------------------+
1 row in set (0.24 sec)
The validity period of filePath
is 100 minutes. You can view the value of the Expires
parameter in filePath
to check whether the link is still valid. You can access the link within its validity period.
Delete a model
You can delete a model when the model is no longer required.
Syntax
DROP MODEL model_name
Parameters
Parameter | Description |
model_name | The model name. |
Examples
Delete the airlines_gbm
model.
/*polar4ai*/DROP MODEL airlines_gbm
If the following result is returned, the siku_kmeans
model is deleted.
Query OK, 0 rows affected (0.57 sec)
Best practices for custom models
In the following example, a model is trained offline and used for online inference to show how to manage a customer model.
Preform offline training
You can use the following offline training script for the LightGBM algorithm:
# coding: utf-8 from pathlib import Path import pandas as pd from sklearn.metrics import mean_squared_error import lightgbm as lgb import joblib def train_model(): print('Loading data...') # load or create your dataset df_train = pd.read_csv('regression.train', header=None, sep='\t') df_test = pd.read_csv('regression.test', header=None, sep='\t') y_train = df_train[0] y_test = df_test[0] X_train = df_train.drop(0, axis=1) X_test = df_test.drop(0, axis=1) # create dataset for lightgbm lgb_train = lgb.Dataset(X_train, y_train) lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train) # specify your configurations as a dict params = { 'boosting_type': 'gbdt', 'objective': 'regression', 'metric': {'l2', 'l1'}, 'num_leaves': 31, 'learning_rate': 0.05, 'feature_fraction': 0.9, 'bagging_fraction': 0.8, 'bagging_freq': 5, 'verbose': 0 } print('Starting training...') # train gbm = lgb.train(params, lgb_train, num_boost_round=20, valid_sets=lgb_eval, callbacks=[lgb.early_stopping(stopping_rounds=5)]) print('Saving model...') # save model to file # gbm.save_model('model.txt') joblib.dump(gbm, 'lgb.pkl') print('Starting predicting...') # predict y_pred = gbm.predict(X_test, num_iteration=gbm.best_iteration) # eval rmse_test = mean_squared_error(y_test, y_pred) ** 0.5 print(f'The RMSE of prediction is: {rmse_test}')
The model file is exported in the
pkl
format. Thepredict
method is called to return the inference result. The python files on which the model depends are also required.The following example shows the content of the
requirements.txt
file:lightgbm==3.3.3
Upload a model
Execute the following statement to upload a model to PolarDB for AI:
/*polar4ai*/UPLOAD MODEL my_model WITH (model_location='https://xxxx.oss-cn-hangzhou.aliyuncs.com/xxxx/model.pkl?Expires=xxxx&OSSAccessKeyId=xxxx&Signature=xxxx', req_location='https://xxxx.oss-cn-hangzhou.aliyuncs.com/xxxx/requirements.txt?Expires=xxxx&OSSAccessKeyId=xxxx&Signature=xxxx')
model_location
indicates the path of the model file.req_location
indicates the path of the file on which the mode depends. You can prepare the two files in advance, upload them to your Object Storage Service (OSS) bucket, and then execute the statement to upload them to PolarDB for AI.The following output is returned:
Query OK, 0 rows affected (0.29 sec)
Execute the following statement to view the model status:
/*polar4ai*/ SHOW my_model;
The following output is returned:
+-------------+-----------------------------------------------------------------------------------------------------------------------------+ | modelStatus | modelPath | +-------------+-----------------------------------------------------------------------------------------------------------------------------+ | saved | http://db4ai-collie-cn-hangzhou.oss-cn-hangzhou.aliyuncs.com/xxxxx.pkl?OSSAccessKeyId=xxxxxx&Expires=xxxx&Signature=xxxxxx | +-------------+-----------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.23 sec)
If
modelStatus
is displayed assaved
, the model is uploaded.
Deploy a model
Execute the following statement to deploy a model on PolarDB for AI:
/*polar4ai*/ DEPLOY MODEL my_model;
The following output is returned:
Query OK, 0 rows affected (0.29 sec)
Execute the following statement to view the model status:
/*polar4ai*/ SHOW MODEL my_model;
The following output is returned:
+-------------+-----------------------------------------------------------------------------------------------------------------------------+ | modelStatus | modelPath | +-------------+-----------------------------------------------------------------------------------------------------------------------------+ | serving | http://db4ai-collie-cn-hangzhou.oss-cn-hangzhou.aliyuncs.com/xxxxx.pkl?OSSAccessKeyId=xxxxxx&Expires=xxxx&Signature=xxxxxx | +-------------+-----------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.23 sec)
If
modelStatus
is displayed asserving
, the model is deployed.
Use a model for online inference.
Execute the following statement to use a model to perform an online inference task:
/*polar4ai*/ SELECT Y FROM PREDICT(MODEL my_model, SELECT * FROM db4ai.regression_test LIMIT 10) WITH (x_cols = 'x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,x16,x17,x18,x19,x20,x21,x22,x23,x24,x25,x26,x27,x28', y_cols='');
The following output is returned:
+------+---------------------+ | Y | predicted_results | +------+---------------------+ | 1.0 | 0.6262147669037363 | | 0.0 | 0.5082804008241021 | | 0.0 | 0.37533158372209957 | | 1.0 | 0.461974928099089 | | 0.0 | 0.3777339456553666 | | 0.0 | 0.35045096227525735 | | 0.0 | 0.4178165504012342 | | 1.0 | 0.40869795422774036 | | 1.0 | 0.6826481286570045 | | 0.0 | 0.47021259543154736 | +------+---------------------+ 10 rows in set (0.95 sec)