All Products
Search
Document Center

PolarDB:Manage models

Last Updated:Mar 25, 2024

This topic describes how to create, upload, and evaluate a model. This topic also describes how to use a model for inference.

Models of PolarDB for AI

Use built-in models

PolarDB for AI supports two built-in models: NL2SQL and Tongyi Qianwen. You can use the PolarDB for AI feature provided by these models without the need for additional model configurations.

For more information, see the following topics:

Build a custom model

In actual business scenarios, you may need to optimize the algorithms and adjust the structures of models. In this case, these models may not meet actual business requirements. Therefore, PolarDB for AI allows you to build a custom model and run the model in confidential containers. This further ensures the data security of your model.

To build and use a custom model, perform the following steps:

Load an external model

You can also directly upload and deploy trained external models to PolarDB by using the MLOps capabilities of PolarDB for AI and use external models such as LightGBM and GBDT based on the Sklearn framework or deep learning algorithms based on the TensorFlow or PyTorch framework.

To load an external model, perform the following steps:

Create a model

You can create a model. The model is trained in an asynchronous manner. You can check whether the model has been trained by querying its status.

Syntax

CREATE MODEL model_name WITH ( model_class = '', x_cols = '', y_cols='',model_parameter=()) AS (SELECT select_expr [, select_expr] ... FROM table_reference)

Parameters

Parameter

Description

model_name

The model name.

model_class

The type of the model. Valid values:

x_cols

The input columns for model training.

Note

Separate the names of multiple columns with commas (,).

y_cols

The output columns for model training.

model_parameter

The parameters for model training.

  • For more information about the value of the model_parameter parameter when the model_class parameter is set to lightgbm, see Parameters.

  • For more information about the value of the model_parameter parameter when the model_class parameter is set to deepfm, see Parameters.

  • For more information about the value of the model_parameter parameter when the model_class parameter is set to kmeans, see Parameters.

  • For more information about the value of the model_parameter parameter when the model_class parameter is set to randomforestreg, see Parameters.

  • For more information about the value of the model_parameter parameter when the model_class parameter is set to gbrt, see Parameters.

  • For more information about the value of the model_parameter parameter when the model_class parameter is set to linearreg, see Parameters.

  • For more information about the value of the model_parameter parameter when the model_class parameter is set to svr, see Parameters.

select_expr

The column name.

table_reference

The table name.

Examples

Create a model named airlines_gbm.

/*polar4ai*/CREATE MODEL airlines_gbm WITH (model_class='lightgbm', x_cols ='Airline,Flight,AirportFrom,AirportTo,DayOfWeek,Time,Length', y_cols='Delay',model_parameter=(boosting_type='gbdt', n_estimators=100, max_depth=8, num_leaves=256)) as (SELECT * FROM db4ai.airlines_train)

The following output is returned:

Query OK, 0 rows affected (0.79 sec)

Upload a model

PolarDB for AI allows you to upload custom PolarDB for AI models. You can upload models that are trained offline to PolarDB for AI. Then, you can use the platform to manage the uploaded models.

Syntax

UPLOAD MODEL model_name WITH (model_location = '', req_location = '') 

Parameters

Parameter

Description

model_name

The model name.

model_location

The file path of the model.

req_location

The file path on which the model depends.

Examples

Upload the my_model model to the PolarDB for AI platform.

/*polar4ai*/UPLOAD MODEL my_model WITH (model_location='https://xxxx.oss-cn-hangzhou.aliyuncs.com/xxxx/model.pkl?Expires=xxxx&OSSAccessKeyId=xxxx&Signature=xxxx', req_location='https://xxxx.oss-cn-hangzhou.aliyuncs.com/xxxx/requirements.txt?Expires=xxxx&OSSAccessKeyId=xxxx&Signature=xxxx');

The following output is returned:

Query OK, 0 rows affected (0.29 sec)

Execute the following statement to view the model status:

/*polar4ai*/ SHOW MODEL my_model;

The following output is returned:

+-------------+-----------------------------------------------------------------------------------------------------------------------------+
| modelStatus | modelPath                                                                                                                   |
+-------------+-----------------------------------------------------------------------------------------------------------------------------+
| saved       | http://db4ai-collie-cn-hangzhou.oss-cn-hangzhou.aliyuncs.com/xxxxx.pkl?OSSAccessKeyId=xxxxxx&Expires=xxxx&Signature=xxxxxx  |
+-------------+-----------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.23 sec)

If modelStatus is displayed as saved, the model is uploaded.

Deploy a model

You can deploy an uploaded model on the PolarDB for AI platform. After the deployment is complete, the model is in the online state. When you call the model for subsequent inference, the inference speed is faster.

Syntax

DEPLOY MODEL model_name

Parameters

Parameter

Description

model_name

The model name.

Examples

Deploy the my_model model on the PolarDB for AI platform.

/*polar4ai*/ DEPLOY MODEL my_model;

The following output is returned:

Query OK, 0 rows affected (0.29 sec)

Execute the following statement to view the model status:

/*polar4ai*/ SHOW MODEL my_model;

The following output is returned:

+-------------+-----------------------------------------------------------------------------------------------------------------------------+
| modelStatus | modelPath                                                                                                                   |
+-------------+-----------------------------------------------------------------------------------------------------------------------------+
| serving     | http://db4ai-collie-cn-hangzhou.oss-cn-hangzhou.aliyuncs.com/xxxxx.pkl?OSSAccessKeyId=xxxxxx&Expires=xxxx&Signature=xxxxxx  |
+-------------+-----------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.23 sec)

If modelStatus is displayed as serving, the model is deployed.

View model information

View the model list

You can view all models for the current cluster.

Syntax

SHOW MODELS

Examples

/*polar4ai*/SHOW MODELS

The following output is returned:

+-----------------------+-----------------+--------------+
| model_name            | model_class     | model_status |
+-----------------------+-----------------+--------------+
| airline_rfr           | randomforestreg | saved        |
| gbrt1                 | gbrt            | saved        |
| airline_deepfm        | deepfm          | saved        |
| airlines_gbm          | lightgbm        | saved        |
| lgbm1                 | lightgbm        | saved        |
| blackfriday_linearreg | linearreg       | saved        |
+-----------------------+-----------------+--------------+
6 rows in set (0.24 sec)

View the model status

You can view the current status of a model. Use the following syntax to check whether a model is trained after you create the model. A model may be in one of the following states:

  • training: The model is being trained.

  • loading_data: Data is being loaded to the model.

  • trained: The model is trained.

  • saved: The model is saved.

  • serving: The model is in service.

  • deleted: The model is deleted.

Syntax

SHOW MODEL model_name 

Parameters

Parameter

Description

model_name

The model name.

Examples

View the current status of the airlines_gbm model.

/*polar4ai*/SHOW MODEL airlines_gbm;

The following output is returned:

+-------------+-----------------------------------------------------------------------------------------------------------------------------+
| modelStatus | modelPath                                                                                                                   |
+-------------+-----------------------------------------------------------------------------------------------------------------------------+
| saved       | http://db4ai-collie-cn-hangzhou.oss-cn-hangzhou.aliyuncs.com/xxxxx.pkl?OSSAccessKeyId=xxxxxx&Expires=xxxx&Signature=xxxxxx  |
+-------------+-----------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.23 sec)
Note

The validity period of modelPath is 100 minutes. You can view the value of the Expires parameter in modelPath to check whether the link is still valid. You can access the link within its validity period.

View the model details

Syntax

DESCRIBE MODEL model_name

Parameters

Parameter

Description

model_name

The model name.

Examples

View the details of the airlines_gbm model.

/*polar4ai*/DESCRIBE MODEL airlines_gbm;

The following output is returned:

+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| model_name        | model_description                                                                                                                                                                                                                                                                 |
+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| airlines_gbm      | basic information:model_name:airlines_gbm model_class:lightgbm feature important:features,imp_gain,imp_splitAirline,0.3327,0.0376 AirportFrom,0.2178,0.1842 Time,0.1893,0.1999 AirportTo,0.1668,0.187 DayOfWeek,0.0384,0.1236 Length,0.0307,0.1269 Flight,0.0242,0.1408           |
+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.65 sec)

Evaluate a model

You can evaluate the training results of a model.

Syntax

SELECT select_expr [, select_expr] ... FROM EVALUATE (MODEL model_name, SELECT select_expr_for_prediction [, select_expr_for_prediction] ... FROM table_reference ) WITH (x_cols = '', y_cols='', metrics='')

Parameters

Parameter

Description

select_expr

The name of the column displayed in the model evaluation result.

model_name

The model name.

select_expr_for_prediction

The name of the column used for model evaluation.

table_reference

The table name.

row_count

The predicted number of records.

x_cols

The input columns for model training. Separate the names of multiple columns with commas (,).

y_cols

The output columns for model training.

metrics

The metrics for model evaluation. Valid values:

  • acc: the accuracy, used for classification tasks.

  • r2_score: the coefficient of determination, used for regression tasks.

Examples

  1. Evaluate the airlines_gbm model that is trained.

    /*polar4ai*/SELECT Delay FROM evaluate(MODEL airlines_gbm, SELECT * FROM db4ai.airlines_test) WITH (x_cols = 'Airline,Flight,AirportFrom,AirportTo,DayOfWeek,Time,Length', y_cols='Delay', metrics='acc');

    The following output is returned:

    +--------------------------------------+
    | task_id                              |
    +--------------------------------------+
    | df05244e-21f7-11ed-be66-xxxxxxxxxxxx |
    +--------------------------------------+
    1 row in set (0.95 sec)
  2. Execute the following statement to view the task status:

    /*polar4ai*/SHOW TASK `df05244e-21f7-11ed-be66-xxxxxxxxxxxx`;

    The following output is returned:

    +------------+----------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------------------+----------------------------+
    | taskStatus | filePath                                                                                                                                           | results         | startTime                  | endTime                    |
    +------------+----------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------------------+----------------------------+
    | finish     | http://db4ai-collie-cn-hangzhou.oss-cn-hangzhou.aliyuncs.com/pc-xxxxxxx/airlines_gbm/xxxxx.csv?OSSAccessKeyId=xxxxxx&Expires=xxxx&Signature=xxxxxx | {"acc": 0.6694} | 2022-08-22 17:22:21.122101 | 2022-08-22 17:22:39.428811 |
    +------------+----------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------------------+----------------------------+
    1 row in set (0.24 sec)
    Note

    The validity period of filePath is 100 minutes. You can view the value of the Expires parameter in filePath to check whether the link is still valid. You can access the link within its validity period.

Use a model for inference

You can use a trained model for online or offline inference.

Syntax

SELECT select_expr [, select_expr] ... FROM PREDICT (MODEL model_name, SELECT select_expr_for_prediction [, select_expr_for_prediction] ... FROM table_reference LIMIT row_count) WITH (x_cols= '')

Parameters

Parameter

Description

select_expr

The name of the column displayed in the model inference result.

model_name

The model name.

select_expr_for_prediction

The name of the column used for model inference.

table_reference

The table name.

mode

The inference mode. async indicates offline inference. If you leave this parameter empty, the used mode is online inference.

row_count

The number of samples for model inference.

x_cols

The input columns for model training. Separate the names of multiple columns with commas (,).

Examples

  • Online inference

    Use the airlines_gbm model for online inference.

    /*polar4ai*/SELECT Delay FROM PREDICT(MODEL airlines_gbm, SELECT * FROM db4ai.airlines_test LIMIT 10) WITH (x_cols = 'Airline,Flight,AirportFrom,AirportTo,DayOfWeek,Time,Length', y_cols='Delay');

    The following output is returned:

    +-------+-------------------+
    | Delay | predicted_results |
    +-------+-------------------+
    |     1 | 0                 |
    |     0 | 0                 |
    |     0 | 0                 |
    |     0 | 0                 |
    |     0 | 0                 |
    |     0 | 0                 |
    |     1 | 0                 |
    |     0 | 0                 |
    |     0 | 0                 |
    |     1 | 0                 |
    +-------+-------------------+
    10 rows in set (0.74 sec)
  • Offline inference

    If the number of inference samples is greater than 1,000, PolarDB for AI prompts you to use an offline inference task.

    ERROR 9050 (HY000): Please limit the SQL selected data length to less than '1000' or convert to offline prediction

    Create the following offline task:

    /*polar4ai*/SELECT Delay FROM predict(MODEL airlines_gbm, SELECT * FROM db4ai.airlines_test) WITH (x_cols = 'Airline,Flight,AirportFrom,AirportTo,DayOfWeek,Time,Length', y_cols='Delay', mode='async');

    The following output is returned:

    +--------------------------------------+
    | task_id                              |
    +--------------------------------------+
    | bd0c1722-21e7-11ed-94a8-xxxxxxxxxxxx |
    +--------------------------------------+
    1 row in set (0.75 sec)

    The preceding result returns the task ID of the task. You can query the status and the download URL of the task results by using the task ID.

    /*polar4ai*/SHOW TASK `bd0c1722-21e7-11ed-94a8-xxxxxxxxxxxx`

    The following output is returned:

    +------------+----------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------------------+----------------------------+
    | taskStatus | filePath                                                                                                                                           | results         | startTime                  | endTime                    |
    +------------+----------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------------------+----------------------------+
    | finish     | http://db4ai-collie-cn-hangzhou.oss-cn-hangzhou.aliyuncs.com/pc-xxxxxxx/airlines_gbm/xxxxx.csv?OSSAccessKeyId=xxxxxx&Expires=xxxx&Signature=xxxxxx |                 | 2022-08-22 14:57:51.355240 | 2022-08-22 14:58:18.316818 |
    +------------+----------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------------------+----------------------------+
    1 row in set (0.24 sec)
    Note

    The validity period of filePath is 100 minutes. You can view the value of the Expires parameter in filePath to check whether the link is still valid. You can access the link within its validity period.

Manage offline evaluation and inference tasks

View the offline task list

You can view all offline tasks for the current cluster.

Note

Offline tasks are batch processing tasks that run for a long period of time and do not produce results in real time. Offline tasks can be offline evaluation and offline inference.

Syntax

SHOW TASKS

Examples

Execute the following statement to view all offline tasks for the current cluster:

/*polar4ai*/SHOW TASKS;

The following output is returned:

+--------------------------------------+------------+-------------+----------------------------+----------------------------+
| task_id                              | task_type  | task_status | start_timestr              | end_timestr                |
+--------------------------------------+------------+-------------+----------------------------+----------------------------+
| 2cba0c74-1f8f-11ed-934a-xxxxxxxxxxxx | prediction | finish      | 2022-08-19 15:18:51.206829 |                            |
| 77b3a186-1f94-11ed-8eaa-xxxxxxxxxxxx | evaluation | finish      | 2022-08-19 15:56:44.465594 |                            |
| 972547a4-1fa3-11ed-9c6b-xxxxxxxxxxxx | evaluation | finish      | 2022-08-19 17:44:59.790353 | 2022-08-19 17:45:23.750100 |
| bd0c1722-21e7-11ed-94a8-xxxxxxxxxxxx | prediction | finish      | 2022-08-22 14:57:51.355240 | 2022-08-22 14:58:18.316818 |
| df05244e-21f7-11ed-be66-xxxxxxxxxxxx | evaluation | finish      | 2022-08-22 16:53:20.381577 | 2022-08-22 16:53:37.189953 |
| ec956db8-21fb-11ed-8400-xxxxxxxxxxxx | evaluation | finish      | 2022-08-22 17:22:21.122101 | 2022-08-22 17:22:39.428811 |
+--------------------------------------+------------+-------------+----------------------------+----------------------------+
9 rows in set (0.18 sec)

View the offline task status

You can view the current status of an offline task. A task may be in one of the following states:

  • init: The task is being initialized.

  • running: The task is being executed.

  • finish: The task is complete.

  • fail: The task failed.

Syntax

SHOW TASK `task_id` 

Parameters

Parameter

Description

task_id

The task ID.

Examples

View the status of the task whose task ID is df05244e-21f7-11ed-be66-xxxxxxxxxxxx.

/*polar4ai*/SHOW TASK `df05244e-21f7-11ed-be66-xxxxxxxxxxxx`

The following output is returned:

+------------+----------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------------------+----------------------------+
| taskStatus | filePath                                                                                                                                           | results         | startTime                  | endTime                    |
+------------+----------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------------------+----------------------------+
| finish     | http://db4ai-collie-cn-hangzhou.oss-cn-hangzhou.aliyuncs.com/pc-xxxxxxx/airlines_gbm/xxxxx.csv?OSSAccessKeyId=xxxxxx&Expires=xxxx&Signature=xxxxxx | {"acc": 0.6694} | 2022-08-22 17:22:21.122101 | 2022-08-22 17:22:39.428811 |
+------------+----------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------------------+----------------------------+
1 row in set (0.24 sec)
Note

The validity period of filePath is 100 minutes. You can view the value of the Expires parameter in filePath to check whether the link is still valid. You can access the link within its validity period.

Delete a model

You can delete a model when the model is no longer required.

Syntax

DROP MODEL model_name 

Parameters

Parameter

Description

model_name

The model name.

Examples

Delete the airlines_gbm model.

/*polar4ai*/DROP MODEL airlines_gbm 

If the following result is returned, the siku_kmeans model is deleted.

Query OK, 0 rows affected (0.57 sec)

Best practices for custom models

In the following example, a model is trained offline and used for online inference to show how to manage a customer model.

  1. Preform offline training

    You can use the following offline training script for the LightGBM algorithm:

    # coding: utf-8
    from pathlib import Path
    
    import pandas as pd
    from sklearn.metrics import mean_squared_error
    
    import lightgbm as lgb
    import joblib
    
    def train_model():
        print('Loading data...')
        # load or create your dataset
        df_train = pd.read_csv('regression.train', header=None, sep='\t')
        df_test = pd.read_csv('regression.test', header=None, sep='\t')
    
        y_train = df_train[0]
        y_test = df_test[0]
        X_train = df_train.drop(0, axis=1)
        X_test = df_test.drop(0, axis=1)
    
        # create dataset for lightgbm
        lgb_train = lgb.Dataset(X_train, y_train)
        lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train)
    
    
        # specify your configurations as a dict
        params = {
            'boosting_type': 'gbdt',
            'objective': 'regression',
            'metric': {'l2', 'l1'},
            'num_leaves': 31,
            'learning_rate': 0.05,
            'feature_fraction': 0.9,
            'bagging_fraction': 0.8,
            'bagging_freq': 5,
            'verbose': 0
        }
    
        print('Starting training...')
        # train
        gbm = lgb.train(params,
                        lgb_train,
                        num_boost_round=20,
                        valid_sets=lgb_eval,
                        callbacks=[lgb.early_stopping(stopping_rounds=5)])
    
    
        print('Saving model...')
        # save model to file
        # gbm.save_model('model.txt')
        joblib.dump(gbm, 'lgb.pkl')
    
        print('Starting predicting...')
        # predict
        y_pred = gbm.predict(X_test, num_iteration=gbm.best_iteration)
        # eval
        rmse_test = mean_squared_error(y_test, y_pred) ** 0.5
        print(f'The RMSE of prediction is: {rmse_test}')
    

    The model file is exported in the pkl format. The predict method is called to return the inference result. The python files on which the model depends are also required.

    The following example shows the content of the requirements.txt file:

    lightgbm==3.3.3
  2. Upload a model

    1. Execute the following statement to upload a model to PolarDB for AI:

      /*polar4ai*/UPLOAD MODEL my_model WITH (model_location='https://xxxx.oss-cn-hangzhou.aliyuncs.com/xxxx/model.pkl?Expires=xxxx&OSSAccessKeyId=xxxx&Signature=xxxx', req_location='https://xxxx.oss-cn-hangzhou.aliyuncs.com/xxxx/requirements.txt?Expires=xxxx&OSSAccessKeyId=xxxx&Signature=xxxx')

      model_location indicates the path of the model file. req_location indicates the path of the file on which the mode depends. You can prepare the two files in advance, upload them to your Object Storage Service (OSS) bucket, and then execute the statement to upload them to PolarDB for AI.

      The following output is returned:

      Query OK, 0 rows affected (0.29 sec)
    2. Execute the following statement to view the model status:

      /*polar4ai*/ SHOW my_model;

      The following output is returned:

      +-------------+-----------------------------------------------------------------------------------------------------------------------------+
      | modelStatus | modelPath                                                                                                                   |
      +-------------+-----------------------------------------------------------------------------------------------------------------------------+
      | saved       | http://db4ai-collie-cn-hangzhou.oss-cn-hangzhou.aliyuncs.com/xxxxx.pkl?OSSAccessKeyId=xxxxxx&Expires=xxxx&Signature=xxxxxx  |
      +-------------+-----------------------------------------------------------------------------------------------------------------------------+
      1 row in set (0.23 sec)

      If modelStatus is displayed as saved, the model is uploaded.

  3. Deploy a model

    1. Execute the following statement to deploy a model on PolarDB for AI:

      /*polar4ai*/ DEPLOY MODEL my_model;

      The following output is returned:

      Query OK, 0 rows affected (0.29 sec)
    2. Execute the following statement to view the model status:

      /*polar4ai*/ SHOW MODEL my_model;

      The following output is returned:

      +-------------+-----------------------------------------------------------------------------------------------------------------------------+
      | modelStatus | modelPath                                                                                                                   |
      +-------------+-----------------------------------------------------------------------------------------------------------------------------+
      | serving     | http://db4ai-collie-cn-hangzhou.oss-cn-hangzhou.aliyuncs.com/xxxxx.pkl?OSSAccessKeyId=xxxxxx&Expires=xxxx&Signature=xxxxxx  |
      +-------------+-----------------------------------------------------------------------------------------------------------------------------+
      1 row in set (0.23 sec)

      If modelStatus is displayed as serving, the model is deployed.

  4. Use a model for online inference.

    Execute the following statement to use a model to perform an online inference task:

    /*polar4ai*/ SELECT Y FROM PREDICT(MODEL my_model, SELECT * FROM db4ai.regression_test LIMIT 10) WITH (x_cols = 'x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,x16,x17,x18,x19,x20,x21,x22,x23,x24,x25,x26,x27,x28', y_cols='');

    The following output is returned:

    +------+---------------------+
    | Y    | predicted_results   |
    +------+---------------------+
    |  1.0 | 0.6262147669037363  |
    |  0.0 | 0.5082804008241021  |
    |  0.0 | 0.37533158372209957 |
    |  1.0 | 0.461974928099089   |
    |  0.0 | 0.3777339456553666  |
    |  0.0 | 0.35045096227525735 |
    |  0.0 | 0.4178165504012342  |
    |  1.0 | 0.40869795422774036 |
    |  1.0 | 0.6826481286570045  |
    |  0.0 | 0.47021259543154736 |
    +------+---------------------+
    10 rows in set (0.95 sec)