All Products
Search
Document Center

PolarDB:Model workflow and instructions

Last Updated:Mar 28, 2026

PolarDB for AI supports three model categories: built-in large language models (LLMs) you call directly via SQL, custom models you train inside the database, and external models you upload from offline environments. This topic covers the complete lifecycle — creating, uploading, deploying, evaluating, running inference, and deleting models.

Model categories

CategoryDescriptionWorkflow
Built-in modelsCall built-in LLMs using SQL without moving data to an external AI platform. For details, see Data inference and interaction with the Qwen LLM.Call directly — no setup required
Custom modelsTrain a model on your PolarDB data using supported algorithms. Run models in confidential containers for enhanced data security.Create → Evaluate → Run inference
External modelsUpload a model you trained offline (Sklearn-based frameworks such as LightGBM and GBDT, or deep learning frameworks such as TensorFlow and PyTorch). Supported formats: PMML, ONNX, and Checkpoint.Upload → Deploy → Evaluate → Run inference
SHOW MODELS lists only custom models and uploaded external models. Built-in models are not shown.

Quick start

The following example shows the end-to-end workflow for a custom LightGBM model — from training to online inference.

-- Step 1: Create the model (asynchronous — check status before proceeding)
/*polar4ai*/CREATE MODEL airlines_gbm
WITH (
  model_class = 'lightgbm',
  x_cols      = 'Airline,Flight,AirportFrom,AirportTo,DayOfWeek,Time,Length',
  y_cols      = 'Delay',
  model_parameter = (boosting_type='gbdt', n_estimators=100, max_depth=8, num_leaves=256)
) AS (SELECT * FROM db4ai.airlines_train);

-- Step 2: Verify the model is ready (status must be "saved" before evaluating)
/*polar4ai*/SHOW MODEL airlines_gbm;

-- Step 3: Evaluate accuracy on a test set
/*polar4ai*/SELECT Delay
FROM EVALUATE(
  MODEL airlines_gbm,
  SELECT * FROM db4ai.airlines_test
) WITH (
  x_cols  = 'Airline,Flight,AirportFrom,AirportTo,DayOfWeek,Time,Length',
  y_cols  = 'Delay',
  metrics = 'acc'
);

-- Step 4: Run online inference (up to 1,000 rows)
/*polar4ai*/SELECT Delay
FROM PREDICT(
  MODEL airlines_gbm,
  SELECT * FROM db4ai.airlines_test LIMIT 10
) WITH (
  x_cols = 'Airline,Flight,AirportFrom,AirportTo,DayOfWeek,Time,Length',
  y_cols = 'Delay'
);

All SQL commands require the /*polar4ai*/ hint prefix.

Model statuses

After you create or upload a model, check its status with SHOW MODEL <model_name>.

StatusMeaningWhat you can do next
trainingModel is being trainedWait; check again with SHOW MODEL
loading_dataModel is loading training dataWait
trainedTraining complete, not yet savedWait for saved
savedModel is saved and readyEvaluate or run inference
servingModel is deployed and onlineRun inference (faster response)
deletedModel has been deleted
The modelPath URL in SHOW MODEL output is valid for 100 minutes. Access the URL before the time shown in the Expires parameter.

Create a model

CREATE MODEL trains a machine learning model on data in your cluster. Model creation is asynchronous — use SHOW MODEL to check when it is ready.

Syntax

CREATE MODEL model_name
WITH (
  model_class      = '',
  x_cols           = '',
  y_cols           = '',
  model_parameter  = ()
) AS (
  SELECT select_expr [, select_expr] ...
  FROM table_reference
)

Parameters

ParameterDescription
model_nameThe model name.
model_classThe algorithm. Valid values: lightgbm, deepfm, kmeans, randomforestreg, gbrt, linearreg, svr, bst. See the algorithm-specific documentation for details.
x_colsInput columns. Separate multiple column names with commas.
y_colsOutput column.
model_parameterAlgorithm hyperparameters. The valid values depend on model_class — see the corresponding algorithm documentation for details.
select_exprColumn names to include in the training query.
table_referenceTraining table name.

Example

/*polar4ai*/CREATE MODEL airlines_gbm
WITH (
  model_class     = 'lightgbm',
  x_cols          = 'Airline,Flight,AirportFrom,AirportTo,DayOfWeek,Time,Length',
  y_cols          = 'Delay',
  model_parameter = (boosting_type='gbdt', n_estimators=100, max_depth=8, num_leaves=256)
) AS (SELECT * FROM db4ai.airlines_train);

Expected output:

Query OK, 0 rows affected (0.79 sec)

After the command returns, the model is queued for training. Check the model status before evaluating or running inference.

Upload a model

UPLOAD MODEL imports an externally trained model into PolarDB for AI. Provide the model file and its runtime dependency file as pre-signed OSS URLs.

Syntax

UPLOAD MODEL model_name
WITH (
  model_location = '',
  req_location   = ''
)

Parameters

ParameterDescription
model_nameThe model name.
model_locationPre-signed OSS URL of the model file.
req_locationPre-signed URL of the requirements.txt file listing the model's runtime dependencies.

Example

/*polar4ai*/UPLOAD MODEL my_model
WITH (
  model_location = 'https://xxxx.oss-cn-hangzhou.aliyuncs.com/xxxx/model.pkl?Expires=xxxx&OSSAccessKeyId=xxxx&Signature=xxxx',
  req_location   = 'https://xxxx.oss-cn-hangzhou.aliyuncs.com/xxxx/requirements.txt?Expires=xxxx&OSSAccessKeyId=xxxx&Signature=xxxx'
);

Expected output:

Query OK, 0 rows affected (0.29 sec)

Verify that the upload is complete — modelStatus must be saved before you deploy.

/*polar4ai*/SHOW MODEL my_model;
+-------------+-----------------------------------------------------------+
| modelStatus | modelPath                                                 |
+-------------+-----------------------------------------------------------+
| saved       | http://db4ai-collie-cn-hangzhou.oss-cn-hangzhou.aliyuncs.com/xxxxx.pkl?OSSAccessKeyId=xxxxxx&Expires=xxxx&Signature=xxxxxx |
+-------------+-----------------------------------------------------------+

Deploy a model

DEPLOY MODEL brings an uploaded external model online. After deployment, modelStatus changes to serving and inference calls are faster.

Deployment applies only to external models. Custom models trained with CREATE MODEL move to saved status automatically and can be used for inference without explicit deployment.

Syntax

DEPLOY MODEL model_name

Parameters

ParameterDescription
model_nameThe model name.

Example

/*polar4ai*/DEPLOY MODEL my_model;

Expected output:

Query OK, 0 rows affected (0.29 sec)

Verify deployment:

/*polar4ai*/SHOW MODEL my_model;
+-------------+-----------------------------------------------------------+
| modelStatus | modelPath                                                 |
+-------------+-----------------------------------------------------------+
| serving     | http://db4ai-collie-cn-hangzhou.oss-cn-hangzhou.aliyuncs.com/xxxxx.pkl?OSSAccessKeyId=xxxxxx&Expires=xxxx&Signature=xxxxxx |
+-------------+-----------------------------------------------------------+

View models

List all models

SHOW MODELS returns all custom models and uploaded external models in the current cluster.

/*polar4ai*/SHOW MODELS;
+-----------------------+-----------------+--------------+
| model_name            | model_class     | model_status |
+-----------------------+-----------------+--------------+
| airline_rfr           | randomforestreg | saved        |
| gbrt1                 | gbrt            | saved        |
| airline_deepfm        | deepfm          | saved        |
| airlines_gbm          | lightgbm        | saved        |
| lgbm1                 | lightgbm        | saved        |
| blackfriday_linearreg | linearreg       | saved        |
+-----------------------+-----------------+--------------+

Check model status

/*polar4ai*/SHOW MODEL airlines_gbm;
+-------------+-----------------------------------------------------------+
| modelStatus | modelPath                                                 |
+-------------+-----------------------------------------------------------+
| saved       | http://db4ai-collie-cn-hangzhou.oss-cn-hangzhou.aliyuncs.com/xxxxx.pkl?OSSAccessKeyId=xxxxxx&Expires=xxxx&Signature=xxxxxx |
+-------------+-----------------------------------------------------------+

View model details

/*polar4ai*/DESCRIBE MODEL airlines_gbm;
+-------------------+-----------------------------------------------------------------------------------+
| model_name        | model_description                                                                 |
+-------------------+-----------------------------------------------------------------------------------+
| airlines_gbm      | basic information:model_name:airlines_gbm model_class:lightgbm feature important:features,imp_gain,imp_splitAirline,0.3327,0.0376 AirportFrom,0.2178,0.1842 Time,0.1893,0.1999 AirportTo,0.1668,0.187 DayOfWeek,0.0384,0.1236 Length,0.0307,0.1269 Flight,0.0242,0.1408 |
+-------------------+-----------------------------------------------------------------------------------+

Evaluate a model

EVALUATE measures how well a model performs on a labeled test dataset. Evaluation runs as an asynchronous offline task and returns a task_id.

Syntax

SELECT select_expr [, select_expr] ...
FROM EVALUATE(
  MODEL model_name,
  SELECT select_expr_for_prediction [, select_expr_for_prediction] ...
  FROM table_reference
) WITH (
  x_cols  = '',
  y_cols  = '',
  metrics = ''
)

Parameters

ParameterDescription
select_exprColumns to display in the evaluation results.
model_nameThe model name.
select_expr_for_predictionColumns to use for evaluation.
table_referenceTest table name.
x_colsInput columns. Separate multiple column names with commas.
y_colsLabel column.
metricsEvaluation metric. Valid values: acc (accuracy, classification), r2_score (coefficient of determination R², regression), ks (KS value — ability to separate positive and negative samples, classification), auc (area under the ROC curve, classification), Fscore (F-score — harmonic mean of precision and recall, classification).

Example

Submit an evaluation task for the airlines_gbm model:

/*polar4ai*/SELECT Delay
FROM EVALUATE(
  MODEL airlines_gbm,
  SELECT * FROM db4ai.airlines_test
) WITH (
  x_cols  = 'Airline,Flight,AirportFrom,AirportTo,DayOfWeek,Time,Length',
  y_cols  = 'Delay',
  metrics = 'acc'
);

The command returns a task_id:

+--------------------------------------+
| task_id                              |
+--------------------------------------+
| df05244e-21f7-11ed-be66-xxxxxxxxxxxx |
+--------------------------------------+

Check the task status using the returned ID:

/*polar4ai*/SHOW TASK `df05244e-21f7-11ed-be66-xxxxxxxxxxxx`;
+------------+-----------------------------------------------------------+-----------------+----------------------------+----------------------------+
| taskStatus | filePath                                                  | results         | startTime                  | endTime                    |
+------------+-----------------------------------------------------------+-----------------+----------------------------+----------------------------+
| finish     | http://db4ai-collie-cn-hangzhou.oss-cn-hangzhou.aliyuncs.com/pc-xxxxxxx/airlines_gbm/xxxxx.csv?OSSAccessKeyId=xxxxxx&Expires=xxxx&Signature=xxxxxx | {"acc": 0.6694} | 2022-08-22 17:22:21.122101 | 2022-08-22 17:22:39.428811 |
+------------+-----------------------------------------------------------+-----------------+----------------------------+----------------------------+
The filePath URL is valid for 100 minutes. Access it before the time shown in the Expires parameter.

Run inference

Use PREDICT to run inference against a model. Choose between online and offline mode based on your dataset size:

ModeWhen to useHow to trigger
OnlineUp to 1,000 rows; results returned immediatelyOmit mode parameter (default)
OfflineMore than 1,000 rows; results saved to a fileSet mode='async'
If you run online inference on more than 1,000 rows, PolarDB for AI returns an error: ERROR 9050 (HY000): Please limit the SQL selected data length to less than '1000' or convert to offline prediction. Switch to offline mode with mode='async'.

Syntax

SELECT select_expr [, select_expr] ...
FROM PREDICT(
  MODEL model_name,
  SELECT select_expr_for_prediction [, select_expr_for_prediction] ...
  FROM table_reference
  LIMIT row_count
) WITH (
  x_cols = ''
  [, mode = 'async']
)

Parameters

ParameterDescription
select_exprColumns to display in the inference results.
model_nameThe model name.
select_expr_for_predictionColumns to use for inference.
table_referenceTable to run inference on.
row_countNumber of rows to process.
x_colsInput columns. Separate multiple column names with commas.
modeSet to async for offline inference. Omit for online inference.

Online inference

The following example runs inference on 10 rows and returns results immediately:

/*polar4ai*/SELECT Delay
FROM PREDICT(
  MODEL airlines_gbm,
  SELECT * FROM db4ai.airlines_test LIMIT 10
) WITH (
  x_cols = 'Airline,Flight,AirportFrom,AirportTo,DayOfWeek,Time,Length',
  y_cols = 'Delay'
);
+-------+-------------------+
| Delay | predicted_results |
+-------+-------------------+
|     1 | 0                 |
|     0 | 0                 |
|     0 | 0                 |
|     0 | 0                 |
|     0 | 0                 |
|     0 | 0                 |
|     1 | 0                 |
|     0 | 0                 |
|     0 | 0                 |
|     1 | 0                 |
+-------+-------------------+

Offline inference

The following example submits an asynchronous inference task for a full dataset:

/*polar4ai*/SELECT Delay
FROM PREDICT(
  MODEL airlines_gbm,
  SELECT * FROM db4ai.airlines_test
) WITH (
  x_cols = 'Airline,Flight,AirportFrom,AirportTo,DayOfWeek,Time,Length',
  y_cols = 'Delay',
  mode   = 'async'
);

The command returns a task_id:

+--------------------------------------+
| task_id                              |
+--------------------------------------+
| bd0c1722-21e7-11ed-94a8-xxxxxxxxxxxx |
+--------------------------------------+

Check task status and get the result file URL:

/*polar4ai*/SHOW TASK `bd0c1722-21e7-11ed-94a8-xxxxxxxxxxxx`;
+------------+-----------------------------------------------------------+---------+----------------------------+----------------------------+
| taskStatus | filePath                                                  | results | startTime                  | endTime                    |
+------------+-----------------------------------------------------------+---------+----------------------------+----------------------------+
| finish     | http://db4ai-collie-cn-hangzhou.oss-cn-hangzhou.aliyuncs.com/pc-xxxxxxx/airlines_gbm/xxxxx.csv?OSSAccessKeyId=xxxxxx&Expires=xxxx&Signature=xxxxxx |         | 2022-08-22 14:57:51.355240 | 2022-08-22 14:58:18.316818 |
+------------+-----------------------------------------------------------+---------+----------------------------+----------------------------+
The filePath URL is valid for 100 minutes. Access it before the time shown in the Expires parameter.

Offline evaluation and inference tasks

Offline tasks are long-running batch jobs that process data asynchronously and save results to a file rather than returning them inline.

Task statuses

StatusMeaning
initTask is initializing
runningTask is in progress
finishTask completed successfully — filePath contains the result URL
failTask failed

List all tasks

SHOW TASKS returns all offline tasks in the current cluster.

/*polar4ai*/SHOW TASKS;
+--------------------------------------+------------+-------------+----------------------------+----------------------------+
| task_id                              | task_type  | task_status | start_timestr              | end_timestr                |
+--------------------------------------+------------+-------------+----------------------------+----------------------------+
| 2cba0c74-1f8f-11ed-934a-xxxxxxxxxxxx | prediction | finish      | 2022-08-19 15:18:51.206829 |                            |
| 77b3a186-1f94-11ed-8eaa-xxxxxxxxxxxx | evaluation | finish      | 2022-08-19 15:56:44.465594 |                            |
| 972547a4-1fa3-11ed-9c6b-xxxxxxxxxxxx | evaluation | finish      | 2022-08-19 17:44:59.790353 | 2022-08-19 17:45:23.750100 |
| bd0c1722-21e7-11ed-94a8-xxxxxxxxxxxx | prediction | finish      | 2022-08-22 14:57:51.355240 | 2022-08-22 14:58:18.316818 |
| df05244e-21f7-11ed-be66-xxxxxxxxxxxx | evaluation | finish      | 2022-08-22 16:53:20.381577 | 2022-08-22 16:53:37.189953 |
| ec956db8-21fb-11ed-8400-xxxxxxxxxxxx | evaluation | finish      | 2022-08-22 17:22:21.122101 | 2022-08-22 17:22:39.428811 |
+--------------------------------------+------------+-------------+----------------------------+----------------------------+

Delete a task

/*polar4ai*/DROP TASK `df05244e-21f7-11ed-be66-xxxxxxxxxxxx`;

Delete a model

/*polar4ai*/DROP MODEL airlines_gbm;

Expected output:

Query OK, 0 rows affected (0.57 sec)

Best practices for custom models

This section walks through the complete workflow for uploading and using an externally trained model in PolarDB for AI.

Step 1: Train the model offline

The following Python script trains a LightGBM regression model and exports it as a .pkl file:

# coding: utf-8
from pathlib import Path

import pandas as pd
from sklearn.metrics import mean_squared_error

import lightgbm as lgb
import joblib

def train_model():
    print('Loading data...')
    # Load training and test datasets
    df_train = pd.read_csv('regression.train', header=None, sep='\t')
    df_test  = pd.read_csv('regression.test',  header=None, sep='\t')

    y_train = df_train[0]
    y_test  = df_test[0]
    X_train = df_train.drop(0, axis=1)
    X_test  = df_test.drop(0, axis=1)

    # Create LightGBM datasets
    lgb_train = lgb.Dataset(X_train, y_train)
    lgb_eval  = lgb.Dataset(X_test, y_test, reference=lgb_train)

    params = {
        'boosting_type'   : 'gbdt',
        'objective'       : 'regression',
        'metric'          : {'l2', 'l1'},
        'num_leaves'      : 31,
        'learning_rate'   : 0.05,
        'feature_fraction': 0.9,
        'bagging_fraction': 0.8,
        'bagging_freq'    : 5,
        'verbose'         : 0
    }

    print('Training...')
    gbm = lgb.train(
        params,
        lgb_train,
        num_boost_round=20,
        valid_sets=lgb_eval,
        callbacks=[lgb.early_stopping(stopping_rounds=5)]
    )

    print('Saving model...')
    # Export as pkl — the format PolarDB for AI expects
    joblib.dump(gbm, 'lgb.pkl')

    # Validate locally before uploading
    y_pred    = gbm.predict(X_test, num_iteration=gbm.best_iteration)
    rmse_test = mean_squared_error(y_test, y_pred) ** 0.5
    print(f'RMSE: {rmse_test}')

Also create a requirements.txt listing the model's runtime dependencies:

lightgbm==3.3.3

Step 2: Upload the model

Upload both files to a private OSS bucket and generate pre-signed URLs, then run:

/*polar4ai*/UPLOAD MODEL my_model
WITH (
  model_location = 'https://xxxx.oss-cn-hangzhou.aliyuncs.com/xxxx/model.pkl?Expires=xxxx&OSSAccessKeyId=xxxx&Signature=xxxx',
  req_location   = 'https://xxxx.oss-cn-hangzhou.aliyuncs.com/xxxx/requirements.txt?Expires=xxxx&OSSAccessKeyId=xxxx&Signature=xxxx'
);

Verify the upload:

/*polar4ai*/SHOW MODEL my_model;

modelStatus must be saved before proceeding.

Step 3: Deploy the model

/*polar4ai*/DEPLOY MODEL my_model;

Verify deployment:

/*polar4ai*/SHOW MODEL my_model;

modelStatus must be serving before running inference.

Step 4: Run online inference

/*polar4ai*/SELECT Y
FROM PREDICT(
  MODEL my_model,
  SELECT * FROM db4ai.regression_test LIMIT 10
) WITH (
  x_cols = 'x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,x16,x17,x18,x19,x20,x21,x22,x23,x24,x25,x26,x27,x28',
  y_cols = ''
);
+------+---------------------+
| Y    | predicted_results   |
+------+---------------------+
|  1.0 | 0.6262147669037363  |
|  0.0 | 0.5082804008241021  |
|  0.0 | 0.37533158372209957 |
|  1.0 | 0.461974928099089   |
|  0.0 | 0.3777339456553666  |
|  0.0 | 0.35045096227525735 |
|  0.0 | 0.4178165504012342  |
|  1.0 | 0.40869795422774036 |
|  1.0 | 0.6826481286570045  |
|  0.0 | 0.47021259543154736 |
+------+---------------------+

What's next

Alibaba Cloud is responsible for the maintenance of the underlying infrastructure and the software provided by Alibaba Cloud, such as the service's technical architecture and operating system. You are responsible for everything above the operating system, such as the applications you install. The instance's runtime environment belongs to you. If you upgrade the operating system yourself, you may experience adverse effects, such as downtime. Proceed with caution and be aware of the risks.