The Behavior Sequence Transformer (BST) algorithm uses the Transformer framework to model user behavior sequences and extract implicit features for prediction tasks. BST excels at capturing long-term time series patterns in sequential data, making it well-suited for recommendation systems and user lifecycle value mining.
Use cases
BST handles both classification and regression tasks. The input is a behavior sequence stored as a LONGTEXT column — an ordered list of integer behavior IDs sorted by timestamp. The output is an integer or floating-point prediction, such as a payment amount, a churn probability, or a payment confirmation flag.
Classification example
In a gaming operation scenario, construct the past 14 days of in-game player behaviors into a BST input sequence. The model predicts which paying users are likely to churn in the next 14 days. A user is considered churned if they do not log in for 14 consecutive days.
Regression example
In the same gaming context, use the first 24 hours of new user behaviors as the input sequence. The model predicts each user's total spending over the following 7 days.
Limitations
Class imbalance
BST works best when classes are roughly balanced. If the majority class has more than 20 times the samples of any minority class, preprocess the imbalanced classes using the K-means clustering algorithm in PolarDB for AI to restore a balanced class distribution before training.
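The 20:1 threshold can be checked client-side before training. The following Python sketch is an illustrative helper (the function name is our own, not part of PolarDB for AI) that flags a label column crossing that ratio:

```python
from collections import Counter

def needs_rebalancing(labels, max_ratio=20):
    """Return True when the majority class has more than `max_ratio`
    times the samples of the smallest class (the documented BST threshold)."""
    counts = Counter(labels)
    if len(counts) < 2:
        return False
    sizes = sorted(counts.values())
    return sizes[-1] > max_ratio * sizes[0]
```

If this returns True, rebalance with the K-means clustering preprocessing described above before training.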
Sequence and window size constraints
- sequence_length must not exceed 3,000.
- window_size must be greater than or equal to the maximum behavior ID value plus 1. If window_size exceeds 900, keep sequence_length well below the maximum to avoid memory issues.
- When auto_heads=1, the value of int(sqrt(window_size)) + int(sqrt(sequence_length)) + 2 must not be a prime number. If it is, set auto_heads=0 and specify num_heads manually.
- A small batch_size increases overfitting risk. The default is 16; use a larger value for more stable training.
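These constraints can be verified before issuing the CREATE MODEL statement. The Python helper below is a minimal sketch of the checks listed above; the function and parameter names are our own, not part of PolarDB for AI:

```python
import math

def is_prime(n):
    """True when n is a prime number (n >= 2)."""
    if n < 2:
        return False
    return all(n % d for d in range(2, math.isqrt(n) + 1))

def check_bst_params(window_size, sequence_length, max_behavior_id=None, auto_heads=1):
    """Check the documented BST sizing constraints; return a list of problems."""
    issues = []
    if sequence_length > 3000:
        issues.append("sequence_length must not exceed 3,000")
    if max_behavior_id is not None and window_size < max_behavior_id + 1:
        issues.append("window_size must be >= max behavior ID + 1")
    # With auto_heads=1, this value must not be prime, or head selection fails.
    heads_basis = int(math.sqrt(window_size)) + int(math.sqrt(sequence_length)) + 2
    if auto_heads == 1 and is_prime(heads_basis):
        issues.append(
            f"int(sqrt(window_size)) + int(sqrt(sequence_length)) + 2 = {heads_basis} "
            "is prime; set auto_heads=0 and choose num_heads manually"
        )
    return issues
```

For example, window_size=900 with sequence_length=3,000 passes all checks (30 + 54 + 2 = 86, which is not prime).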
Data format
Model creation table
| Column | Required | Type | Description | Example |
|---|---|---|---|---|
| uid | Required | VARCHAR | ID of each data entry (user ID or product ID) | 253460731706911258 |
| event_list | Required | LONGTEXT | Behavior sequence for training. Comma-separated integer behavior IDs, sorted in ascending order by timestamp. | "[183, 238, 153, 152]" |
| target | Required | INT, FLOAT, DOUBLE | Sample label used to measure model metrics | 0 |
| val_row | Optional | INT | Row-level flag for validation split. 0 = training data; 1 = validation data. Takes effect only when version=1 and val_flag=1. When val_flag=0, only rows with val_row=0 are used. | 1 |
| other_feature | Optional | INT, FLOAT, DOUBLE, LONGTEXT | Additional features. LONGTEXT supports JSON, list, or comma-separated format. Multiple columns are allowed (e.g., other_feature1, other_feature2). | 2 |
| val_x_cols | Optional | LONGTEXT | Validation behavior sequence for parameter tuning. Takes effect only when version=0. | "[183, 238, 153, 152]" |
| val_y_cols | Optional | INT, FLOAT, DOUBLE | Validation label for parameter tuning. Takes effect only when version=0. | 1 |
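The event_list column can be produced from raw timestamped events. The Python sketch below assumes events arrive as (timestamp, behavior_id) pairs; the helper name is hypothetical:

```python
def build_event_list(events):
    """Serialize (timestamp, behavior_id) pairs into the event_list format
    expected by BST: behavior IDs sorted ascending by timestamp, rendered
    as a bracketed comma-separated list."""
    ordered = sorted(events, key=lambda e: e[0])
    return "[" + ", ".join(str(behavior_id) for _, behavior_id in ordered) + "]"
```

For example, four events with out-of-order timestamps serialize to the same string shown in the table above.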
Model evaluation table
| Column | Required | Type | Description | Example |
|---|---|---|---|---|
| uid | Required | VARCHAR(255) | ID of each data entry | 123213 |
| event_list | Required | LONGTEXT | Behavior sequence. Same format as the training table. | "[183, 238, 153, 152]" |
| target | Required | INT, FLOAT, DOUBLE | Sample label used to calculate model errors | 0 |
| other_feature | Optional | INT, FLOAT, DOUBLE, LONGTEXT | Additional features, consistent with those used during model creation | 2 |
Model prediction table
| Column | Required | Type | Description | Example |
|---|---|---|---|---|
| uid | Required | VARCHAR(255) | ID of each data entry | 123213 |
| event_list | Required | LONGTEXT | Behavior sequence. Same format as the training table. | "[183, 238, 153, 152]" |
| other_feature | Optional | INT, FLOAT, DOUBLE, LONGTEXT | Additional features, consistent with those used during model creation | 2 |
Model parameters
The following parameters are values of model_parameter in the CREATE MODEL statement.
| Parameter | Default | Description |
|---|---|---|
| version | 0 | Model version. 0 = old version; 1 = new version (recommended). The old version supports val_x_cols and val_y_cols but not val_row, multiclass classification, or stacking. |
| model_task_type | classification | Task type. Valid values: classification, regression, multi_classification. |
| num_classes | 2 | Number of prediction categories for multiclass classification. Sample labels must start at 0 and the total number of distinct labels must be less than this value. For example, when num_classes=3, valid labels are {0, 1, 2}. |
| batch_size | 16 | Batch size. A smaller value increases overfitting risk. |
| window_size | — | Size of the embedding space for behavior IDs. Must be greater than or equal to the maximum behavior ID value plus 1. Otherwise, a parsing error occurs. |
| sequence_length | — | Number of behavior events included in model calculations. Must not exceed 3,000. |
| success_id | — | The behavior ID that the model predicts. |
| max_epoch | 1 | Maximum number of training iterations. |
| learning_rate | 0.0002 | Learning rate. |
| loss | CrossEntropyLoss | Loss function. CrossEntropyLoss for binary classification; mse, mae, or msle for regression. |
| val_flag | 0 | Specifies whether to validate after each epoch. 0 = no validation (saves the last-epoch model); 1 = validate each epoch (saves the best-metric model; requires val_metric and val_row). |
| val_metric | loss | Metric used for epoch-level validation. See the table below. |
| auto_data_statics | off | Specifies whether to count ID occurrences in the sequence and generate statistical features. on = count; off = skip. |
| auto_heads | 1 | Specifies whether to set the number of multi-head attention heads automatically. 1 = automatic; 0 = manual (specify num_heads). When set to 1, an insufficient video memory risk may occur. Verify that int(sqrt(window_size)) + int(sqrt(sequence_length)) + 2 is not a prime number. |
| num_heads | 4 | Number of multi-head attention heads. Used only when auto_heads=0. |
| x_value_cols | — | Column names to use as numeric discrete features. Cannot be empty. Values must be integers or floating-point numbers. Example: 'num_events, max_level, max_viplevel'. |
| x_statics_cols | — | Column names to use as statistical features. Cannot be empty. Each column must be LONGTEXT with fixed-length rows. Supports JSON, list, or comma-separated format. Example: 'stats_item_list, stats_event_list'. |
| x_seq_cols | — | Column names to use as sequence features. Each column must be LONGTEXT in list or comma-separated format. Example: 'event_list'. |
| data_normalization | 0 | Specifies whether to normalize columns specified by x_value_cols. 0 = off; 1 = on. |
| remove_seq_adjacent_duplicates | off | Specifies whether to remove adjacent duplicate values from columns specified by x_seq_cols. off = keep duplicates; on = remove. |
| stacking | off | Specifies whether to enhance the BST algorithm through model fusion. Valid only when model_task_type='classification'. off = no fusion; on = model fusion and deduplication. |
| stacking_model | 'gbdt,svc,rt' | Models to include in ensemble fusion. Valid only when stacking='on'. Valid values: bst, gbdt, svc, rt. Cannot be empty. |
Validation metrics (val_metric)
| Value | What it measures | Task type |
|---|---|---|
| loss | Same loss function used during training | Classification, regression |
| f1score | Harmonic mean of precision and recall — useful when class distribution is uneven | Classification, multiclass classification |
| r2_score | Coefficient of determination — how well predictions fit the actual values | Regression |
| mse | Mean squared error — average squared difference between predictions and actual values | Regression |
| mape | Mean absolute percentage error — average percentage deviation from actual values | Regression |
| mape_plus | Variant of MAPE that measures error only on positive labels | Regression |
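mape and mape_plus differ only in which samples contribute to the average. The exact server-side formula for mape_plus is not documented here; the Python sketch below reflects a plausible reading in which only samples with a positive true label are counted:

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error over all samples (undefined when
    any true label is zero)."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def mape_plus(y_true, y_pred):
    """MAPE restricted to samples whose true label is positive — an assumed
    reading of the documented 'positive labels only' behavior."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t > 0]
    return sum(abs((t - p) / t) for t, p in pairs) / len(pairs)
```

Restricting to positive labels avoids division by zero on zero-valued targets, which is common in spending-prediction data where most users pay nothing.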
Evaluation metrics
The following are valid values of the metrics parameter in the EVALUATE statement.
| Value | What it measures | Task type |
|---|---|---|
| acc | Accuracy — proportion of correct predictions | Classification, multiclass classification |
| auc | Area under the ROC curve — model's ability to separate positive and negative classes | Classification, multiclass classification |
| Fscore | F1 score — harmonic mean of precision and recall, useful when class distribution is uneven | Classification, multiclass classification |
| r2_score | Coefficient of determination | Regression |
| mse | Mean squared error | Regression |
| mape | Mean absolute percentage error | Regression |
| mape_plus | Variant of MAPE for positive labels only | Regression |
Examples
The following examples use classification tasks. For other task types, adjust model_task_type and the corresponding loss and metrics parameters.
Create a model
```sql
/*polar4ai*/CREATE MODEL sequential_bst WITH (
    model_class = 'bst',
    x_cols = 'event_list,other_feature1',
    y_cols = 'target',
    model_parameter = (
        batch_size = 128,
        window_size = 900,
        sequence_length = 3000,
        success_id = 900,
        max_epoch = 2,
        learning_rate = 0.0008,
        val_flag = 1,
        x_seq_cols = 'event_list',
        x_value_cols = 'other_feature1',
        val_metric = 'f1score',
        auto_data_statics = 'on',
        data_normalization = 1,
        remove_seq_adjacent_duplicates = 'on',
        version = 1
    )
) AS (SELECT * FROM seqential_train);
```

seqential_train is the model creation data table.
Evaluate a model
```sql
/*polar4ai*/SELECT uid, target FROM evaluate(
    MODEL sequential_bst,
    SELECT * FROM seqential_eval
) WITH (
    x_cols = 'event_list,other_feature1',
    y_cols = 'target',
    metrics = 'Fscore'
);
```

seqential_eval is the model evaluation data table.
Run predictions
```sql
/*polar4ai*/SELECT uid, target FROM PREDICT(
    MODEL sequential_bst,
    SELECT * FROM seqential_test
) WITH (
    x_cols = 'event_list,other_feature1',
    mode = 'async'
);
```

seqential_test is the model prediction data table.