The Behavior Sequence Transformer (BST) algorithm uses the Transformer framework to model user behavior sequences and extract implicit features for prediction tasks. BST excels at capturing long-term time series patterns in sequential data, making it well-suited for recommendation systems and user lifecycle value mining.
Use cases
BST handles both classification and regression tasks. The input is a behavior sequence stored as a LONGTEXT column — an ordered list of integer behavior IDs sorted by timestamp. The output is an integer or floating-point prediction, such as a payment amount, a churn probability, or a payment confirmation flag.
Classification example
In a gaming operation scenario, construct the past 14 days of in-game player behaviors into a BST input sequence. The model predicts which paying users are likely to churn in the next 14 days. A user is considered churned if they do not log in for 14 consecutive days.
Regression example
In the same gaming context, use the first 24 hours of new user behaviors as the input sequence. The model predicts each user's total spending over the following 7 days.
Limitations
Class imbalance
BST works best when classes are roughly balanced. If the majority class has more than 20 times the samples of any minority class, preprocess the imbalanced classes using the K-means clustering algorithm in PolarDB for AI to restore a balanced class distribution before training.
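The 20:1 threshold can be checked client-side before training. The following Python sketch is an illustrative helper (the function name is our own, not part of PolarDB for AI) that flags a label column crossing that ratio:

```python
from collections import Counter

def needs_rebalancing(labels, max_ratio=20):
    """Return True when the majority class has more than `max_ratio`
    times the samples of the smallest class (the documented BST threshold)."""
    counts = Counter(labels)
    if len(counts) < 2:
        return False
    sizes = sorted(counts.values())
    return sizes[-1] > max_ratio * sizes[0]
```

If this returns True, rebalance with the K-means clustering preprocessing described above before training.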
Sequence and window size constraints
- sequence_length must not exceed 3,000.
- window_size must be greater than or equal to the maximum behavior ID value plus 1. If window_size exceeds 900, keep sequence_length well below the maximum to avoid memory issues.
- When auto_heads=1, the value of int(sqrt(window_size)) + int(sqrt(sequence_length)) + 2 must not be a prime number. If it is, set auto_heads=0 and specify num_heads manually.
- A small batch_size increases overfitting risk. The default is 16; use a larger value for more stable training.
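These constraints can be verified before issuing the CREATE MODEL statement. The Python helper below is a minimal sketch of the checks listed above; the function and parameter names are our own, not part of PolarDB for AI:

```python
import math

def is_prime(n):
    """True when n is a prime number (n >= 2)."""
    if n < 2:
        return False
    return all(n % d for d in range(2, math.isqrt(n) + 1))

def check_bst_params(window_size, sequence_length, max_behavior_id=None, auto_heads=1):
    """Check the documented BST sizing constraints; return a list of problems."""
    issues = []
    if sequence_length > 3000:
        issues.append("sequence_length must not exceed 3,000")
    if max_behavior_id is not None and window_size < max_behavior_id + 1:
        issues.append("window_size must be >= max behavior ID + 1")
    # With auto_heads=1, this value must not be prime, or head selection fails.
    heads_basis = int(math.sqrt(window_size)) + int(math.sqrt(sequence_length)) + 2
    if auto_heads == 1 and is_prime(heads_basis):
        issues.append(
            f"int(sqrt(window_size)) + int(sqrt(sequence_length)) + 2 = {heads_basis} "
            "is prime; set auto_heads=0 and choose num_heads manually"
        )
    return issues
```

For example, window_size=900 with sequence_length=3,000 passes all checks (30 + 54 + 2 = 86, which is not prime).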
Data format
Model creation table
| Column | Required | Type | Description | Example |
|---|---|---|---|---|
| uid | Required | VARCHAR | ID of each data entry (user ID or product ID) | 253460731706911258 |
| event_list | Required | LONGTEXT | Behavior sequence for training. Comma-separated integer behavior IDs, sorted in ascending order by timestamp. | "[183, 238, 153, 152]" |
| target | Required | INT, FLOAT, DOUBLE | Sample label used to measure model metrics | 0 |
| val_row | Optional | INT | Row-level flag for validation split. 0 = training data; 1 = validation data. Takes effect only when version=1 and val_flag=1. When val_flag=0, only rows with val_row=0 are used. | 1 |
| other_feature | Optional | INT, FLOAT, DOUBLE, LONGTEXT | Additional features. LONGTEXT supports JSON, list, or comma-separated format. Multiple columns are allowed (e.g., other_feature1, other_feature2). | 2 |
| val_x_cols | Optional | LONGTEXT | Validation behavior sequence for parameter tuning. Takes effect only when version=0. | "[183, 238, 153, 152]" |
| val_y_cols | Optional | INT, FLOAT, DOUBLE | Validation label for parameter tuning. Takes effect only when version=0. | 1 |
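The event_list column can be produced from raw timestamped events. The Python sketch below assumes events arrive as (timestamp, behavior_id) pairs; the helper name is hypothetical:

```python
def build_event_list(events):
    """Serialize (timestamp, behavior_id) pairs into the event_list format
    expected by BST: behavior IDs sorted ascending by timestamp, rendered
    as a bracketed comma-separated list."""
    ordered = sorted(events, key=lambda e: e[0])
    return "[" + ", ".join(str(behavior_id) for _, behavior_id in ordered) + "]"
```

For example, four events with out-of-order timestamps serialize to the same string shown in the table above.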
Model evaluation table
| Column | Required | Type | Description | Example |
|---|---|---|---|---|
| uid | Required | VARCHAR(255) | ID of each data entry | 123213 |
| event_list | Required | LONGTEXT | Behavior sequence. Same format as the training table. | "[183, 238, 153, 152]" |
| target | Required | INT, FLOAT, DOUBLE | Sample label used to calculate model errors | 0 |
| other_feature | Optional | INT, FLOAT, DOUBLE, LONGTEXT | Additional features, consistent with those used during model creation | 2 |
Model prediction table
| Column | Required | Type | Description | Example |
|---|---|---|---|---|
| uid | Required | VARCHAR(255) | ID of each data entry | 123213 |
| event_list | Required | LONGTEXT | Behavior sequence. Same format as the training table. | "[183, 238, 153, 152]" |
| other_feature | Optional | INT, FLOAT, DOUBLE, LONGTEXT | Additional features, consistent with those used during model creation | 2 |
Model parameters
The following parameters are values of model_parameter in the CREATE MODEL statement.
| Parameter | Default | Description |
|---|---|---|
| version | 0 | Model version. 0 = old version; 1 = new version (recommended). The old version supports val_x_cols and val_y_cols but not val_row, multiclass classification, or stacking. |
| model_task_type | classification | Task type. Valid values: classification, regression, multi_classification. |
| num_classes | 2 | Number of prediction categories for multiclass classification. Sample labels must start at 0 and the total number of distinct labels must be less than this value. For example, when num_classes=3, valid labels are {0, 1, 2}. |
| batch_size | 16 | Batch size. A smaller value increases overfitting risk. |
| window_size | — | Size of the embedding space for behavior IDs. Must be greater than or equal to the maximum behavior ID value plus 1. Otherwise, a parsing error occurs. |
| sequence_length | — | Number of behavior events included in model calculations. Must not exceed 3,000. |
| success_id | — | The behavior ID that the model predicts. |
| max_epoch | 1 | Maximum number of training iterations. |
| learning_rate | 0.0002 | Learning rate. |
| loss | CrossEntropyLoss | Loss function. CrossEntropyLoss for binary classification; mse, mae, or msle for regression. |
| val_flag | 0 | Specifies whether to validate after each epoch. 0 = no validation (saves the last-epoch model); 1 = validate each epoch (saves the best-metric model; requires val_metric and val_row). |
| val_metric | loss | Metric used for epoch-level validation. See the table below. |
| auto_data_statics | off | Specifies whether to count ID occurrences in the sequence and generate statistical features. on = count; off = skip. |
| auto_heads | 1 | Specifies whether to set the number of multi-head attention heads automatically. 1 = automatic; 0 = manual (specify num_heads). When set to 1, an insufficient video memory risk may occur. Verify that int(sqrt(window_size)) + int(sqrt(sequence_length)) + 2 is not a prime number. |
| num_heads | 4 | Number of multi-head attention heads. Used only when auto_heads=0. |
| x_value_cols | — | Column names to use as numeric discrete features. Cannot be empty. Values must be integers or floating-point numbers. Example: 'num_events, max_level, max_viplevel'. |
| x_statics_cols | — | Column names to use as statistical features. Cannot be empty. Each column must be LONGTEXT with fixed-length rows. Supports JSON, list, or comma-separated format. Example: 'stats_item_list, stats_event_list'. |
| x_seq_cols | — | Column names to use as sequence features. Each column must be LONGTEXT in list or comma-separated format. Example: 'event_list'. |
| data_normalization | 0 | Specifies whether to normalize columns specified by x_value_cols. 0 = off; 1 = on. |
| remove_seq_adjacent_duplicates | off | Specifies whether to remove adjacent duplicate values from columns specified by x_seq_cols. off = keep duplicates; on = remove. |
| stacking | off | Specifies whether to enhance the BST algorithm through model fusion. Valid only when model_task_type='classification'. off = no fusion; on = model fusion and deduplication. |
| stacking_model | 'gbdt,svc,rt' | Models to include in ensemble fusion. Valid only when stacking='on'. Valid values: bst, gbdt, svc, rt. Cannot be empty. |
Validation metrics (val_metric)
| Value | What it measures | Task type |
|---|---|---|
| loss | Same loss function used during training | Classification, regression |
| f1score | Harmonic mean of precision and recall — useful when class distribution is uneven | Classification, multiclass classification |
| r2_score | Coefficient of determination — how well predictions fit the actual values | Regression |
| mse | Mean squared error — average squared difference between predictions and actual values | Regression |
| mape | Mean absolute percentage error — average percentage deviation from actual values | Regression |
| mape_plus | Variant of MAPE that measures error only on positive labels | Regression |
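mape and mape_plus differ only in which samples contribute to the average. The exact server-side formula for mape_plus is not documented here; the Python sketch below reflects a plausible reading in which only samples with a positive true label are counted:

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error over all samples (undefined when
    any true label is zero)."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def mape_plus(y_true, y_pred):
    """MAPE restricted to samples whose true label is positive — an assumed
    reading of the documented 'positive labels only' behavior."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t > 0]
    return sum(abs((t - p) / t) for t, p in pairs) / len(pairs)
```

Restricting to positive labels avoids division by zero on zero-valued targets, which is common in spending-prediction data where most users pay nothing.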
Evaluation metrics
The following are valid values of the metrics parameter in the EVALUATE statement.
| Value | What it measures | Task type |
|---|---|---|
| acc | Accuracy — proportion of correct predictions | Classification, multiclass classification |
| auc | Area under the ROC curve — model's ability to separate positive and negative classes | Classification, multiclass classification |
| Fscore | F1 score — harmonic mean of precision and recall, useful when class distribution is uneven | Classification, multiclass classification |
| r2_score | Coefficient of determination | Regression |
| mse | Mean squared error | Regression |
| mape | Mean absolute percentage error | Regression |
| mape_plus | Variant of MAPE for positive labels only | Regression |
Examples
The following examples use classification tasks. For other task types, adjust model_task_type and the corresponding loss and metrics parameters.
Create a model
```sql
/*polar4ai*/CREATE MODEL sequential_bst WITH (
    model_class = 'bst',
    x_cols = 'event_list,other_feature1',
    y_cols = 'target',
    model_parameter = (
        batch_size = 128,
        window_size = 900,
        sequence_length = 3000,
        success_id = 900,
        max_epoch = 2,
        learning_rate = 0.0008,
        val_flag = 1,
        x_seq_cols = 'event_list',
        x_value_cols = 'other_feature1',
        val_metric = 'f1score',
        auto_data_statics = 'on',
        data_normalization = 1,
        remove_seq_adjacent_duplicates = 'on',
        version = 1
    )
) AS (SELECT * FROM seqential_train);
```

seqential_train is the model creation data table.
Evaluate a model
```sql
/*polar4ai*/SELECT uid, target FROM evaluate(
    MODEL sequential_bst,
    SELECT * FROM seqential_eval
) WITH (
    x_cols = 'event_list,other_feature1',
    y_cols = 'target',
    metrics = 'Fscore'
);
```

seqential_eval is the model evaluation data table.
Run predictions
```sql
/*polar4ai*/SELECT uid, target FROM PREDICT(
    MODEL sequential_bst,
    SELECT * FROM seqential_test
) WITH (
    x_cols = 'event_list,other_feature1',
    mode = 'async'
);
```

seqential_test is the model prediction data table.