All Products
Search
Document Center

:Syntax specification

Last Updated:Jul 25, 2025

You can use RDS SQLFlow statements to efficiently perform model training, prediction, and evaluation. This topic describes the syntax specification of RDS SQLFlow in detail.

Introduction

You can use RDS SQLFlow statements to efficiently perform model training, prediction, and evaluation without writing complex Python code. RDS SQLFlow automatically parses these statements and converts them into equivalent Python programs, which significantly simplifies the model training process. The syntax of RDS SQLFlow covers various features, including the following:

For example, you can use the iris dataset and train a TensorFlow DNNClassifier algorithm. In the dataset, the first four columns represent features that indicate the botanical shape of each iris flower, and the last column is the label that represents the subspecies of each iris flower.

sepal_length

sepal_width

petal_length

petal_width

class

6.4

2.8

5.6

2.2

2

5.0

2.3

3.3

1.0

1

...

...

...

...

...

With the following simple RDS SQLFlow statement, you can easily train a DNNClassifier model with two hidden layers, each containing ten hidden units. After training, the model will be saved to the sqlflow_models.my_dnn_model table for subsequent prediction. RDS SQLFlow automatically parses these statements and converts them into equivalent Python programs, which significantly simplifies the model training process.

SELECT * FROM iris.train
TO TRAIN DNNClassifier
WITH model.hidden_units = [10, 10], model.n_classes = 3, train.epoch= 10
COLUMN sepal_length, sepal_width, petal_length, petal_width
LABEL class
INTO sqlflow_models.my_dnn_model;

Training syntax

RDS SQLFlow training statements consist of SELECT, TRAIN, COLUMN, LABEL, and INTO clauses.

SELECT select_expr [, select_expr ...]
FROM table_references
  [WHERE where_condition]
  [LIMIT row_count]
TO TRAIN model_identifier
[WITH
  model_attr_expr [, model_attr_expr ...]
  [, train_attr_expr ...]]
COLUMN column_expr [, column_expr ...]
  | COLUMN column_expr [, column_expr ...] FOR column_name
    [COLUMN column_expr [, column_expr ...] FOR column_name ...]
[LABEL label_expr]
INTO table_references;

SELECT clause

The SELECT clause is used to retrieve data from a specified table.

Syntax:

SELECT select_expr [, select_expr ...]
FROM table_references
  [WHERE where_condition]
  [LIMIT row_count]

Parameter description:

Parameter

Description

select_expr

The data columns to retrieve. At least one column must be specified.

table_references

The table from which to retrieve data.

where_condition

An expression used to filter rows.

row_count

The maximum number of rows to retrieve.

Example:

For example, to quickly train a binary classification model on the sample dataset (iris.train):

SELECT * FROM iris.train
WHERE class = 0 OR class = 1
LIMIT 1000
TO TRAIN ...

TRAIN clause

The TRAIN clause is used to describe a specific model type and its training algorithm.

Syntax:

TO TRAIN model_identifier
WITH
  model_attr_expr [, model_attr_expr ...]
  [, train_attr_expr ...]

Parameter description:

Parameter

Description

model_identifier

The model type, such as DNNClassifier. For more models, see Models and algorithms.

model_attr_expr

Model attributes, such as model.n_classes = 3.

train_attr_expr

Training attributes, such as train.epoch = 10.

Example:

For example, to train a DNNClassifier model with two hidden layers, each containing ten hidden units, and with 10 training epochs:

SELECT * FROM iris.train
TO TRAIN DNNClassifier
WITH
  model.hidden_units = [10, 10],
  model.n_classes = 3,
  train.epoch = 10
...

COLUMN clause

The COLUMN clause is used to specify the field names of training features and their optional pre-processing methods.

Syntax:

COLUMN column_expr [, column_expr ...]
  | COLUMN column_expr [, column_expr ...] FOR column_name
    [COLUMN column_expr [, column_expr ...] FOR column_name ...]

Parameter description:

Parameter

Description

column_expr

The field name and pre-processing method for its content, such as sepal_length,DENSE(dense, 3).

column_name

The feature column name for model input.

Example:

For example, to use the fields sepal_length, sepal_width, petal_length, and petal_width as features:

SELECT * FROM iris.train
TO TRAIN DNNClassifier WITH
  model.hidden_units = [10, 10],
  model.n_classes = 3,
  train.epoch = 10
COLUMN sepal_length, sepal_width, petal_length, petal_width
...

LABEL clause

The LABEL clause is used to indicate the field name of the training label and its optional pre-processing method.

Syntax:

LABEL label_expr

Parameter description:

Parameter

Description

label_expr

The field name and pre-processing method for its content, such as class.

Example:

SELECT * FROM iris.train
TO TRAIN DNNClassifier WITH
  model.hidden_units = [10, 10],
  model.n_classes = 3,
  train.epoch = 10
COLUMN sepal_length, sepal_width, petal_length, petal_width
LABEL class
...

INTO clause

The INTO clause is used to specify the table name where the trained model will be saved.

Syntax:

INTO table_references

Parameter description:

Parameter

Description

table_references

The table name where the trained model will be saved.

Example:

SELECT * FROM iris.train
TO TRAIN DNNClassifier WITH
  model.hidden_units = [10, 10],
  model.n_classes = 3,
  train.epoch = 10
COLUMN sepal_length, sepal_width, petal_length, petal_width
LABEL class
INTO sqlflow_models.my_dnn_model;

Prediction syntax

RDS SQLFlow prediction statements consist of SELECT, PREDICT, and USING clauses.

SELECT select_expr [, select_expr ...]
FROM table_references
  [WHERE where_condition]
  [LIMIT row_count]
TO PREDICT result_table_reference
[WITH
  attr_expr [, attr_expr ...]]
USING model_table_reference;
Important

The data structure of the SELECT clause before TO PREDICT must be consistent with the data structure used in the TO TRAIN operation in the training syntax to ensure that the data structure used in the prediction and explanation processes matches the data structure used during the training phase.

PREDICT and USING clauses

PREDICT is used to describe the table where prediction results will be written, USING is used to describe the table from which the model is loaded, and WITH is used to describe the necessary configurations for prediction.

Syntax:

TO PREDICT result_table_reference
[WITH
  attr_expr [, attr_expr ...]]
USING model_table_reference;

Parameter description:

Parameter

Description

result_table_reference

The table that stores prediction results.

attr_expr

Configuration attributes, such as predict.batch_size = 1.

model_table_reference

The table from which the model is loaded for prediction.

Example:

For example, to use a model stored in the sqlflow.my_dnn_model table and save the prediction results to a column in the iris.predict table:

SELECT * FROM iris.train
TO PREDICT iris.predict.class
USING sqlflow.my_dnn_model;

Explanation syntax

RDS SQLFlow explanation statements consist of a series of SELECT, EXPLAIN, and USING statements.

SELECT select_expr [, select_expr ...]
FROM table_references
  [WHERE where_condition]
  [LIMIT row_count]
TO EXPLAIN model_table_reference
[WITH
  attr_expr [, attr_expr ...]]
[USING explainer];
Important

The data structure of the SELECT clause that precedes TO EXPLAIN must be consistent with the data structure used by the TO TRAIN operation in the training syntax.

EXPLAIN and USING clauses

The EXPLAIN clause is used to specify the table from which the model is loaded and the necessary configuration attributes. The USING clause is used to specify the explainer.

Syntax:

TO EXPLAIN model_table_reference
[WITH
  attr_expr [, attr_expr ...]]
USING explainer;

Parameter description:

Parameter

Description

model_table_reference

The table from which the model is loaded for prediction.

attr_expr

Configuration attributes, such as summary.plot_type="bar".

explainer

The type of explainer, such as TreeExplainer.

Example:

For example, to use TreeExplainer to explain a model stored in sqlflow_models.my_xgb_regression_model and plot the analysis results in sorted order:

SELECT * FROM boston.train
TO EXPLAIN sqlflow_models.my_xgb_regression_model
WITH
    summary.sort=True
USING TreeExplainer;

Evaluation syntax

RDS SQLFlow evaluation statements consist of SELECT, EVALUATE, and INTO statements.

SELECT select_expr [, select_expr ...]
FROM table_references
  [WHERE where_condition]
  [LIMIT row_count]
TO EVALUATE model_table_reference
[WITH
  attr_expr [, attr_expr ...]]
LABEL class
INTO evaluate_result_table;
Important
  • The data structure of the SELECT clause before TO EVALUATE must be consistent with the data structure used in the Training syntax TO TRAIN operation.

  • According to machine learning theory, it is inappropriate to use the training dataset for evaluation, as this might result in an overly idealized evaluation result.

EVALUATE and INTO clauses

The TO EVALUATE clause is used to specify the model to be evaluated and which metrics will be used for evaluation. The INTO clause is used to specify the table where evaluation results will be saved.

Syntax:

TO EVALUATE model_table_reference
[WITH
  attr_expr [, attr_expr ...]]
INTO evaluate_result_table;

Parameter description:

Parameter

Description

model_table_reference

The model to be evaluated, whose evaluation results represent the model's performance when predicting new data.

attr_expr

Attributes for evaluation.

You can specify which metrics will be output to the result table by setting validation.metrics, such as validation.metrics="Accuracy,AUC".

evaluate_result_table

The table that stores evaluation results.

Example:

SELECT * FROM iris.test 
TO EVALUATE sqlflow_models.my_dnn_model 
WITH validation.metrics = Accuracy 
LABEL class  
INTO sqlflow_models.evaluate_result_table;

Optimization syntax

RDS SQLFlow uses TO MAXIMIZE and TO MINIMIZE clauses to describe and solve mathematical programming problems.

Syntax:

SELECT select_expr [, select_expr ...]
FROM table_references
  [WHERE where_condition]
  [LIMIT row_count]
TO MAXIMIZE|MINIMIZE
  objective_expr
CONSTRAINT
  constraint_expr [GROUP BY column_name]
  [, constraint_expr [GROUP BY column_name]...]
WITH
  variables="variable_value(column_name, ...)"
  var_type="Integers|Reals|Binary|NonNegativeIntegers|..."
[USING solver_name]
INTO result_table_name;

Example:

SELECT c1, c2, c3 FROM my_db.my_table
TO MAXIMIZE SUM(x * c1)
CONSTRAINT
    SUM(x) <= c2 GROUP BY c1
    x <= 3
WITH
    variables="x(c3)",
    var_type="NonNegativeIntegers"
USING glpk
INTO my_db.my_result_table;

Parameter description:

Parameter

Description

TO MAXIMIZE

objective_expr is SUM(x * c1), indicating that the value of SUM(x * c1) should be maximized.

CONSTRAINT

  • SUM(x) <= c2 GROUP BY c1: GROUP BY means that for each different c1, there will be a constraint expression SUM(x) <= c2.

  • x <= 3: For each x, the condition x <= 3 must be satisfied.

WITH

  • variables=x(c3): Indicates that the optimization target column is c3; x represents the variable value to be solved.

  • var_type="NonNegativeIntegers": Indicates that the variable x is a non-negative integer. The supported data types for variables are as follows:

    • Binary: The variable value is 0 or 1.

    • Integers: The variable value is an integer.

    • PositiveIntegers, NegativeIntegers: The variable value is a positive or negative integer.

    • NonPositiveIntegers, NonNegativeIntegers: The variable value is a non-positive or non-negative integer.

    • Reals: The variable value is a real number.

    • PositiveReals, NegativeReals: The variable value is a positive or negative real number.

    • NonPositiveReals, NonNegativeReals: The variable value is a non-positive or non-negative real number.

USING

glpk: Indicates using glpk to solve this problem. RDS SQLFlow supports:

  • GLPK: Used for solving linear programming problems.

  • BARON: Used for solving non-linear programming problems.

INTO

my_db.my_result_table: The table that stores the solution results.

References

Introduction to RDS Custom