You can use RDS SQLFlow statements to efficiently perform model training, prediction, and evaluation. This topic describes the syntax specification of RDS SQLFlow in detail.
Introduction
You can use RDS SQLFlow statements to efficiently perform model training, prediction, and evaluation without writing complex Python code. RDS SQLFlow automatically parses these statements and converts them into equivalent Python programs, which significantly simplifies the model training process. The syntax of RDS SQLFlow covers various features, including the following:
For example, you can use the iris dataset and train a TensorFlow DNNClassifier algorithm. In the dataset, the first four columns represent features that indicate the botanical shape of each iris flower, and the last column is the label that represents the subspecies of each iris flower.
sepal_length | sepal_width | petal_length | petal_width | class |
6.4 | 2.8 | 5.6 | 2.2 | 2 |
5.0 | 2.3 | 3.3 | 1.0 | 1 |
... | ... | ... | ... | ... |
With the following simple RDS SQLFlow statement, you can easily train a DNNClassifier model with two hidden layers, each containing ten hidden units. After training, the model will be saved to the sqlflow_models.my_dnn_model table for subsequent prediction. RDS SQLFlow automatically parses these statements and converts them into equivalent Python programs, which significantly simplifies the model training process.
SELECT * FROM iris.train
TO TRAIN DNNClassifier
WITH model.hidden_units = [10, 10], model.n_classes = 3, train.epoch= 10
COLUMN sepal_length, sepal_width, petal_length, petal_width
LABEL class
INTO sqlflow_models.my_dnn_model;Training syntax
RDS SQLFlow training statements consist of SELECT, TRAIN, COLUMN, LABEL, and INTO clauses.
SELECT select_expr [, select_expr ...]
FROM table_references
[WHERE where_condition]
[LIMIT row_count]
TO TRAIN model_identifier
[WITH
model_attr_expr [, model_attr_expr ...]
[, train_attr_expr ...]]
COLUMN column_expr [, column_expr ...]
| COLUMN column_expr [, column_expr ...] FOR column_name
[COLUMN column_expr [, column_expr ...] FOR column_name ...]
[LABEL label_expr]
INTO table_references;SELECT clause
The SELECT clause is used to retrieve data from a specified table.
Syntax:
SELECT select_expr [, select_expr ...]
FROM table_references
[WHERE where_condition]
[LIMIT row_count]Parameter description:
Parameter | Description |
select_expr | The data columns to retrieve. At least one column must be specified. |
table_references | The table from which to retrieve data. |
where_condition | An expression used to filter rows. |
row_count | The maximum number of rows to retrieve. |
Example:
For example, to quickly train a binary classification model on the sample dataset (iris.train):
SELECT * FROM iris.train
WHERE class = 0 OR class = 1
LIMIT 1000
TO TRAIN ...TRAIN clause
The TRAIN clause is used to describe a specific model type and its training algorithm.
Syntax:
TO TRAIN model_identifier
WITH
model_attr_expr [, model_attr_expr ...]
[, train_attr_expr ...]Parameter description:
Parameter | Description |
model_identifier | The model type, such as DNNClassifier. For more models, see Models and algorithms. |
model_attr_expr | Model attributes, such as |
train_attr_expr | Training attributes, such as |
Example:
For example, to train a DNNClassifier model with two hidden layers, each containing ten hidden units, and with 10 training epochs:
SELECT * FROM iris.train
TO TRAIN DNNClassifier
WITH
model.hidden_units = [10, 10],
model.n_classes = 3,
train.epoch = 10
...COLUMN clause
The COLUMN clause is used to specify the field names of training features and their optional pre-processing methods.
Syntax:
COLUMN column_expr [, column_expr ...]
| COLUMN column_expr [, column_expr ...] FOR column_name
[COLUMN column_expr [, column_expr ...] FOR column_name ...]Parameter description:
Parameter | Description |
column_expr | The field name and pre-processing method for its content, such as |
column_name | The feature column name for model input. |
Example:
For example, to use the fields sepal_length, sepal_width, petal_length, and petal_width as features:
SELECT * FROM iris.train
TO TRAIN DNNClassifier WITH
model.hidden_units = [10, 10],
model.n_classes = 3,
train.epoch = 10
COLUMN sepal_length, sepal_width, petal_length, petal_width
...LABEL clause
The LABEL clause is used to indicate the field name of the training label and its optional pre-processing method.
Syntax:
LABEL label_exprParameter description:
Parameter | Description |
label_expr | The field name and pre-processing method for its content, such as |
Example:
SELECT * FROM iris.train
TO TRAIN DNNClassifier WITH
model.hidden_units = [10, 10],
model.n_classes = 3,
train.epoch = 10
COLUMN sepal_length, sepal_width, petal_length, petal_width
LABEL class
...INTO clause
The INTO clause is used to specify the table name where the trained model will be saved.
Syntax:
INTO table_referencesParameter description:
Parameter | Description |
table_references | The table name where the trained model will be saved. |
Example:
SELECT * FROM iris.train
TO TRAIN DNNClassifier WITH
model.hidden_units = [10, 10],
model.n_classes = 3,
train.epoch = 10
COLUMN sepal_length, sepal_width, petal_length, petal_width
LABEL class
INTO sqlflow_models.my_dnn_model;Prediction syntax
RDS SQLFlow prediction statements consist of SELECT, PREDICT, and USING clauses.
SELECT select_expr [, select_expr ...]
FROM table_references
[WHERE where_condition]
[LIMIT row_count]
TO PREDICT result_table_reference
[WITH
attr_expr [, attr_expr ...]]
USING model_table_reference;The data structure of the SELECT clause before TO PREDICT must be consistent with the data structure used in the TO TRAIN operation in the training syntax to ensure that the data structure used in the prediction and explanation processes matches the data structure used during the training phase.
PREDICT and USING clauses
PREDICT is used to describe the table where prediction results will be written, USING is used to describe the table from which the model is loaded, and WITH is used to describe the necessary configurations for prediction.
Syntax:
TO PREDICT result_table_reference
[WITH
attr_expr [, attr_expr ...]]
USING model_table_reference;Parameter description:
Parameter | Description |
result_table_reference | The table that stores prediction results. |
attr_expr | Configuration attributes, such as |
model_table_reference | The table from which the model is loaded for prediction. |
Example:
For example, to use a model stored in the sqlflow.my_dnn_model table and save the prediction results to a column in the iris.predict table:
SELECT * FROM iris.train
TO PREDICT iris.predict.class
USING sqlflow.my_dnn_model;Explanation syntax
RDS SQLFlow explanation statements consist of a series of SELECT, EXPLAIN, and USING statements.
SELECT select_expr [, select_expr ...]
FROM table_references
[WHERE where_condition]
[LIMIT row_count]
TO EXPLAIN model_table_reference
[WITH
attr_expr [, attr_expr ...]]
[USING explainer];The data structure of the SELECT clause that precedes TO EXPLAIN must be consistent with the data structure used by the TO TRAIN operation in the training syntax.
EXPLAIN and USING clauses
The EXPLAIN clause is used to specify the table from which the model is loaded and the necessary configuration attributes. The USING clause is used to specify the explainer.
Syntax:
TO EXPLAIN model_table_reference
[WITH
attr_expr [, attr_expr ...]]
USING explainer;Parameter description:
Parameter | Description |
model_table_reference | The table from which the model is loaded for prediction. |
attr_expr | Configuration attributes, such as |
explainer | The type of explainer, such as |
Example:
For example, to use TreeExplainer to explain a model stored in sqlflow_models.my_xgb_regression_model and plot the analysis results in sorted order:
SELECT * FROM boston.train
TO EXPLAIN sqlflow_models.my_xgb_regression_model
WITH
summary.sort=True
USING TreeExplainer;Evaluation syntax
RDS SQLFlow evaluation statements consist of SELECT, EVALUATE, and INTO statements.
SELECT select_expr [, select_expr ...]
FROM table_references
[WHERE where_condition]
[LIMIT row_count]
TO EVALUATE model_table_reference
[WITH
attr_expr [, attr_expr ...]]
LABEL class
INTO evaluate_result_table;The data structure of the
SELECTclause beforeTO EVALUATEmust be consistent with the data structure used in the Training syntaxTO TRAINoperation.According to machine learning theory, it is inappropriate to use the training dataset for evaluation, as this might result in an overly idealized evaluation result.
EVALUATE and INTO clauses
The TO EVALUATE clause is used to specify the model to be evaluated and which metrics will be used for evaluation. The INTO clause is used to specify the table where evaluation results will be saved.
Syntax:
TO EVALUATE model_table_reference
[WITH
attr_expr [, attr_expr ...]]
INTO evaluate_result_table;Parameter description:
Parameter | Description |
model_table_reference | The model to be evaluated, whose evaluation results represent the model's performance when predicting new data. |
attr_expr | Attributes for evaluation. You can specify which metrics will be output to the result table by setting |
evaluate_result_table | The table that stores evaluation results. |
Example:
SELECT * FROM iris.test
TO EVALUATE sqlflow_models.my_dnn_model
WITH validation.metrics = Accuracy
LABEL class
INTO sqlflow_models.evaluate_result_table;Optimization syntax
RDS SQLFlow uses TO MAXIMIZE and TO MINIMIZE clauses to describe and solve mathematical programming problems.
Syntax:
SELECT select_expr [, select_expr ...]
FROM table_references
[WHERE where_condition]
[LIMIT row_count]
TO MAXIMIZE|MINIMIZE
objective_expr
CONSTRAINT
constraint_expr [GROUP BY column_name]
[, constraint_expr [GROUP BY column_name]...]
WITH
variables="variable_value(column_name, ...)"
var_type="Integers|Reals|Binary|NonNegativeIntegers|..."
[USING solver_name]
INTO result_table_name;Example:
SELECT c1, c2, c3 FROM my_db.my_table
TO MAXIMIZE SUM(x * c1)
CONSTRAINT
SUM(x) <= c2 GROUP BY c1
x <= 3
WITH
variables="x(c3)",
var_type="NonNegativeIntegers"
USING glpk
INTO my_db.my_result_table;
Parameter description:
Parameter | Description |
TO MAXIMIZE |
|
CONSTRAINT |
|
WITH |
|
USING |
|
INTO |
|