AnalyticDB PostgreSQL 7.0 supports In-Database AI/ML functionality, enabling data processing and model computation directly within the database, which significantly reduces data forwarding costs. This feature is implemented through the pgml extension, which is compatible with PostgresML open source community interfaces and has been deeply optimized for performance, features, and usability. It supports model training, fine-tuning, deployment, and inference with GPU/CPU acceleration. The extension integrates mainstream machine learning algorithms such as XGBoost, LightGBM, and SciKit-Learn, helping enterprises build intelligent analysis applications efficiently.
Prerequisites
An AnalyticDB PostgreSQL 7.0 instance with kernel version V7.1.1.0 or later.
NoteYou can view the minor version on the Basic Information page of an instance in the AnalyticDB for PostgreSQL console. If your instance does not meet the required versions, update the minor version of the instance.
An instance in elastic storage mode is created.
The pgml extension is installed on the instance.
NoteThe pgml extension does not support installation through the console. If you need to install it, please submit a ticket to contact technical support for assistance. To uninstall the pgml extension, submit a ticket.
The pgml extension is not supported on AnalyticDB for PostgreSQL 7.0 Basic Edition.
Introduction to metadata
The In-Database AI/ML framework in AnalyticDB PostgreSQL 7.0 is implemented based on the pgml extension. After you install the pgml extension for an AnalyticDB for PostgreSQL instance, the system automatically creates a schema named pgml. The following table describes the metadata tables contained in the pgml schema.
Metadata table name | Description |
projects | Records information about projects in training tasks. |
models | Records information about model training. |
files | Records information about stored model files. |
snapshots | Records information about the snapshots of training datasets. |
logs | Records information about training logs. |
deployments | Records information about model deployment. |
When a model training task is initiated, training information is automatically written to these metadata tables.
For information about custom types in the pgml metadata tables (such as task, runtime, and sampling), see Machine learning.
projects
The projects table records the project ID, project name, task type, creation time, and update time of training tasks. Table schema and indexes:
Table "pgml.projects"
Column | Type | Collation | Nullable | Default
------------+-----------------------------+-----------+----------+--------------------------------------
id | bigint | | not null | nextval('projects_id_seq'::regclass)
name | text | | not null |
task | task | | not null |
created_at | timestamp without time zone | | not null | clock_timestamp()
updated_at | timestamp without time zone | | not null | clock_timestamp()
Indexes:
"projects_pkey" PRIMARY KEY, btree (id)
"projects_name_idx" btree (name)
Triggers:
projects_auto_updated_at BEFORE UPDATE ON projects FOR EACH ROW EXECUTE FUNCTION set_updated_at()
trigger_before_insert_pgml_projects BEFORE INSERT ON projects FOR EACH ROW EXECUTE FUNCTION trigger_check_pgml_projects()
Distributed Replicatedmodels
The models table records the parameters specified for model training and other associated information, such as project ID and snapshot ID. Table schema and indexes:
Table "pgml.models"
Column | Type | Collation | Nullable | Default
---------------+-----------------------------+-----------+----------+------------------------------------
id | bigint | | not null | nextval('models_id_seq'::regclass)
project_id | bigint | | not null |
snapshot_id | bigint | | |
num_features | integer | | not null |
algorithm | text | | not null |
runtime | runtime | | | 'python'::runtime
hyperparams | jsonb | | not null |
status | text | | not null |
metrics | jsonb | | |
search | text | | |
search_params | jsonb | | not null |
search_args | jsonb | | not null |
created_at | timestamp without time zone | | not null | clock_timestamp()
updated_at | timestamp without time zone | | not null | clock_timestamp()
Indexes:
"models_pkey" PRIMARY KEY, btree (id)
"models_project_id_idx" btree (project_id)
"models_snapshot_id_idx" btree (snapshot_id)
Triggers:
models_auto_updated_at BEFORE UPDATE ON models FOR EACH ROW EXECUTE FUNCTION set_updated_at()
trigger_before_insert_pgml_models BEFORE INSERT ON models FOR EACH ROW EXECUTE FUNCTION trigger_check_pgml_models_fk()
Distributed Replicatedfiles
The files table records post-training model files in the data column in binary format. Each file is divided into multiple splits of 100 MB each. Table schema and indexes:
Table "pgml.files"
Column | Type | Collation | Nullable | Default
------------+-----------------------------+-----------+----------+-----------------------------------
id | bigint | | not null | nextval('files_id_seq'::regclass)
model_id | bigint | | not null |
path | text | | not null |
part | integer | | not null |
created_at | timestamp without time zone | | not null | clock_timestamp()
updated_at | timestamp without time zone | | not null | clock_timestamp()
data | bytea | | not null |
Indexes:
"files_pkey" PRIMARY KEY, btree (id)
"files_model_id_path_part_idx" btree (model_id, path, part)
Triggers:
files_auto_updated_at BEFORE UPDATE ON files FOR EACH ROW EXECUTE FUNCTION set_updated_at()
trigger_before_insert_pgml_files BEFORE INSERT ON files FOR EACH ROW EXECUTE FUNCTION trigger_check_pgml_files()
Distributed Replicatedsnapshots
The snapshots table records information about the snapshots of training datasets, such as the table name and test set division information. Table schema and indexes:
Table "pgml.snapshots"
Column | Type | Collation | Nullable | Default
---------------+-----------------------------+-----------+----------+---------------------------------------
id | bigint | | not null | nextval('snapshots_id_seq'::regclass)
relation_name | text | | not null |
y_column_name | text[] | | |
test_size | real | | not null |
test_sampling | sampling | | not null |
status | text | | not null |
columns | jsonb | | |
analysis | jsonb | | |
created_at | timestamp without time zone | | not null | clock_timestamp()
updated_at | timestamp without time zone | | not null | clock_timestamp()
materialized | boolean | | | false
Indexes:
"snapshots_pkey" PRIMARY KEY, btree (id)
Triggers:
snapshots_auto_updated_at BEFORE UPDATE ON snapshots FOR EACH ROW EXECUTE FUNCTION set_updated_at()
Distributed Replicatedlogs
The logs table records information about training logs. A training task may contain multiple training logs. You can view the logs based on the values of the created_at column in ascending order. Table schema and indexes:
Table "pgml.logs"
Column | Type | Collation | Nullable | Default
------------+-----------------------------+-----------+----------+----------------------------------
id | integer | | not null | nextval('logs_id_seq'::regclass)
model_id | bigint | | |
project_id | bigint | | |
created_at | timestamp without time zone | | | CURRENT_TIMESTAMP
logs | jsonb | | |
Indexes:
"logs_pkey" PRIMARY KEY, btree (id)
Distributed Replicateddeployments
The deployments table records information about model deployment, such as the project ID, deployment ID, and model ID. Table schema and indexes:
Table "pgml.deployments"
Column | Type | Collation | Nullable | Default
------------+-----------------------------+-----------+----------+-----------------------------------------
id | bigint | | not null | nextval('deployments_id_seq'::regclass)
project_id | bigint | | not null |
model_id | bigint | | not null |
strategy | strategy | | not null |
created_at | timestamp without time zone | | not null | clock_timestamp()
Indexes:
"deployments_pkey" PRIMARY KEY, btree (id)
"deployments_model_id_created_at_idx" btree (model_id)
"deployments_project_id_created_at_idx" btree (project_id)
Triggers:
deployments_auto_updated_at BEFORE UPDATE ON deployments FOR EACH ROW EXECUTE FUNCTION set_updated_at()
trigger_before_insert_pgml_deployments BEFORE INSERT ON deployments FOR EACH ROW EXECUTE FUNCTION trigger_check_pgml_deployments_fk()
Distributed Replicated