All Products
Search
Document Center

AnalyticDB:In-Database AI/ML overview

Last Updated:Jun 18, 2025

AnalyticDB PostgreSQL 7.0 supports In-Database AI/ML functionality, enabling data processing and model computation directly within the database, which significantly reduces data forwarding costs. This feature is implemented through the pgml extension, which is compatible with PostgresML open source community interfaces and has been deeply optimized for performance, features, and usability. It supports model training, fine-tuning, deployment, and inference with GPU/CPU acceleration. The extension integrates mainstream machine learning algorithms such as XGBoost, LightGBM, and SciKit-Learn, helping enterprises build intelligent analysis applications efficiently.

Prerequisites

  • An AnalyticDB PostgreSQL 7.0 instance with kernel version V7.1.1.0 or later.

    Note

    You can view the minor version on the Basic Information page of an instance in the AnalyticDB for PostgreSQL console. If your instance does not meet the required versions, update the minor version of the instance.

  • An instance in elastic storage mode is created.

  • The pgml extension is installed on the instance.

    Note
    • The pgml extension does not support installation through the console. If you need to install it, please submit a ticket to contact technical support for assistance. To uninstall the pgml extension, submit a ticket.

    • The pgml extension is not supported on AnalyticDB for PostgreSQL 7.0 Basic Edition.

Introduction to metadata

The In-Database AI/ML framework in AnalyticDB PostgreSQL 7.0 is implemented based on the pgml extension. After you install the pgml extension for an AnalyticDB for PostgreSQL instance, the system automatically creates a schema named pgml. The following table describes the metadata tables contained in the pgml schema.

Metadata table name

Description

projects

Records information about projects in training tasks.

models

Records information about model training.

files

Records information about stored model files.

snapshots

Records information about the snapshots of training datasets.

logs

Records information about training logs.

deployments

Records information about model deployment.

When a model training task is initiated, training information is automatically written to these metadata tables.

Note

For information about custom types in the pgml metadata tables (such as task, runtime, and sampling), see Machine learning.

projects

The projects table records the project ID, project name, task type, creation time, and update time of training tasks. Table schema and indexes:

                                         Table "pgml.projects"
   Column   |            Type             | Collation | Nullable |               Default                
------------+-----------------------------+-----------+----------+--------------------------------------
 id         | bigint                      |           | not null | nextval('projects_id_seq'::regclass)
 name       | text                        |           | not null | 
 task       | task                        |           | not null | 
 created_at | timestamp without time zone |           | not null | clock_timestamp()
 updated_at | timestamp without time zone |           | not null | clock_timestamp()
Indexes:
    "projects_pkey" PRIMARY KEY, btree (id)
    "projects_name_idx" btree (name)
Triggers:
    projects_auto_updated_at BEFORE UPDATE ON projects FOR EACH ROW EXECUTE FUNCTION set_updated_at()
    trigger_before_insert_pgml_projects BEFORE INSERT ON projects FOR EACH ROW EXECUTE FUNCTION trigger_check_pgml_projects()
Distributed Replicated

models

The models table records the parameters specified for model training and other associated information, such as project ID and snapshot ID. Table schema and indexes:

                                           Table "pgml.models"
    Column     |            Type             | Collation | Nullable |              Default               
---------------+-----------------------------+-----------+----------+------------------------------------
 id            | bigint                      |           | not null | nextval('models_id_seq'::regclass)
 project_id    | bigint                      |           | not null | 
 snapshot_id   | bigint                      |           |          | 
 num_features  | integer                     |           | not null | 
 algorithm     | text                        |           | not null | 
 runtime       | runtime                     |           |          | 'python'::runtime
 hyperparams   | jsonb                       |           | not null | 
 status        | text                        |           | not null | 
 metrics       | jsonb                       |           |          | 
 search        | text                        |           |          | 
 search_params | jsonb                       |           | not null | 
 search_args   | jsonb                       |           | not null | 
 created_at    | timestamp without time zone |           | not null | clock_timestamp()
 updated_at    | timestamp without time zone |           | not null | clock_timestamp()
Indexes:
    "models_pkey" PRIMARY KEY, btree (id)
    "models_project_id_idx" btree (project_id)
    "models_snapshot_id_idx" btree (snapshot_id)
Triggers:
    models_auto_updated_at BEFORE UPDATE ON models FOR EACH ROW EXECUTE FUNCTION set_updated_at()
    trigger_before_insert_pgml_models BEFORE INSERT ON models FOR EACH ROW EXECUTE FUNCTION trigger_check_pgml_models_fk()
Distributed Replicated

files

The files table records post-training model files in the data column in binary format. Each file is divided into multiple splits of 100 MB each. Table schema and indexes:

                                         Table "pgml.files"
   Column   |            Type             | Collation | Nullable |              Default              
------------+-----------------------------+-----------+----------+-----------------------------------
 id         | bigint                      |           | not null | nextval('files_id_seq'::regclass)
 model_id   | bigint                      |           | not null | 
 path       | text                        |           | not null | 
 part       | integer                     |           | not null | 
 created_at | timestamp without time zone |           | not null | clock_timestamp()
 updated_at | timestamp without time zone |           | not null | clock_timestamp()
 data       | bytea                       |           | not null | 
Indexes:
    "files_pkey" PRIMARY KEY, btree (id)
    "files_model_id_path_part_idx" btree (model_id, path, part)
Triggers:
    files_auto_updated_at BEFORE UPDATE ON files FOR EACH ROW EXECUTE FUNCTION set_updated_at()
    trigger_before_insert_pgml_files BEFORE INSERT ON files FOR EACH ROW EXECUTE FUNCTION trigger_check_pgml_files()
Distributed Replicated

snapshots

The snapshots table records information about the snapshots of training datasets, such as the table name and test set division information. Table schema and indexes:

                                           Table "pgml.snapshots"
    Column     |            Type             | Collation | Nullable |                Default                
---------------+-----------------------------+-----------+----------+---------------------------------------
 id            | bigint                      |           | not null | nextval('snapshots_id_seq'::regclass)
 relation_name | text                        |           | not null | 
 y_column_name | text[]                      |           |          | 
 test_size     | real                        |           | not null | 
 test_sampling | sampling                    |           | not null | 
 status        | text                        |           | not null | 
 columns       | jsonb                       |           |          | 
 analysis      | jsonb                       |           |          | 
 created_at    | timestamp without time zone |           | not null | clock_timestamp()
 updated_at    | timestamp without time zone |           | not null | clock_timestamp()
 materialized  | boolean                     |           |          | false
Indexes:
    "snapshots_pkey" PRIMARY KEY, btree (id)
Triggers:
    snapshots_auto_updated_at BEFORE UPDATE ON snapshots FOR EACH ROW EXECUTE FUNCTION set_updated_at()
Distributed Replicated

logs

The logs table records information about training logs. A training task may contain multiple training logs. You can view the logs based on the values of the created_at column in ascending order. Table schema and indexes:

                                         Table "pgml.logs"
   Column   |            Type             | Collation | Nullable |             Default              
------------+-----------------------------+-----------+----------+----------------------------------
 id         | integer                     |           | not null | nextval('logs_id_seq'::regclass)
 model_id   | bigint                      |           |          | 
 project_id | bigint                      |           |          | 
 created_at | timestamp without time zone |           |          | CURRENT_TIMESTAMP
 logs       | jsonb                       |           |          | 
Indexes:
    "logs_pkey" PRIMARY KEY, btree (id)
Distributed Replicated

deployments

The deployments table records information about model deployment, such as the project ID, deployment ID, and model ID. Table schema and indexes:

                                         Table "pgml.deployments"
   Column   |            Type             | Collation | Nullable |                 Default                 
------------+-----------------------------+-----------+----------+-----------------------------------------
 id         | bigint                      |           | not null | nextval('deployments_id_seq'::regclass)
 project_id | bigint                      |           | not null | 
 model_id   | bigint                      |           | not null | 
 strategy   | strategy                    |           | not null | 
 created_at | timestamp without time zone |           | not null | clock_timestamp()
Indexes:
    "deployments_pkey" PRIMARY KEY, btree (id)
    "deployments_model_id_created_at_idx" btree (model_id)
    "deployments_project_id_created_at_idx" btree (project_id)
Triggers:
    deployments_auto_updated_at BEFORE UPDATE ON deployments FOR EACH ROW EXECUTE FUNCTION set_updated_at()
    trigger_before_insert_pgml_deployments BEFORE INSERT ON deployments FOR EACH ROW EXECUTE FUNCTION trigger_check_pgml_deployments_fk()
Distributed Replicated