
PolarDB:NL2SQL: Natural language querying

Last updated: Mar 28, 2026

Querying a database normally requires SQL expertise. NL2SQL removes that barrier: a built-in large language model (LLM) translates plain-language questions into SQL statements, so analysts and non-technical users can query PolarDB directly. The model handles date arithmetic, value mappings (such as valid -> isValid=1), and domain-specific patterns after calibration.

Quick demo

/*polar4ai*/SELECT * FROM PREDICT (MODEL _polar4ai_nl2sql, SELECT 'Filter for the 2 students with the most absences, sort them by the number of absences in descending order, and display their names and number of absences.') WITH (basic_index_name='schema_index');

Output:

SELECT s.student_name, COUNT(sc.id) AS leave_count FROM students s JOIN student_courses sc ON s.id = sc.student_id WHERE sc.status = 0 GROUP BY s.student_name ORDER BY leave_count DESC LIMIT 2;

How it works

Implementing NL2SQL follows three phases:

Phase | Goal | Key tasks
Phase 1: Quick launch | Get NL2SQL working on your database from scratch | Add table and column comments, build the search index, run your first query
Phase 2: Optimization and fine-tuning | Improve accuracy for your specific business data | Add question templates, configure pre- and post-processing rules, customize comments
Phase 3: Production deployment | Run NL2SQL reliably in production | Deploy the NL2SQL system smoothly into the production environment

The rest of this document walks through each phase in sequence.

Prerequisites

Before you begin, review the following notes.

Note If you added an AI node when purchasing the cluster, you can skip to setting the database account permissions.
Important

When connecting from the command line, include the -c option so the client does not strip the /*polar4ai*/ comment. If you use Data Management (DMS), note that DMS connects through the Primary Endpoint by default, which routes queries away from the AI node. Switch the connection to the Cluster Endpoint before running NL2SQL queries.

Usage notes

How to phrase questions

Structure your questions with the condition first, followed by the entity (column value), then the column name. This order improves the accuracy of the generated SQL.

Example:

SELECT 'What is the property name of a "house" or "apartment" with more than one room?'

Here, *with more than one room* is the condition, *house* and *apartment* are the column values, and *property name* is the column name.
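To keep this phrasing discipline consistent across an application, it can help to assemble the NL2SQL statement programmatically. The sketch below is illustrative only: `build_nl2sql_query` is a hypothetical helper name, and the statement shape simply mirrors the syntax shown in this document.

```python
def build_nl2sql_query(question: str, index_name: str = "schema_index") -> str:
    """Assemble a /*polar4ai*/ NL2SQL statement for a natural-language question.

    The question itself should follow the condition -> column value -> column
    name order described above. Single quotes inside the question are doubled
    so the embedded string stays a valid SQL literal.
    """
    escaped = question.replace("'", "''")
    return (
        "/*polar4ai*/SELECT * FROM PREDICT (MODEL _polar4ai_nl2sql, "
        f"SELECT '{escaped}') WITH (basic_index_name='{index_name}');"
    )

# Condition first, then the column values, then the column name:
print(build_nl2sql_query(
    'What is the property name of a "house" or "apartment" with more than one room?'
))
```

Doubling the quotes centrally means callers never have to think about SQL escaping when they pass free-form questions through.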

Factors that affect accuracy

Factor | Effect
Richness of table and column comments | More descriptive comments improve how well the model understands your schema
Semantic match between your question and column comments | The closer the match, the better the generated SQL
Number of columns in the generated SQL | Fewer columns and simpler conditions produce more accurate results
SQL logical complexity | Less advanced syntax produces more accurate results

Phase 1: Quick launch

Step 1: Add comments to your data tables

NL2SQL relies on table and column comments to understand your schema. Before running any queries, add comments to the tables you plan to query.

Table comments help the model identify which tables to use. Keep them concise (within 10 characters) and focused on the table's main content — for example, *orders* or *inventory*.

Column comments should use common nouns or phrases that accurately reflect the column's meaning — for example, *order ID*, *date*, or *store name*. You can also include value mappings, such as: Indicates whether it is valid. 0: No. 1: Yes.

Note If you cannot modify the original comments, use the Customize table and column comments method in Phase 2 to add overriding comments without touching the originals.

Step 2: Build a search index table

The search index table stores vectorized representations of your schema, which the model uses to select relevant tables for each query.

You can use any table name that follows database naming rules. This guide uses schema_index as an example.

/*polar4ai*/CREATE TABLE schema_index(id integer, table_name varchar, table_comment text_ik_max_word, table_ddl text_ik_max_word, column_names text_ik_max_word, column_comments text_ik_max_word, sample_values text_ik_max_word, vecs vector_768, ext text_ik_max_word, PRIMARY KEY (id));
Note The search index table is not visible in the standard database view. To list all PolarDB for AI tables, run /*polar4ai*/SHOW TABLES;. To delete the table, run /*polar4ai*/DROP TABLE IF EXISTS schema_index;.

Step 3: Import your schema into the search index

This step vectorizes all tables in the current database and imports the schema information into the search index table.

/*polar4ai*/SELECT * FROM PREDICT (MODEL _polar4ai_text2vec, SELECT '') WITH (mode='async', resource='schema') INTO schema_index;

Parameters

Parameter | Required | Description
mode | Yes | Data writing mode. Fixed to async for asynchronous mode.
resource | Yes | Resource type. Fixed to schema to vectorize table information.
tables_included | No | Tables to vectorize. Default: '' (all tables). Separate multiple table names with commas.
to_sample | No | Whether to sample column values. 0 (default): no sampling. 1: sample column values. Sampling improves SQL quality for tables with fewer than 15 columns but increases import time.
columns_excluded | No | Columns to exclude from NL2SQL. Default: '' (none excluded). Format: table_name1.column_name1,table_name2.column_name1.

Example: Vectorize specific tables and sample column values, excluding certain columns:

/*polar4ai*/SELECT * FROM PREDICT (MODEL _polar4ai_text2vec, SELECT '') WITH (mode='async', resource='schema', tables_included='graph_info,image_info,text_info', to_sample=1, columns_excluded='graph_info.time,text_info.ext') INTO schema_index;
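When the set of tables to vectorize varies by environment, building the import statement from the optional parameters can reduce copy-paste mistakes. This is a hypothetical sketch (the helper name and option handling are assumptions layered over the syntax above), not an official client API.

```python
def build_schema_import(
    index_table: str,
    tables_included: str = "",
    to_sample: int = 0,
    columns_excluded: str = "",
) -> str:
    """Assemble the /*polar4ai*/ schema-import statement.

    Optional parameters are appended only when they differ from their
    defaults, matching the parameter table above.
    """
    opts = ["mode='async'", "resource='schema'"]
    if tables_included:
        opts.append(f"tables_included='{tables_included}'")
    if to_sample:
        opts.append(f"to_sample={to_sample}")
    if columns_excluded:
        opts.append(f"columns_excluded='{columns_excluded}'")
    return (
        "/*polar4ai*/SELECT * FROM PREDICT (MODEL _polar4ai_text2vec, SELECT '') "
        f"WITH ({', '.join(opts)}) INTO {index_table};"
    )

print(build_schema_import(
    "schema_index",
    tables_included="graph_info,image_info,text_info",
    to_sample=1,
    columns_excluded="graph_info.time,text_info.ext",
))
```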

Step 4: Check the import status

After running the import, a task_id is returned (for example, bce632ea-97e9-11ee-bdd2-492f4dfe0918). Check the status using:

/*polar4ai*/SHOW TASK `bce632ea-97e9-11ee-bdd2-492f4dfe0918`;

When the return value is finish, the import is complete. To view the indexed schema information:

/*polar4ai*/SELECT * FROM schema_index;
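In automated setups you may want to block until the import finishes before issuing NL2SQL queries. The loop below is a minimal sketch under one stated assumption: `run_sql` is a placeholder for whatever client function you use to send a statement to PolarDB and read back the task status string.

```python
import time

def wait_for_import(run_sql, task_id: str, poll_seconds: float = 5, timeout: float = 600) -> bool:
    """Poll `SHOW TASK` until the schema import reports 'finish'.

    `run_sql` is an assumed callable that sends one statement to PolarDB and
    returns the task status as a string; wire it to your own client library.
    Returns False if the task has not finished within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = run_sql(f"/*polar4ai*/SHOW TASK `{task_id}`;")
        if status == "finish":
            return True
        time.sleep(poll_seconds)
    return False
```

Using a deadline based on `time.monotonic()` keeps the timeout correct even if the system clock is adjusted while the import runs.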

Step 5: Run NL2SQL queries

Syntax

/*polar4ai*/SELECT * FROM PREDICT (MODEL _polar4ai_nl2sql, SELECT '<question>') WITH (basic_index_name='<basic_index_name>');

Replace <question> with your natural language question and <basic_index_name> with the name of your search index table.

Parameters

Parameter | Required | Default | Description
basic_index_name | Yes | None | Name of the search index table.
to_optimize | No | 0 | Whether to optimize the generated SQL. 0: no optimization. 1: apply SQL optimization.
basic_index_top | No | 3 | Number of most similar tables to retrieve. Valid values: [1,10]. Increase to 4 or higher when a query spans multiple tables.
basic_index_threshold | No | 0.1 | Similarity threshold for table selection. Valid values: (0,1]. Only tables whose vector match score exceeds this threshold are selected.
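The optional parameters above only need to appear in the WITH clause when they differ from their defaults. As a hedged sketch of that rule (the helper name is invented for illustration), a statement builder might emit non-default values explicitly and leave defaults implicit:

```python
def build_nl2sql_with_options(
    question: str,
    basic_index_name: str,
    to_optimize: int = 0,
    basic_index_top: int = 3,
    basic_index_threshold: float = 0.1,
) -> str:
    """Assemble an NL2SQL statement, adding only non-default tuning options."""
    opts = [f"basic_index_name='{basic_index_name}'"]
    if to_optimize != 0:
        opts.append(f"to_optimize={to_optimize}")
    if basic_index_top != 3:
        opts.append(f"basic_index_top={basic_index_top}")
    if basic_index_threshold != 0.1:
        opts.append(f"basic_index_threshold={basic_index_threshold}")
    escaped = question.replace("'", "''")
    return (
        "/*polar4ai*/SELECT * FROM PREDICT (MODEL _polar4ai_nl2sql, "
        f"SELECT '{escaped}') WITH ({', '.join(opts)});"
    )

# Multi-table question: raise basic_index_top so more candidate tables are retrieved.
print(build_nl2sql_with_options(
    "List each teacher's name and the courses they teach.",
    "schema_index",
    basic_index_top=5,
))
```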

Examples

Sort results and display specific columns:

/*polar4ai*/SELECT * FROM PREDICT (MODEL _polar4ai_nl2sql, SELECT 'Sort by teacher name in ascending alphabetical order, and display the teacher''s name and the name of the course they are assigned to teach.') WITH (basic_index_name='schema_index');

Output:

SELECT t.teacher_name, c.course_name FROM teachers t JOIN courses c ON t.id = c.teacher_id ORDER BY t.teacher_name ASC;

Filter, sort, and limit results:

/*polar4ai*/SELECT * FROM PREDICT (MODEL _polar4ai_nl2sql, SELECT 'Filter for the 2 students with the most absences, sort them by the number of absences in descending order, and display their names and number of absences.') WITH (basic_index_name='schema_index');

Output:

SELECT s.student_name, COUNT(sc.id) AS leave_count FROM students s JOIN student_courses sc ON s.id = sc.student_id WHERE sc.status = 0 GROUP BY s.student_name ORDER BY leave_count DESC LIMIT 2;