All Products
Search
Document Center

AnalyticDB:Best practices for Semantic View to improve NL2SQL accuracy

Last Updated:May 09, 2026

A Semantic View decouples business semantics from the physical data model in AnalyticDB for MySQL, providing structured semantic context that improves the accuracy of LLM-generated SQL in NL2SQL workflows.

Important

Semantic View is in private preview. To request access, submit a ticket or contact technical support.

Comparison

In traditional NL2SQL workflows, an LLM relies solely on table schemas to generate SQL. With multi-table joins and complex calculations, this approach is error-prone. For example, when asking an agent to "query net revenue by country in Asia for 1997" against a TPC-H dataset, common errors include:

  • Wrong JOIN paths: joining unrelated supply chain tables (partsupp) or querying the wrong database.

  • Incorrect calculations: subtracting supply cost (ps_supplycost) when the metric requires revenue only.

  • High token cost: full schemas for 5+ tables must be provided, often requiring multiple correction rounds.

Semantic View solves this by predefining dimensions, metrics, and relationships. The LLM reads these definitions and generates correct SQL in a single round.

Metric

Without Semantic View

With Semantic View

Accuracy

Incorrect JOIN paths and calculation logic

Correct SQL in one shot

Conversation rounds

3 to 5 (multiple corrections)

1

Context required

Full table schemas (5+ tables)

Semantic summary only

Reusability

Must re-explain context each time

Define once, reuse permanently

Prerequisites

  • An AnalyticDB for MySQL cluster is created. For more information, see Create a cluster.

  • Semantic View private preview access is enabled. Submit a ticket or contact technical support to request access.

  • An AI coding tool that supports Skills is available, such as Claude Code.

Procedure

Step 1: Prepare data and environment

  1. Create a database and TPC-H tables.

    CREATE DATABASE IF NOT EXISTS tpch_demo;
    USE tpch_demo;
    -- Create the orders table
    CREATE TABLE IF NOT EXISTS orders (
      o_orderkey      BIGINT NOT NULL,
      o_custkey       BIGINT NOT NULL,
      o_orderstatus   VARCHAR(1),
      o_totalprice    DECIMAL(15,2),
      o_orderdate     DATE,
      o_orderpriority VARCHAR(15),
      o_clerk         VARCHAR(15),
      o_shippriority  INT,
      o_comment       VARCHAR(79),
      PRIMARY KEY (o_orderkey)
    );
    
    -- Create the lineitem table
    CREATE TABLE IF NOT EXISTS lineitem (
      l_orderkey      BIGINT NOT NULL,
      l_partkey       BIGINT NOT NULL,
      l_suppkey       BIGINT NOT NULL,
      l_linenumber    INT NOT NULL,
      l_quantity      DECIMAL(15,2),
      l_extendedprice DECIMAL(15,2),
      l_discount      DECIMAL(15,2),
      l_tax           DECIMAL(15,2),
      l_returnflag    VARCHAR(1),
      l_linestatus    VARCHAR(1),
      l_shipdate      DATE,
      l_commitdate    DATE,
      l_receiptdate   DATE,
      l_shipinstruct  VARCHAR(25),
      l_shipmode      VARCHAR(10),
      l_comment       VARCHAR(44),
      PRIMARY KEY (l_orderkey, l_linenumber)
    );
    
    -- Create the customer table
    CREATE TABLE IF NOT EXISTS customer (
      c_custkey    BIGINT NOT NULL,
      c_name       VARCHAR(25),
      c_address    VARCHAR(40),
      c_nationkey  BIGINT,
      c_phone      VARCHAR(15),
      c_acctbal    DECIMAL(15,2),
      c_mktsegment VARCHAR(10),
      c_comment    VARCHAR(117),
      PRIMARY KEY (c_custkey)
    );
  2. Load TPC-H sample data. Generate data with the dbgen tool or download a dataset from the TPC-H website.

    Note

    If you already have a database with TPC-H data, skip the table creation and replace tpch_demo with your actual database name in the following examples.

  3. Set database connection environment variables in your terminal.

    export ADB_MYSQL_HOST="amv-xxxxx.ads.aliyuncs.com"
    export ADB_MYSQL_PORT="3306"
    export ADB_MYSQL_USER="your_username"
    export ADB_MYSQL_PASSWORD="your_password"

Step 2: Install the Skill

Install the AnalyticDB for MySQL Smart Analyst Skill to manage Semantic Views and run NL2SQL queries through natural language.

  1. Run the following command to install the Skill:

    npx skills add aliyun/alibabacloud-adb-mysql-mcp-server

    Alternatively, install manually:

    git clone https://github.com/aliyun/alibabacloud-adb-mysql-mcp-server
    cd alibabacloud-adb-mysql-mcp-server/skill
    mkdir -p ~/.claude/skills/
    cp -r alibabacloud-adb-smart-analyst ~/.claude/skills/
  2. Verify database connectivity.

    /alibabacloud-adb-smart-analyst connectivity test

    Expected output on success:

    Connectivity test passed
    
      SQL executed: SELECT 1 AS connectivity_test
    
      ┌───────────────────┐
      │ connectivity_test │
      ├───────────────────┤
      │ 1                 │
      └───────────────────┘
    
      Database connection is active.

    If the connection fails, the Skill prompts you to set environment variables. Refer to Step 1.

Step 3: Create and manage a Semantic View

A Semantic View uses YAML to define business semantics, embedded in a CREATE OR REPLACE SEMANTIC VIEW statement. The YAML contains the following elements:

Element

Description

dimensions

Fields used in WHERE and GROUP BY clauses, such as status or region

time_dimensions

Date fields used for time-range filtering and aggregation

facts

Raw numeric fields (price, discount) that serve as the basis for metric calculations

metrics

Aggregate expressions based on facts (total revenue, average order value) representing business KPIs

filters

Named filter conditions that encapsulate common business filtering logic

relationships

Table join paths that enable correct multi-table queries

Create a sales analytics Semantic View based on the lineitem, orders, and customer tables:

CREATE OR REPLACE SEMANTIC VIEW tpch_sales_analytics
LANGUAGE YAML
AS
$$
name: tpch_sales_analytics
description: Sales analytics semantic view for TPC-H. Supports analysis by customer, order, and time dimensions.

tables:
  - name: lineitem
    base_table:
      schema: tpch_demo
      table: lineitem
    dimensions:
      - name: return_flag
        description: Return status. R = returned, A = accepted, N = not processed.
        expr: l_returnflag
        data_type: varchar
      - name: line_status
        description: Line item status. O = open, F = fulfilled.
        expr: l_linestatus
        data_type: varchar
      - name: ship_mode
        description: Shipping method. Values include TRUCK, AIR, SHIP, RAIL, and MAIL.
        expr: l_shipmode
        data_type: varchar
    time_dimensions:
      - name: ship_date
        description: Date the item was shipped
        expr: l_shipdate
        data_type: date
    facts:
      - name: extended_price
        description: Original item price before discount
        expr: l_extendedprice
        data_type: decimal
      - name: discount
        description: Discount rate, ranging from 0 to 1
        expr: l_discount
        data_type: decimal
      - name: quantity
        description: Number of items
        expr: l_quantity
        data_type: decimal
    metrics:
      - name: revenue
        description: Total revenue after discount, calculated as SUM(price * (1 - discount))
        expr: SUM(l_extendedprice * (1 - l_discount))
      - name: total_quantity
        description: Total number of items
        expr: SUM(l_quantity)
      - name: avg_price
        description: Average item price
        expr: AVG(l_extendedprice)
      - name: late_shipping_rate
        description: Percentage of items received after the committed date
        expr: COUNT(CASE WHEN l_receiptdate > l_commitdate THEN 1 END) * 1.0 / COUNT(*)
    filters:
      - name: completed_only
        description: Include only fulfilled line items
        expr: l_linestatus = 'F'

  - name: orders
    base_table:
      schema: tpch_demo
      table: orders
    dimensions:
      - name: order_status
        description: Order status. O = open, F = fulfilled, P = partially fulfilled.
        expr: o_orderstatus
        data_type: varchar
      - name: order_priority
        description: Order priority. Values are 1-URGENT, 2-HIGH, 3-MEDIUM, 4-NOT SPECIFIED, and 5-LOW.
        expr: o_orderpriority
        data_type: varchar
    time_dimensions:
      - name: order_date
        description: Date the order was placed
        expr: o_orderdate
        data_type: date
    facts:
      - name: total_price
        description: Total order amount
        expr: o_totalprice
        data_type: decimal
    metrics:
      - name: order_count
        description: Total number of orders
        expr: COUNT(*)
      - name: avg_order_value
        description: Average order amount
        expr: AVG(o_totalprice)

  - name: customer
    base_table:
      schema: tpch_demo
      table: customer
    dimensions:
      - name: market_segment
        description: Customer market segment. Values are AUTOMOBILE, BUILDING, FURNITURE, HOUSEHOLD, and MACHINERY.
        expr: c_mktsegment
        data_type: varchar
      - name: customer_name
        description: Customer name
        expr: c_name
        data_type: varchar
    facts:
      - name: account_balance
        description: Customer account balance
        expr: c_acctbal
        data_type: decimal

relationships:
  - name: lineitem_to_orders
    left_table: lineitem
    right_table: orders
    relationship_type: many_to_one
    on:
      - left_expr: l_orderkey
        right_expr: o_orderkey
  - name: orders_to_customer
    left_table: orders
    right_table: customer
    relationship_type: many_to_one
    on:
      - left_expr: o_custkey
        right_expr: c_custkey
$$;

Verify the Semantic View:

SELECT view_schema, view_name, definition
FROM information_schema.semantic_views
WHERE view_name = 'tpch_sales_analytics';

You can also manage Semantic Views through natural language with the Skill:

-- Query a specific semantic view
/alibabacloud-adb-smart-analyst show the tpch_sales_analytics semantic view

-- List all semantic views
/alibabacloud-adb-smart-analyst list all semantic views

-- Delete a semantic view
/alibabacloud-adb-smart-analyst delete tpch_demo.tpch_sales_analytics

Step 4: Run NL2SQL queries

With the Semantic View created, use the Skill to query data in natural language. The Skill automatically performs semantic search, schema validation, and SQL execution.

Example:

/alibabacloud-adb-smart-analyst total revenue by shipping method

Result:

Query results

  SQL executed:
  SELECT l_shipmode, SUM(l_extendedprice * (1 - l_discount)) AS revenue
  FROM tpch_demo.lineitem
  GROUP BY l_shipmode

  ┌───────────┬─────────────┐
  │ l_shipmode│   revenue   │
  ├───────────┼─────────────┤
  │ AIR       │ 258,217,431 │
  │ FOB       │ 261,609,629 │
  │ MAIL      │ 259,103,165 │
  │ RAIL      │ 260,370,880 │
  │ REG AIR   │ 260,012,778 │
  │ SHIP      │ 261,260,339 │
  │ TRUCK     │ 261,419,202 │
  └───────────┴─────────────┘

More query examples:

Natural language query

Generated SQL

Order count and average order value by market segment in H2 1995

SELECT c_mktsegment, COUNT(*) AS order_count, AVG(o_totalprice) AS avg_order_value FROM orders JOIN customer ON o_custkey = c_custkey WHERE o_orderdate >= '1995-07-01' AND o_orderdate < '1996-01-01' GROUP BY c_mktsegment

Shipping method with the highest late delivery rate

SELECT l_shipmode, COUNT(CASE WHEN l_receiptdate > l_commitdate THEN 1 END) * 1.0 / COUNT(*) AS late_shipping_rate FROM lineitem GROUP BY l_shipmode ORDER BY late_shipping_rate DESC LIMIT 1

Design guidelines

Write detailed descriptions

The description field is the primary source of business context for the LLM. Include the business meaning, allowed values (for enums), and calculation logic (for metrics).

    dimensions:
      - name: order_status
        # Good: lists all values and their meanings
        description: Order status. O = open, F = fulfilled, P = partially fulfilled.
        expr: o_orderstatus
        data_type: varchar

Model dimensions, metrics, and relationships correctly

Properly classify fields and define precise table joins:

  • Dimensions (dimensions/time_dimensions): fields typically used in GROUP BY, such as categorical attributes or time fields.

  • Facts → Metrics: define raw numeric fields as facts first, then build aggregate metrics on top of them.

  • Relationships: use many_to_one from fact tables to dimension tables. Include only the joins actually needed.

Use filters for common conditions

Define frequently used filter conditions as named filters. The LLM applies them directly when users phrase queries with business terminology.

    filters:
      - name: completed_only
        description: Include only fulfilled line items
        expr: l_linestatus = 'F'
      - name: high_value_orders
        description: Orders with a total amount greater than 10000
        expr: o_totalprice > 10000

Split by topic and iterate

Focus each Semantic View on a single analytical domain (such as "sales" or "supply chain"). Exclude internal technical fields like partition keys and audit columns.

Use CREATE OR REPLACE to iterate: add descriptions, define filters, or adjust metric expressions based on observed NL2SQL error patterns.

Limits

Item

Description

Feature status

Semantic View is in private preview. Features and syntax may change before general availability.

YAML syntax

YAML is indentation-sensitive. Ensure correct indentation in the definition.

Metric expressions

The expr field in metrics must be a valid SQL aggregate expression. Column names must match physical table columns.

Table references

The schema and table specified in base_table must exist in the cluster.