
How to Split a Single Table with 800 Million Rows? Let AI Design Your PolarDB-X Partition Solution in One Sentence

This article introduces PolarDB-X Skill, an AI tool that automatically designs optimal table partitioning solutions using natural language.


By Wumu

A Real Problem

As a backend developer or DBA, you have an order table with 800 million rows. Queries are getting slower and slower, and your boss is pushing you to optimize.

You know the answer is "go distributed and partition the table," but when you actually start, a flood of questions hits you:

Which partition key should you choose? The primary key? A business field? Pick the wrong one, and you end up with either write hot spots or full-shard scan queries — slower than not splitting at all.

How should you build GSIs (global secondary indexes)? When should you use a common GSI, a Clustered GSI, or a UGSI (unique GSI)? Too many waste resources; too few leave queries uncovered.

How many partitions? 64? 128? 256? Guessing always feels unreliable.

How should you handle primary keys and unique keys? The rules for PolarDB-X partitioned tables differ from those of standalone MySQL. Improper handling can lead to data loss.
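
The partition-key trade-off in the first question above can be sketched with a toy router in Python. This is only an illustration: the modulo "hash", the 8-partition count, and the skewed seller data are assumptions made for readability, not PolarDB-X's actual routing algorithm.

```python
from collections import Counter

NUM_PARTITIONS = 8  # kept small for readability; an assumption for this sketch


def partition_of(key: int) -> int:
    """Toy router. A real engine uses its own hash function, not a bare modulo."""
    return key % NUM_PARTITIONS


def skew(counts: Counter) -> float:
    """Rows in the busiest partition divided by the average rows per partition."""
    return max(counts.values()) / (sum(counts.values()) / NUM_PARTITIONS)


def partitions_scanned(filter_column: str, partition_column: str) -> int:
    """An equality filter on the partition key hits one partition; any other
    column (with no index to help) forces a scan of every partition."""
    return 1 if filter_column == partition_column else NUM_PARTITIONS


# Hypothetical data: 10,000 orders, 90% of which belong to one big seller (id 1).
orders = [(oid, 1 if oid % 10 else 2 + oid) for oid in range(1, 10_001)]

by_order_id = Counter(partition_of(oid) for oid, _ in orders)
by_seller_id = Counter(partition_of(sid) for _, sid in orders)

print("skew when partitioned by order_id: ", skew(by_order_id))
print("skew when partitioned by seller_id:", skew(by_seller_id))
print("partitions scanned for a buyer_id filter on an order_id-partitioned table:",
      partitions_scanned("buyer_id", "order_id"))
```

A high-cardinality key such as order_id spreads rows evenly, a skewed key concentrates writes on one partition, and a filter on any non-partition column touches every partition unless an index covers it.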

The standard flow for solving these problems used to be: read documents → review Best Practices → compare against your own scenario → adjust repeatedly → find someone to review. Now, a single sentence is all it takes.

In the AI Era, Database Design Deserves a New Approach

LLMs can write code and translate, but database partition design requires deep domain knowledge and depends heavily on context, and general-purpose LLMs often fail to deliver reliable solutions for it. They don't understand the details of PolarDB-X partition algorithms, the trade-offs between Clustered GSI and common GSI, or how your query pattern should map to a partition strategy.

This is exactly the problem that PolarDB-X Skill aims to solve.

We encapsulated the distributed partitioning best practices that PolarDB-X has accumulated over the years into a Skill that can be invoked directly in AI programming assistants: polardbx-sql. It is not a general-purpose Q&A bot, but a partition design expert that understands PolarDB-X kernel rules.

Hands-on: An 800-million-row Order Table, from Description to Solution in One Step

You say:

We have an order table t_order, which is currently a single table with 800 million rows of data. Recently, queries have been getting slower and slower.

The table schema is roughly:

order_id BIGINT primary key,

buyer_id BIGINT, seller_id BIGINT, amount DECIMAL, create_time DATETIME, status TINYINT.

The most frequent query is querying the order list by buyer_id, followed by single-row queries by order_id, and some queries by seller_id. I want to convert this table to a PolarDB-X partitioned table in AUTO mode. Please help me design the partition solution and provide the complete SQL.

The Skill directly outputs a complete partition solution and executable DDL:

[Image: the partition solution and DDL generated by the Skill]
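
The Skill's output itself is not reproduced in this text. As a rough illustration only, the Python sketch below assembles a DDL string from the decisions the article lists for this table (KEY partitioning on order_id, 256 partitions, a Clustered GSI on buyer_id, a common GSI on seller_id). The helper name, the index names cg_i_buyer and g_i_seller, and the exact clause layout are assumptions; check the statement against the PolarDB-X AUTO-mode documentation before running anything like it.

```python
def render_order_ddl(partitions: int = 256) -> str:
    """Return an illustrative CREATE TABLE statement for t_order.

    Hypothetical helper: this is NOT the Skill's real output. Column types are
    copied from the prompt as written; index names are invented for the example.
    """
    return f"""\
CREATE TABLE t_order (
  order_id    BIGINT NOT NULL,
  buyer_id    BIGINT,
  seller_id   BIGINT,
  amount      DECIMAL,
  create_time DATETIME,
  status      TINYINT,
  PRIMARY KEY (order_id),
  -- Clustered GSI: serves the most frequent query (order list by buyer_id)
  CLUSTERED INDEX cg_i_buyer (buyer_id)
    PARTITION BY KEY (buyer_id) PARTITIONS {partitions},
  -- Common GSI: serves the less frequent seller_id queries
  GLOBAL INDEX g_i_seller (seller_id)
    PARTITION BY KEY (seller_id) PARTITIONS {partitions}
)
PARTITION BY KEY (order_id)
PARTITIONS {partitions};
"""


ddl = render_order_ddl()
print(ddl)
```

Treat the printed statement as a starting point for review, in the same spirit as Step 3 later in the article: read it, compare it with your own query patterns, and only then execute it.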

This Is Not Template Stitching, but Complete Decision Inference

Many people's first reaction is to ask: What's the difference between this and ChatGPT stitching templates?

The difference is that every output from Skill goes through an inference chain based on PolarDB-X Best Practices:

Decision item | Skill's reasoning
Partition key → order_id | Primary key, highest cardinality, no hot spots, frequent equality queries
Partition algorithm → KEY | Primarily single-column equality queries. KEY/HASH covers approximately 90% of scenarios.
Number of partitions → 256 | Several times the number of DN nodes. Each partition is kept below 100 million rows, which suits most workloads.
buyer_id → Clustered GSI | The most frequently queried field. In this one-to-many scenario, list queries read multiple columns that would otherwise require lookups back to the primary table. A Clustered GSI avoids that lookup overhead.
seller_id → common GSI | Low query frequency with few returned rows. A common GSI keeps cost under control because the overhead of looking rows up in the primary table is minimal.
Primary key handling | The primary key of a partitioned table must contain the partition key column to ensure global uniqueness. Skill handles this automatically.
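
The partition-count reasoning is easy to check with a little arithmetic. The sketch below uses the 800 million rows and the 100-million-row ceiling from the article; the candidate counts 64, 128, and 256 come from the question raised earlier, and the function name is just for illustration.

```python
TOTAL_ROWS = 800_000_000              # current size of t_order, from the article
MAX_ROWS_PER_PARTITION = 100_000_000  # per-partition ceiling named in the table


def rows_per_partition(total_rows: int, partitions: int) -> int:
    """Average rows per partition, rounded up, assuming an even spread."""
    return -(-total_rows // partitions)


for partitions in (64, 128, 256):
    per_partition = rows_per_partition(TOTAL_ROWS, partitions)
    headroom = MAX_ROWS_PER_PARTITION / per_partition
    print(f"{partitions:>3} partitions: {per_partition:>10,} rows each, "
          f"{headroom:.0f}x below the ceiling")
```

All three candidates stay under the ceiling today, so the choice is mostly about growth headroom and about keeping the count a multiple of the number of DN nodes.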

Key difference: Skill determines the type of each GSI based on the query pattern you describe (which field is queried most often, and whether the query returns a list or a single row). This is not a mechanical operation of "creating an index whenever there is a query," but a precise match based on query features.
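
Why the list query on buyer_id gets a Clustered GSI while seller_id gets a common one can be seen in a toy model. The sketch below is a simplification built on plain dictionaries, with made-up rows: the common index stores only the primary key of each matching row, while the clustered index stores a full copy of the row.

```python
# Hypothetical rows, for illustration only.
orders = [
    {"order_id": 1, "buyer_id": 7, "seller_id": 100, "status": 1},
    {"order_id": 2, "buyer_id": 7, "seller_id": 101, "status": 0},
    {"order_id": 3, "buyer_id": 7, "seller_id": 100, "status": 1},
    {"order_id": 4, "buyer_id": 8, "seller_id": 100, "status": 1},
]

primary_table = {row["order_id"]: row for row in orders}

common_gsi = {}     # buyer_id -> [order_id, ...]   (index key plus primary key)
clustered_gsi = {}  # buyer_id -> [full row, ...]   (every column is copied)
for row in orders:
    common_gsi.setdefault(row["buyer_id"], []).append(row["order_id"])
    clustered_gsi.setdefault(row["buyer_id"], []).append(dict(row))


def list_orders_common(buyer_id):
    """Read the index, then look each row up in the primary table."""
    order_ids = common_gsi.get(buyer_id, [])
    rows = [primary_table[order_id] for order_id in order_ids]
    return rows, len(order_ids)  # one primary-table lookup per returned row


def list_orders_clustered(buyer_id):
    """The index already holds complete rows, so no extra lookups are needed."""
    return clustered_gsi.get(buyer_id, []), 0


rows_common, lookups_common = list_orders_common(7)
rows_clustered, lookups_clustered = list_orders_clustered(7)
print("common GSI lookups:   ", lookups_common)
print("clustered GSI lookups:", lookups_clustered)
```

The lookup cost of the common index grows with the number of rows returned, which is why it is acceptable for small result sets and expensive for long lists. The clustered index pays for its speed with storage, since every row is stored twice.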

What If Your Scenario Is Different?

Skill can adapt to various partition design scenarios, such as:

Multi-dimension query -- "The order table is queried by both order_id and buyer_id, and their last few digits are the same" → Skill recommends CO_HASH partitions

Metric -- "Clean up expired data by month" → Skill recommends HASH + RANGE subpartitions + TTL

Multi-tenancy -- "Each tenant's data must be isolated" → Skill recommends a LIST + HASH combination

Tables with UNIQUE constraints -- "I have a UNIQUE KEY that cannot be lost" → Skill uses a three-step migration method: first create a UGSI, then expand the partitions

Just describe "what the table looks like and how to query it," and leave the rest to Skill.
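
The idea behind the first scenario above (order_id and buyer_id sharing their last few digits) can be sketched as follows. This is a conceptual illustration, not PolarDB-X's CO_HASH implementation: the 4-digit suffix, the ID scheme in make_order_id, and the modulo routing are all assumptions.

```python
PARTITIONS = 256
SUFFIX_DIGITS = 4  # assumed number of shared trailing digits


def suffix(value: int) -> int:
    """The last SUFFIX_DIGITS decimal digits of an ID."""
    return value % 10**SUFFIX_DIGITS


def route(value: int) -> int:
    """Toy router: partition chosen from the shared suffix only."""
    return suffix(value) % PARTITIONS


def make_order_id(sequence: int, buyer_id: int) -> int:
    """Hypothetical ID scheme: a sequence number followed by the buyer's suffix."""
    return sequence * 10**SUFFIX_DIGITS + suffix(buyer_id)


buyer_id = 987_654_321
order_ids = [make_order_id(sequence, buyer_id) for sequence in (1, 2, 3)]

for order_id in order_ids:
    print(order_id, "-> partition", route(order_id))
print(buyer_id, "-> partition", route(buyer_id))
```

Because both columns route through the same suffix, a query by either order_id or buyer_id lands on a single partition without needing a separate index for the second column.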

Three Steps to Start

Step 1: Install Skill

npx skills add https://github.com/polardb/polardbx-skills --skill polardbx-sql

Step 2: Describe Your Tables and Queries in Natural Language

Just describe the table schema, data volume, and main query patterns. No strict format required.

Step 3: Get the Solution, Review It, and Execute Directly

Skill outputs executable DDL statements, which you can run directly on PolarDB-X after reviewing and confirming them.

Applicable to PolarDB-X 2.0 Enterprise Edition (Distributed Edition) databases in AUTO mode.

Summary

In the AI era, the barrier to entry for database O&M and design is being redefined. Partition design work that used to require reading documents, consulting experts, and repeated verification can now be completed in seconds with a single Skill.

However, this does not mean that the value of DBAs is decreasing. On the contrary, when AI takes over repetitive solution design work, DBAs can focus more on architecture decisions, performance tuning, and stability assurance: the things that truly require experienced judgment.

We will continue to release more Skills covering O&M, diagnostics, migration, and other scenarios. Feel free to try them out, and you are welcome to submit Issues and PRs on the GitHub repository to help make distributed databases easier to use.
