All Products
Search
Document Center

PolarDB:Graph analysis based on PolarDB: Quick data import from tables to graphs

Last Updated:Mar 30, 2026

PolarDB for PostgreSQL's AGE extension supports a bulk import method that bypasses the MATCH step in Cypher edge creation. This method delivers over 2x the performance of the standard MATCH...CREATE approach and scales to hundreds of millions of nodes and edges.

When to use this method

Use this method when:

  • You need to import a large number of edges (tens of millions or more) and standard Cypher import is too slow.

  • Your nodes and edges are already in relational tables or CSV files.

  • The graph is new or empty — this method is incompatible with graphs where nodes were previously created using Cypher's CREATE statement.

How it works

The standard Cypher MATCH...CREATE statement creates an edge by first finding both endpoints with MATCH, then inserting the edge. At large scale, this per-edge node lookup becomes a bottleneck.

This method eliminates the MATCH step by pre-computing unique numeric IDs for every node before import. Edges reference those IDs directly, so the insert requires no graph traversal.

The five-step process:

  1. Load node data into a staging table. One column or a combination of columns serves as the natural key.

  2. Load edge data into a staging table. Each row includes the natural key of the source node and the target node, plus edge properties.

  3. Compute a numeric unique ID for each node and add it to the node table. Populate the corresponding _fromid and _toid columns in the edge table using a JOIN or a mapping function.

  4. Bulk-insert nodes into the graph using the pre-computed IDs.

  5. Bulk-insert edges into the graph using the pre-computed IDs — no MATCH step required.

Prerequisites

Before you begin, ensure that you have:

  • A PolarDB for PostgreSQL cluster running one of the following engine versions:

    • PostgreSQL 16 (revision version 2.0.16.8.3.0 or later)

    • PostgreSQL 15 (revision version 2.0.15.12.4.0 or later)

    • PostgreSQL 14 (revision version 2.0.14.12.24.0 or later)

To check your revision version, run SHOW polardb_version; in your database, or view the revision version in the console. To upgrade, see version management.

Staging table requirements

All staging tables must satisfy the following constraints before import:

Column Type Constraint
Node unique ID (_id) bigint UNIQUE, >= 0, < 281474976710656
Edge source ID (_fromid) bigint >= 0, < 281474976710656
Edge target ID (_toid) bigint >= 0, < 281474976710656

Example walkthrough

This example builds a toys graph with two node types (v_user, v_product) and one edge type (e_own). The graph models users who own products.

image

Step 1: Set up the AGE extension

  1. Create the AGE extension.

    CREATE EXTENSION age;
  2. Configure the database search path and preload the extension library.

    ALTER DATABASE <dbname> SET search_path = "$user", public, ag_catalog;
    ALTER DATABASE <dbname> SET session_preload_libraries TO 'age';
    If you encounter compatibility issues when configuring search_path through Data Management (DMS), use PolarDB-Tools to run these statements instead.
  3. Create the graph, vertex labels, and edge label.

    -- Create the graph
    SELECT create_graph('toys');
    
    -- Create vertex labels (v = vertex)
    SELECT create_vlabel('toys', 'v_user');
    SELECT create_vlabel('toys', 'v_product');
    
    -- Create an edge label (e = edge)
    SELECT create_elabel('toys', 'e_own');

Step 2: Create staging tables

Create staging tables for nodes and edges, then insert sample data.

-- Node staging tables
CREATE TABLE public.v_user_raw(type text, uid text, name text, age integer);
INSERT INTO v_user_raw VALUES
  ('A', 'U1', 'Alice', '33'),
  ('B', 'U1', 'Bob',   '21');

CREATE TABLE public.v_product_raw(product_id text, price double precision);
INSERT INTO v_product_raw VALUES
  ('INAKLIDAS', '50'),
  ('ENKUCLKSD', '80'),
  ('IIUIHAKLS', '320'),
  ('SDVDSUHEE', '340');

-- Edge staging table
CREATE TABLE public.e_own_raw(user_type text, user_uid text, product_id text, buy_price text);
INSERT INTO e_own_raw VALUES
  ('A', 'U1', 'INAKLIDAS', '45'),
  ('B', 'U1', 'ENKUCLKSD', '70'),
  ('B', 'U1', 'INAKLIDAS', '50'),
  ('B', 'U1', 'SDVDSUHEE', '330');

Step 3: Generate unique IDs

Every node needs a bigint unique ID in the range [0, 281474976710656). Two strategies are available depending on whether the node's natural key can be converted to a number.

Strategy 1: Mapping function (for computable numeric keys)

Use this strategy when you can derive a numeric ID from the node's existing columns using a deterministic formula.

For v_user nodes, the combination of type (10 possible values, A–J) and uid (numeric suffix) produces a unique integer. Register this as a function and reuse it when populating the edge table.

-- Register the ID generation function
CREATE OR REPLACE FUNCTION v_user_idgen(type text, uid text) RETURNS bigint AS
$$
  SELECT (ASCII(type) - ASCII('A')) + substring(uid, 2)::bigint * 10
$$
LANGUAGE SQL;

-- Add the _id column to the node table
ALTER TABLE v_user_raw ADD COLUMN _id bigint UNIQUE CHECK(_id < 281474976710656 AND _id >= 0);
UPDATE v_user_raw SET _id = v_user_idgen(type, uid);

-- Populate _fromid in the edge table using the same function
ALTER TABLE e_own_raw ADD COLUMN _fromid bigint CHECK(_fromid < 281474976710656 AND _fromid >= 0);
UPDATE e_own_raw SET _fromid = v_user_idgen(user_type, user_uid);

Advantage: The function can be called any time to map node properties back to their IDs, making incremental imports and updates straightforward.

Strategy 2: Auto-sequence with JOIN (for non-numeric string keys)

Use this strategy when the natural key is a complex string that cannot be converted to a number.

For v_product nodes, product_id is an opaque string. Use PostgreSQL's bigserial type to auto-assign sequence numbers, then JOIN to propagate IDs to the edge table.

-- Add a bigserial _id column (auto-assigns sequence numbers)
ALTER TABLE v_product_raw ADD COLUMN _id bigserial UNIQUE CHECK(_id < 281474976710656 AND _id >= 0);

-- Create an index on the natural key to speed up the JOIN
CREATE INDEX ON v_product_raw(product_id);

-- Populate _toid in the edge table using a JOIN
ALTER TABLE e_own_raw ADD COLUMN _toid bigint CHECK(_toid < 281474976710656 AND _toid >= 0);
UPDATE e_own_raw SET _toid = v_product_raw._id
  FROM v_product_raw
  WHERE v_product_raw.product_id = e_own_raw.product_id;
To create a composite index when the natural key spans multiple columns, use CREATE INDEX ON v_product_raw(column1, column2, ...);.

Trade-off: No formula maps properties back to IDs — you must rely on the JOIN for incremental updates.

Step 4: Create helper functions

These functions compute the internal ID range for each graph label and return the associated sequence name. Create them once per database.

CREATE OR REPLACE FUNCTION age_name_to_idx_start(graph_name text, kind_name text, label_name text)
RETURNS bigint
AS 'SELECT id::bigint<<48 FROM ag_catalog.ag_label
    WHERE kind = kind_name
      AND name = label_name
      AND graph = (SELECT graphid FROM ag_catalog.ag_graph WHERE name = graph_name)'
LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE;

CREATE OR REPLACE FUNCTION age_name_to_seq(graph_name text, kind_name text, label_name text)
RETURNS text
AS 'SELECT graph_name || ''.'' || seq_name::text FROM ag_catalog.ag_label
    WHERE kind = kind_name
      AND name = label_name
      AND graph = (SELECT graphid FROM ag_catalog.ag_graph WHERE name = graph_name)'
LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE;

Step 5: Import nodes and edges

Import nodes

  1. (Optional) Create unique indexes to prevent duplicate IDs. These indexes also accelerate subsequent graph queries.

    CREATE UNIQUE INDEX ON toys.v_user(id);
    CREATE UNIQUE INDEX ON toys.v_product(id);
  2. Insert nodes. Select only the property columns from the staging table — exclude _id and any other helper columns added during preparation.

    -- Import v_user nodes
    INSERT INTO toys."v_user"
    SELECT
      (age_name_to_idx_start('toys', 'v', 'v_user') + _id)::text::graphid,
      row_to_json((SELECT x FROM (SELECT type, uid, name, age) x))::text::agtype
    FROM v_user_raw;
    
    SELECT setval(age_name_to_seq('toys', 'v', 'v_user'), (SELECT max(_id) + 1 FROM v_user_raw));
    
    -- Import v_product nodes
    INSERT INTO toys."v_product"
    SELECT
      (age_name_to_idx_start('toys', 'v', 'v_product') + _id)::text::graphid,
      row_to_json((SELECT x FROM (SELECT product_id, price) x))::text::agtype
    FROM v_product_raw;
    
    SELECT setval(age_name_to_seq('toys', 'v', 'v_product'), (SELECT max(_id) + 1 FROM v_product_raw));

Import edges

Edges are built from the _fromid and _toid columns. Include only the edge's own properties — exclude node properties from both sides.

INSERT INTO toys."e_own"
SELECT
  (age_name_to_idx_start('toys', 'e', 'e_own') + nextval(age_name_to_seq('toys', 'e', 'e_own')))::text::graphid,
  (age_name_to_idx_start('toys', 'v', 'v_user')    + _fromid)::text::graphid,
  (age_name_to_idx_start('toys', 'v', 'v_product') + _toid)::text::graphid,
  row_to_json((SELECT x FROM (SELECT buy_price) x))::text::agtype
FROM e_own_raw;

Verify the import

Query nodes and edges to confirm the import succeeded.

Query v_user nodes:

SELECT * FROM cypher('toys', $$
  MATCH (v:v_user)
  RETURN v
$$) AS (v agtype);

Expected output:

v
--------------------------------------------------------------------------------------------------------------------------
 {"id": 844424930131978, "label": "v_user", "properties": {"age": 33, "uid": "U1", "name": "Alice", "type": "A"}}::vertex
 {"id": 844424930131979, "label": "v_user", "properties": {"age": 21, "uid": "U1", "name": "Bob", "type": "B"}}::vertex
(2 rows)

Query v_product nodes:

SELECT * FROM cypher('toys', $$
  MATCH (v:v_product)
  RETURN v
$$) AS (v agtype);

Expected output:

v
-------------------------------------------------------------------------------------------------------------------
 {"id": 1125899906842625, "label": "v_product", "properties": {"price": "50", "product_id": "INAKLIDAS"}}::vertex
 {"id": 1125899906842626, "label": "v_product", "properties": {"price": "80", "product_id": "ENKUCLKSD"}}::vertex
 {"id": 1125899906842627, "label": "v_product", "properties": {"price": "320", "product_id": "IIUIHAKLS"}}::vertex
 {"id": 1125899906842628, "label": "v_product", "properties": {"price": "340", "product_id": "SDVDSUHEE"}}::vertex
(4 rows)

Query edges — Bob's purchases with discount:

SELECT buy, sell, sell::int - buy::int AS discount
FROM cypher('toys', $$
  MATCH (v1:v_user)-[e:e_own]-(v2:v_product)
  WHERE v1.name = 'Bob'
  RETURN e.buy_price, v2.price
$$) AS (buy agtype, sell agtype);

Expected output:

 buy  | sell  | discount
------+-------+----------
 "70"  | "80"  |       10
 "50"  | "50"  |        0
 "330" | "340" |       10
(3 rows)

Incremental imports and updates

After the initial import, use the same method to add new nodes and edges incrementally.

Important

Do not use Cypher's CREATE statement to add nodes after completing the initial import. CREATE generates IDs independently and disrupts the ID mapping produced by the v_user_idgen function or the bigserial sequence. Once disrupted, this import method can no longer be applied reliably.

Add a new node

-- Insert into the staging table with the pre-computed ID
INSERT INTO v_user_raw VALUES
  ('A', 'U2', 'Alien', '99', v_user_idgen('A', 'U2'));

-- Import the new node into the graph
INSERT INTO toys.v_user
SELECT
  (age_name_to_idx_start('toys', 'v', 'v_user') + _id)::text::graphid,
  row_to_json((SELECT x FROM (SELECT type, uid, name, age) x))::text::agtype
FROM (SELECT (('A', 'U2', 'Alien', '97', v_user_idgen('A', 'U2'))::v_user_raw).*) raw;

SELECT setval(age_name_to_seq('toys', 'v', 'v_user'), (SELECT max(_id) + 1 FROM v_user_raw));

-- Add a new product node using bigserial (ID is auto-assigned)
INSERT INTO v_product_raw(product_id, price) VALUES
  ('AIEEEEEEE', '999');

INSERT INTO toys."v_product"
SELECT
  (age_name_to_idx_start('toys', 'v', 'v_product') + _id)::text::graphid,
  row_to_json((SELECT x FROM (SELECT product_id, price) x))::text::agtype
FROM v_product_raw
WHERE product_id = 'AIEEEEEEE';

SELECT setval(age_name_to_seq('toys', 'v', 'v_product'), (SELECT max(_id) + 1 FROM v_product_raw));

Add a new edge

Clear the edge staging table, insert the new edges, recompute _fromid and _toid, then bulk-insert.

DELETE FROM e_own_raw;

INSERT INTO e_own_raw VALUES
  ('A', 'U2', 'AIEEEEEEE', '9999');

UPDATE e_own_raw SET _fromid = v_user_idgen(user_type, user_uid);
UPDATE e_own_raw SET _toid = v_product_raw._id
  FROM v_product_raw
  WHERE v_product_raw.product_id = e_own_raw.product_id;

INSERT INTO toys."e_own"
SELECT
  (age_name_to_idx_start('toys', 'e', 'e_own') + nextval(age_name_to_seq('toys', 'e', 'e_own')))::text::graphid,
  (age_name_to_idx_start('toys', 'v', 'v_user')    + _fromid)::text::graphid,
  (age_name_to_idx_start('toys', 'v', 'v_product') + _toid)::text::graphid,
  row_to_json((SELECT x FROM (SELECT buy_price) x))::text::agtype
FROM e_own_raw;

Limitations

  • This method requires the graph to be empty of nodes created via Cypher's CREATE statement. Mixing both approaches corrupts the ID mapping.

  • The staging tables (v_user_raw, v_product_raw, e_own_raw) and the mapping functions (v_user_idgen) must be retained if you plan to use this method for future imports. They can be dropped if you no longer plan to use this method for future imports, without affecting graph data.

  • All node IDs must fit within the range [0, 281474976710656). IDs outside this range violate the constraint and cause the import to fail.

What's next

Visit the PolarDB free trial page and select PolarDB to explore the graph computing capabilities of GanosBase provided by PolarDB for PostgreSQL.