
PolarDB: Hybrid search

Last updated: Dec 27, 2025

PolarDB for PostgreSQL supports multiple retrieval methods, including dense search, sparse search, and hybrid search.

Background information

  • Dense search: uses semantic context to understand the meaning behind a query.

  • Sparse search: emphasizes text matching and retrieves results based on specific terms. This is equivalent to full-text search.

  • Hybrid search: combines dense and sparse search to capture both the full context and specific keywords, which provides comprehensive search results.
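The contrast between the two styles can be illustrated with a toy Python sketch (not PolarDB code, and the vectors below are made up for illustration): sparse search matches literal terms, while dense search compares embedding vectors, so it can match text that shares meaning but not words.

```python
# Toy illustration of sparse vs. dense retrieval scoring.
import math

def sparse_score(query, doc):
    """Fraction of query terms that literally appear in the document."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms)

def dense_score(vec_a, vec_b):
    """Cosine similarity between two (hypothetical) embedding vectors."""
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm = math.sqrt(sum(a * a for a in vec_a)) * math.sqrt(sum(b * b for b in vec_b))
    return dot / norm

# "car repair" and "automobile maintenance guide" share no terms,
# so sparse matching scores zero...
print(sparse_score("car repair", "automobile maintenance guide"))  # 0.0
# ...but embeddings of synonymous text lie close together
# (toy vectors here, not real model output).
print(dense_score([0.9, 0.1, 0.3], [0.85, 0.15, 0.35]))
```

Hybrid search runs both channels and merges their results, as shown in the rest of this topic.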

Data preparation

  1. Use a privileged account to create the extensions required for retrieval.

    CREATE EXTENSION IF NOT EXISTS rum;
    CREATE EXTENSION IF NOT EXISTS vector;
    CREATE EXTENSION IF NOT EXISTS polar_ai;

    The extensions provide the following capabilities:

      • rum: the RUM index access method, used in this topic to index and rank tsvector data for full-text search.
      • vector: the vector data type and vector indexes, used for similarity search.
      • polar_ai: AI functions that can be called from SQL, such as text embedding.

  2. Create a table and insert test data.

    CREATE TABLE t_chunk(id serial, chunk text, embedding vector(1536), v tsvector);
    
    INSERT INTO t_chunk(chunk) VALUES('Unlock the Power of AI 1 million free tokens 88% Price Reduction Activate Now AI Search Contact Sales English Cart Console Log In Why Us Pricing Products Solutions Marketplace Developers Partners Documentation Services Model Studio PolarDB Filter in menu Product Overview Benefits Billing Announcements and Updates Getting Started User Guide Use Cases Developer Reference Support Home Page PolarDBProduct OverviewSearch for Help ContentProduct OverviewUpdated at: 2025-01-06 08:50ProductCommunityWhat is PolarDB?PolarDB is a new-generation database service that is developed by Alibaba Cloud. This service decouples computing from storage and uses integrated software and hardware. PolarDB is a secure and reliable database service that provides auto scaling within seconds, high performance, and mass storage. PolarDB is 100% compatible with MySQL and PostgreSQL and highly compatible with Oracle.');
    INSERT INTO t_chunk(chunk) VALUES('PolarDB provides three engines: PolarDB for MySQL, PolarDB for PostgreSQL, and PolarDB-X. Years of best practices in Double 11 events prove that PolarDB can offer the flexibility of open source ecosystems and the high performance and security of commercial cloud-native databases.Database engine Ecosystem Compatibility Architecture Platform Scenario PolarDB for MySQL MySQL 100% compatible with MySQL Shared storage and compute-storage decoupled architecture Public cloud, Apsara Stack Enterprise Edition, DBStack');
    INSERT INTO t_chunk(chunk) VALUES('PolarDB for PostgreSQL PostgreSQL and Oracle 100% compatible with MySQL and highly compatible with Oracle Shared storage and compute-storage decoupled architecture Public cloud, Apsara Stack Enterprise Edition, DBStack Cloud-native databases in the PostgreSQL ecosystem PolarDB-X MySQL Standard Edition is 100% compatible with MySQL and Enterprise Edition is highly compatible with MySQL shared nothing and distributed architecture Public cloud, Apsara Stack Enterprise Edition, DBStack');
    INSERT INTO t_chunk(chunk) VALUES('Architecture of PolarDB for MySQL and PolarDB for PostgreSQL PolarDB for MySQL and PolarDB for PostgreSQL both use an architecture of shared storage and compute-storage decoupling. They are featured by cloud-native architecture, integrated software and hardware, and shared distributed storage. Physical replication and RDMA are used between, the primary node and read-only nodes to reduce latency and accelerate data synchronization. This resolves the issue of non-strong data consistency caused by asynchronous replication and ensures zero data loss in case of single point of failure (SPOF). The architecture also enables node scaling within seconds.');
    INSERT INTO t_chunk(chunk) VALUES('Core components PolarProxy PolarDB uses PolarProxy to provide external services for the applications. PolarProxy forwards the requests from the applications to database nodes. You can use the proxy to perform authentication, data protection, and session persistence. The proxy parses SQL statements, sends write requests to the primary node, and evenly distributes read requests to multiple read-only nodes.Compute nodes A cluster contains one primary node and multiple read-only nodes. A cluster of Multi-master Cluster Edition (only for PolarDB for MySQL) supports multiple primary nodes and multiple read-only nodes. Compute nodes can be either general-purpose or dedicated.Shared storage Multiple nodes in a cluster share storage resources. A single cluster supports up to 500 TB of storage capacity.');
    INSERT INTO t_chunk(chunk) VALUES('Architecture benefits Large storage capacity The maximum storage capacity of a cluster is 500 TB. You do not need to purchase clusters for database sharding due to the storage limit of a single host. This simplifies application development and reduces the O&M workload.Cost-effectiveness PolarDB decouples computing and storage. You are charged only for the computing resources when you add read-only nodes to a PolarDB cluster. In traditional database solutions, you are charged for both computing and storage resources when you add nodes.Elastic scaling within minutes PolarDB supports rapid scaling for computing resources. This is based on container virtualization, shared storage, and compute-storage decoupling. It requires only 5 minutes to add or remove a node. The storage capability is automatically scaled up. During the scale-up process, your services are not interrupted.');
    INSERT INTO t_chunk(chunk) VALUES('Read consistency PolarDB uses log sequence numbers (LSNs) for cluster endpoints that have read/write splitting enabled. This ensures global consistency for read operations and prevents the inconsistency that is caused by the replication delay between the primary node and read-only nodes.Millisecond-level latency in physical replication PolarDB performs physical replication from the primary node to read-only nodes based on redo logs. The physical replication replaces the logical replication that is based on binary logs. This way, the replication efficiency and stability are improved. No delays occur even if you perform DDL operations on large tables, such as adding indexes or fields.Data backup within seconds Snapshots that are implemented based on the distributed storage can back up a database with terabytes of data in a few minutes. During the entire backup process, no locks are required, which ensures high efficiency and minimized impacts on your business. Data can be backed up anytime.');
    INSERT INTO t_chunk(chunk) VALUES('Architecture of PolarDB-X PolarDB-X uses an architecture of shared nothing and compute-storage decoupling. This architecture lets you achieve hierarchical capacity planning as needed and implement mass scaling.Core components Global meta service (GMS): provides distributed metadata and a global timestamp distributor named Timestamp Oracle (TSO) and maintains meta information such as tables, schemas, and statistics. GMS also maintains security information such as accounts and permissions.Compute node (CN): provides a distributed SQL engine that contains core optimizers and executors. A CN uses a stateless SQL engine to provide distributed routing and computing and uses the two-phase commit protocol (2PC) to coordinate distributed transactions. A CN also executes DDL statements in a distributed manner and maintains global indexes.Data node (DN): provides a data storage engine. A data node uses Paxos to provide highly reliable storage services and uses multiversion concurrency control (MVCC) for distributed transactions. A data node also provides the pushdown computation feature to push down operators such as Project, Filter, Join, and Agg in distributed systems, and supports local SSDs and shared storage.Change data capture (CDC): provides a primary/secondary replication protocol that is compatible with MySQL. The primary/secondary replication protocol is compatible with the protocols and data formats that are supported by MySQL binary logging. CDC uses the primary/secondary replication protocol to exchange data.');
  3. Generate vector data. You can convert text into vectors by creating and calling a custom model.

    -- Generate embeddings
    UPDATE t_chunk SET embedding = <custom_model_call_function>('<custom_model_name>', chunk);
  4. Create the indexes required for retrieval.

    • A vector index. This example uses the L2 distance. You can adjust it as needed.

      CREATE INDEX ON t_chunk USING hnsw(embedding vector_l2_ops);
    • A full-text index.

      UPDATE t_chunk SET v = to_tsvector('english', chunk);
      
      CREATE INDEX ON t_chunk USING rum (v rum_tsvector_ops);

Data retrieval

Dense search

The search is based only on vectors. A smaller distance indicates higher similarity.

SELECT chunk, embedding <-> polar_ai.ai_text_embedding('What database engines does PolarDB provide')::vector(1536) AS dist
FROM t_chunk
ORDER BY dist ASC
LIMIT 5;

Sparse search

The search is based only on full text. A smaller distance indicates higher similarity.

SELECT chunk, v <=> to_tsquery('english', 'PolarDB|PostgreSQL|efficiency') AS rank
FROM t_chunk
WHERE v @@ to_tsquery('english', 'PolarDB|PostgreSQL|efficiency')
ORDER BY rank ASC
LIMIT 5;

Hybrid search

Hybrid search merges the results of both retrieval methods to implement multi-channel retrieval.

WITH t AS (
  SELECT chunk, embedding <-> polar_ai.ai_text_embedding('What database engines does PolarDB provide')::vector(1536) AS dist
  FROM t_chunk
  ORDER BY dist ASC
  LIMIT 5
),
t2 AS (
  SELECT chunk, v <=> to_tsquery('english', 'PolarDB|PostgreSQL|efficiency') AS rank
  FROM t_chunk
  WHERE v @@ to_tsquery('english', 'PolarDB|PostgreSQL|efficiency')
  ORDER BY rank ASC
  LIMIT 5
)
SELECT * FROM t
UNION ALL
SELECT * FROM t2;

Because these two distance calculation methods are not unified, a Reciprocal Rank Fusion (RRF) model is used to produce a unified ranking. RRF is a technique for combining multiple result sets that use different relevance metrics into a single result set. It yields high-quality results without tuning, even when the relevance metrics are unrelated to one another. The basic steps are as follows:

  1. Collect rankings in the retrieval phase

    Each retrieval method produces sorted results for its query.

  2. Fuse ranks

    A simple scoring function, such as the reciprocal-rank sum, weights and fuses the rank positions from each retrieval method. The formula is:

    score(d) = Σ (i = 1..n) 1 / (k + rank_i(d))

    In this formula, n is the number of retrieval methods, rank_i(d) is the rank of document d in the i-th retrieval method, and k is a smoothing parameter, typically set to 60.

  3. Sort comprehensively

    Re-rank the documents based on the fused scores to produce the final results.
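The fusion steps above can be sketched in Python. This is a minimal illustration of the RRF formula, not PolarDB code; the document names below are made up, and each input list is assumed to be ordered best-first.

```python
# Reciprocal Rank Fusion: combine ranked result lists into one ranking.
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """rankings: list of ranked result lists (best first).
    Returns (doc, score) pairs sorted by fused RRF score, highest first."""
    scores = defaultdict(float)
    for ranked in rankings:
        for pos, doc in enumerate(ranked, start=1):
            # score(d) = sum over methods of 1 / (k + rank_i(d))
            scores[doc] += 1.0 / (k + pos)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

dense = ["chunk_b", "chunk_a", "chunk_c"]   # hypothetical vector-search top results
sparse = ["chunk_a", "chunk_d", "chunk_b"]  # hypothetical full-text top results
fused = rrf_fuse([dense, sparse])
# chunk_a and chunk_b appear in both lists, so their scores are summed
# and they rank above chunk_c and chunk_d, which each appear only once.
```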

If you are not satisfied with how the results of the preceding query are ranked, adjust the parameter k to change the order. Based on the parameter k, the system scores each document in the top 5 results of the full-text search (sparse vectors) and of the vector search (dense vectors). Each document is scored by using the formula 1 / (k + rank), where rank is the rank of the document. If a document in the top 5 full-text results does not appear in the top 5 vector-search results, that document has only one score. The same applies to a document that appears in the vector-search results but not in the full-text results. If a document appears in the top 5 result sets of both searches, its scores are summed.

Note

The smoothing parameter k controls how much the documents in the result sets influence the final sort order. A larger value gives greater weight to lower-ranked documents.

-- Dense vector search
WITH t1 AS (
  SELECT chunk, embedding <-> polar_ai.ai_text_embedding('What database engines does PolarDB provide')::vector(1536) AS dist
  FROM t_chunk
  ORDER BY dist ASC
  LIMIT 5
),
t2 AS (
  SELECT ROW_NUMBER() OVER (ORDER BY dist ASC) AS row_num,
         chunk
  FROM t1
),
-- Sparse vector search
t3 AS (
  SELECT chunk, v <=> to_tsquery('english', 'PolarDB|PostgreSQL|efficiency') AS rank
  FROM t_chunk
  WHERE v @@ to_tsquery('english', 'PolarDB|PostgreSQL|efficiency')
  ORDER BY rank ASC
  LIMIT 5
),
t4 AS (
  -- A smaller rank means higher similarity, so number rows in ascending order
  SELECT ROW_NUMBER() OVER (ORDER BY rank ASC) AS row_num,
         chunk
  FROM t3
),
-- Compute the RRF score for each set
t5 AS (
  SELECT 1.0/(60+row_num) AS score, chunk FROM t2
  UNION ALL
  SELECT 1.0/(60+row_num), chunk FROM t4
)
-- Merge the scores
SELECT sum(score) AS score, chunk
FROM t5
GROUP BY chunk
ORDER BY score DESC;

Weighted fusion

You can also assign different weights to different result sets, for example, a weight of 0.8 for dense search and 0.2 for sparse search.

-- Dense vector search
WITH t1 AS (
  SELECT chunk, embedding <-> polar_ai.ai_text_embedding('What database engines does PolarDB provide')::vector(1536) AS dist
  FROM t_chunk
  ORDER BY dist ASC
  LIMIT 5
),
t2 AS (
  SELECT ROW_NUMBER() OVER (ORDER BY dist ASC) AS row_num,
         chunk
  FROM t1
),
-- Sparse vector search
t3 AS (
  SELECT chunk, v <=> to_tsquery('english', 'PolarDB|PostgreSQL|efficiency') AS rank
  FROM t_chunk
  WHERE v @@ to_tsquery('english', 'PolarDB|PostgreSQL|efficiency')
  ORDER BY rank ASC
  LIMIT 5
),
t4 AS (
  -- A smaller rank means higher similarity, so number rows in ascending order
  SELECT ROW_NUMBER() OVER (ORDER BY rank ASC) AS row_num,
         chunk
  FROM t3
),
-- Compute the RRF score for each set, with weights 0.8 and 0.2
t5 AS (
  SELECT (1.0/(60+row_num)) * 0.8 AS score, chunk FROM t2
  UNION ALL
  SELECT (1.0/(60+row_num)) * 0.2, chunk FROM t4
)
-- Merge the scores
SELECT sum(score) AS score, chunk
FROM t5
GROUP BY chunk
ORDER BY score DESC;
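The weighted variant can likewise be sketched in Python. This is a minimal illustration, not PolarDB code; the document names are made up, and each channel's weight multiplies its reciprocal-rank contribution, matching the 0.8/0.2 split in the query above.

```python
# Weighted Reciprocal Rank Fusion: each retrieval channel contributes
# weight / (k + rank) instead of 1 / (k + rank).
from collections import defaultdict

def weighted_rrf(rankings_with_weights, k=60):
    """rankings_with_weights: list of (ranked_list, weight) pairs,
    each ranked_list ordered best-first."""
    scores = defaultdict(float)
    for ranked, weight in rankings_with_weights:
        for pos, doc in enumerate(ranked, start=1):
            scores[doc] += weight / (k + pos)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

dense = ["chunk_b", "chunk_a"]   # hypothetical dense-search top results
sparse = ["chunk_a", "chunk_b"]  # hypothetical sparse-search top results
fused = weighted_rrf([(dense, 0.8), (sparse, 0.2)])
# With weight 0.8, the dense ranking dominates: chunk_b stays first even
# though the sparse channel prefers chunk_a.
```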