Hologres V4.0 enhances vector search with the HGraph vector search algorithm, delivering high performance, high accuracy, and low latency. For more information, see HGraph Index User Guide (recommended). This topic describes how to use the Hologres Python Search SDK (holo-search-sdk) for full-text search and vector search.
Prerequisites
- Create an AccessKey. For more information, see Create an AccessKey.
- Install Python 3.8 or later.
- Grant the required permissions to your account. For more information, see Grant permissions to a RAM user.
Install the SDK
Install the Python SDK using pip. For more information, see holo-search-sdk.
This topic uses holo-search-sdk version 0.3.0. If you encounter errors, verify your SDK version.
- Install for the first time
  pip install holo-search-sdk
- Check the installed version
  # Check current version
  pip show holo-search-sdk
  # Upgrade if version is earlier than 0.3.0
  pip install --upgrade holo-search-sdk
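The version check can also be scripted. The following is a minimal sketch that uses only the standard library's importlib.metadata (available from Python 3.8). The helper names parse_version, installed_version, and meets_minimum are hypothetical, and the comparison only looks at the leading numeric part of the version string.

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '0.3.0' into (0, 3, 0); any suffix such as 'rc1' is ignored."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        return ()
    return tuple(int(part) for part in match.group(0).split("."))


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def meets_minimum(package: str, minimum: str) -> bool:
    """True if the package is installed at `minimum` or later."""
    found = installed_version(package)
    return found is not None and parse_version(found) >= parse_version(minimum)


if __name__ == "__main__":
    print("installed:", installed_version("holo-search-sdk"))
    print("0.3.0 or later:", meets_minimum("holo-search-sdk", "0.3.0"))
```

If meets_minimum returns False, run the upgrade command shown above.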
Usage steps
Connect to Hologres
import holo_search_sdk as holo
# Connect to the database
client = holo.connect(
    host="<HOLO_HOST>",
    port=<HOLO_PORT>,
    database="<HOLO_DBNAME>",
    access_key_id="<ACCESS_KEY_ID>",
    access_key_secret="<ACCESS_KEY_SECRET>",
    schema="public"  # Update as needed
)
# Establish the connection
client.connect()
Variable descriptions:
| Variable | Description |
| --- | --- |
| HOLO_HOST | The network endpoint of your Hologres instance. Go to the Hologres Management Console, then choose Hologres to get the endpoint. |
| HOLO_PORT | The port number of your Hologres instance. Go to the Hologres Management Console, then choose Hologres to get the port number. |
| HOLO_DBNAME | The name of your Hologres database. |
| ACCESS_KEY_ID | The AccessKey ID of your Alibaba Cloud account. Go to AccessKey Management to get your AccessKey ID. |
| ACCESS_KEY_SECRET | The AccessKey secret of your Alibaba Cloud account. |
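To keep the AccessKey pair out of source code, the connection values can be read from environment variables. The following is a sketch under stated assumptions: the environment variable names (including HOLO_SCHEMA) and the helper load_connect_kwargs are hypothetical and simply mirror the placeholders in the table. The returned keys match the keyword arguments used in the connection example.

```python
import os
from typing import Dict, Mapping, Optional, Union

# Hypothetical environment variable names that mirror the placeholders above.
REQUIRED_VARS = (
    "HOLO_HOST",
    "HOLO_PORT",
    "HOLO_DBNAME",
    "ACCESS_KEY_ID",
    "ACCESS_KEY_SECRET",
)


def load_connect_kwargs(
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, Union[str, int]]:
    """Build the keyword arguments for the connection call from environment variables."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {
        "host": env["HOLO_HOST"],
        "port": int(env["HOLO_PORT"]),  # the port is passed as a number
        "database": env["HOLO_DBNAME"],
        "access_key_id": env["ACCESS_KEY_ID"],
        "access_key_secret": env["ACCESS_KEY_SECRET"],
        "schema": env.get("HOLO_SCHEMA", "public"),  # optional, defaults to public
    }
```

With the variables set, the connection call would become holo.connect(**load_connect_kwargs()), followed by client.connect() as before.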
Create a table
Create the table using DDL. The following is an example:
create_table_sql = """
CREATE TABLE IF NOT EXISTS <TABLE_NAME> (
    id BIGINT PRIMARY KEY,
    content TEXT,
    vector_column FLOAT4[] CHECK (array_ndims(vector_column) = 1 AND array_length(vector_column, 1) = 3),
    publish_date TIMESTAMP
);
"""
_ = client.execute(create_table_sql, fetch_result=False)
Replace <TABLE_NAME> with your actual table name.
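The CHECK constraint in the DDL accepts only one-dimensional vectors with exactly 3 elements. A client-side check before writing can surface bad rows earlier. The following is a minimal sketch; validate_vector and VECTOR_DIM are hypothetical names and are not part of the SDK.

```python
from collections.abc import Sequence as SequenceABC
from numbers import Real
from typing import Sequence

VECTOR_DIM = 3  # matches array_length(vector_column, 1) = 3 in the DDL above


def validate_vector(vector: Sequence[float], dim: int = VECTOR_DIM) -> None:
    """Raise ValueError unless `vector` is a flat sequence of `dim` numbers."""
    if isinstance(vector, (str, bytes)) or not isinstance(vector, SequenceABC):
        raise ValueError("vector must be a list or tuple of numbers")
    if len(vector) != dim:
        raise ValueError(f"expected {dim} elements, got {len(vector)}")
    for item in vector:
        # bool is a subclass of int, so reject it explicitly.
        # Nested lists (more than one dimension) are rejected here as well.
        if isinstance(item, bool) or not isinstance(item, Real):
            raise ValueError(f"non-numeric element: {item!r}")


if __name__ == "__main__":
    validate_vector([0.1, 0.2, 0.3])  # passes silently
    try:
        validate_vector([0.1, 0.2])
    except ValueError as exc:
        print("rejected:", exc)
```

If you change the dimension in the CHECK constraint, change VECTOR_DIM to match.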
Open a table
You must create the vector table in your Hologres instance before you open it.
# Column layout of the table, shown for reference only.
# It is not passed to open_table() below.
columns = {
    "id": ("INTEGER", "PRIMARY KEY"),
    "content": "TEXT",
    "vector_column": "FLOAT4[]",
    "publish_date": "TIMESTAMP"
}
table = client.open_table("<TABLE_NAME>")
Import text or vector data
data = [
    [1, "Hello world", [0.1, 0.2, 0.3], "2023-01-01"],
    [2, "Python SDK", [0.4, 0.5, 0.6], "2024-01-01"],
    [3, "Vector search", [0.7, 0.8, 0.9], "2025-01-01"]
]
table.insert_multi(data, ["id", "content", "vector_column", "publish_date"])
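Source data often arrives as dictionaries rather than positional lists. The following sketch converts records into rows in the column order used above and splits them into batches. The helpers to_rows and batched are hypothetical, and the batch size is an arbitrary assumption, not an SDK limit.

```python
from typing import Any, Dict, Iterator, List, Sequence

COLUMNS = ["id", "content", "vector_column", "publish_date"]


def to_rows(records: Sequence[Dict[str, Any]], columns: Sequence[str]) -> List[List[Any]]:
    """Order each record's values to match `columns`; a missing key raises KeyError."""
    return [[record[name] for name in columns] for record in records]


def batched(rows: Sequence[List[Any]], size: int) -> Iterator[List[List[Any]]]:
    """Yield consecutive slices of `rows` containing at most `size` rows each."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(rows), size):
        yield list(rows[start:start + size])


if __name__ == "__main__":
    records = [
        {"id": 1, "content": "Hello world", "vector_column": [0.1, 0.2, 0.3], "publish_date": "2023-01-01"},
        {"id": 2, "content": "Python SDK", "vector_column": [0.4, 0.5, 0.6], "publish_date": "2024-01-01"},
        {"id": 3, "content": "Vector search", "vector_column": [0.7, 0.8, 0.9], "publish_date": "2025-01-01"},
    ]
    for batch in batched(to_rows(records, COLUMNS), size=2):
        print(len(batch), "rows")
```

Each batch can then be passed to table.insert_multi(batch, COLUMNS) in the same way as the data list above.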
Update text or vector data
# Upsert one record
table.upsert_one(
    index_column="id",
    values=[1, "Updated content", [0.3, 0.2, 0.1], "2026-01-01"],
    column_names=["id", "content", "vector_column", "publish_date"],
    update=True  # Update on conflict
)
# Upsert multiple records
table.upsert_multi(
    index_column="id",
    values=[
        [1, "Updated content 1", [0.2, 0.3, 0.4], "2026-02-01"],
        [2, "Updated content 2", [0.6, 0.5, 0.7], "2024-02-01"]
    ],
    column_names=["id", "content", "vector_column", "publish_date"],
    update=True,
    update_columns=["content", "vector_column"]  # Specify columns to update
)
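The effect of update and update_columns can be pictured with a small in-memory model. This is only an illustration built on an assumption drawn from the inline comments above: on a key conflict, update=True overwrites the existing row, and update_columns limits the overwrite to the listed columns. The function upsert_local is hypothetical and does not call the SDK.

```python
from typing import Any, Dict, List, Optional, Sequence

COLUMN_NAMES = ["id", "content", "vector_column", "publish_date"]


def upsert_local(
    store: Dict[Any, Dict[str, Any]],
    index_column: str,
    values: Sequence[Any],
    column_names: Sequence[str],
    update: bool = True,
    update_columns: Optional[List[str]] = None,
) -> None:
    """Insert a row, or update it on key conflict (assumed semantics, see lead-in)."""
    record = dict(zip(column_names, values))
    key = record[index_column]
    if key not in store:
        store[key] = record  # no conflict: plain insert
        return
    if not update:
        return  # conflict and update disabled: keep the existing row
    targets = column_names if update_columns is None else update_columns
    for name in targets:
        store[key][name] = record[name]


if __name__ == "__main__":
    store: Dict[Any, Dict[str, Any]] = {}
    upsert_local(store, "id", [1, "Updated content", [0.3, 0.2, 0.1], "2026-01-01"], COLUMN_NAMES)
    upsert_local(
        store, "id", [1, "Updated content 1", [0.2, 0.3, 0.4], "2026-02-01"],
        COLUMN_NAMES, update_columns=["content", "vector_column"],
    )
    print(store[1])
```

Under this model, the second call changes content and vector_column for id 1 but leaves publish_date at the value written by the first call.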
Set indexes
- Set a vector index
  table.set_vector_index(
      column="vector_column",
      distance_method="Cosine",
      base_quantization_type="rabitq",
      use_reorder=True,
      max_degree=64,
      ef_construction=400
  )
- Set a full-text index
  # Create a full-text index
  table.create_text_index(
      index_name="ft_idx_content",
      column="content",
      tokenizer="jieba"
  )
  # Modify a full-text index
  table.set_text_index(
      index_name="ft_idx_content",
      tokenizer="ik"
  )
  # Delete a full-text index
  table.drop_text_index(index_name="ft_idx_content")
Query data
Vector search
# Vector search
query_vector = [0.2, 0.3, 0.4]
# Limit results
results = table.search_vector(
    vector=query_vector,
    column="vector_column",
    distance_method="Cosine"
).limit(10).fetchall()
# Set minimum distance
results = table.search_vector(
    vector=query_vector,
    column="vector_column",
    distance_method="Cosine"
).min_distance(0.5).fetchall()
# Search with output alias
results = table.search_vector(
    vector=query_vector,
    column="vector_column",
    output_name="similarity_score",
    distance_method="Cosine"
).fetchall()
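To make the Cosine measure concrete, the following sketch computes cosine similarity in plain Python for the query vector and the three sample vectors inserted earlier. It is an illustration of the formula only: this topic does not state whether the SDK reports cosine similarity or cosine distance (1 minus similarity), and the function cosine_similarity is a hypothetical local helper, not an SDK call.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return dot(a, b) / (|a| * |b|) for two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float], rows: Sequence[Tuple[int, Sequence[float]]]
) -> List[Tuple[int, float]]:
    """Return (id, similarity) pairs, most similar first."""
    scored = [(row_id, cosine_similarity(query, vec)) for row_id, vec in rows]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    sample_rows = [(1, [0.1, 0.2, 0.3]), (2, [0.4, 0.5, 0.6]), (3, [0.7, 0.8, 0.9])]
    for row_id, score in rank_by_similarity([0.2, 0.3, 0.4], sample_rows):
        print(row_id, round(score, 4))
```

Cosine similarity depends only on direction, not magnitude, which is why all three sample vectors score close to 1 against the query.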
Full-text search
# Basic full-text search
results = table.search_text(
    column="content",
    expression="machine learning",
    return_all_columns=True
).fetchall()
# Full-text search with BM25 relevance score
results = table.search_text(
    column="content",
    expression="deep learning",
    return_score=True,
    return_score_name="relevance_score"
).select(["id", "vector_column", "content"]).fetchall()
# Use different search modes
# Keyword mode (default)
results = table.search_text(
    column="content",
    expression="python programming",
    mode="match",
    operator="AND"  # All keywords must be present
).fetchall()
# Phrase mode
results = table.search_text(
    column="content",
    expression="machine learning",
    mode="phrase"  # Exact phrase match
).fetchall()
# Natural language mode
results = table.search_text(
    column="content",
    expression="+python -java",  # Must contain python, must not contain java
    mode="natural_language"
).fetchall()
# Term search
results = table.search_text(
    column="content",
    expression="python",
    mode="term"  # No tokenization or processing. Exact match only.
).fetchall()
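The difference between keyword matching with an AND operator and phrase matching can be shown with a toy model. The sketch below is an illustration only: it tokenizes by lowercasing and splitting on whitespace, which is an assumption made for the example and does not reproduce the jieba or ik tokenizers, BM25 scoring, or the SDK's actual matching logic. All function names are hypothetical.

```python
from typing import List


def tokenize(text: str) -> List[str]:
    """Toy tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def matches_keywords(document: str, expression: str, operator: str = "AND") -> bool:
    """AND: every keyword appears somewhere. OR: at least one keyword appears."""
    doc_tokens = set(tokenize(document))
    terms = tokenize(expression)
    if operator.upper() == "AND":
        return all(term in doc_tokens for term in terms)
    if operator.upper() == "OR":
        return any(term in doc_tokens for term in terms)
    raise ValueError("operator must be 'AND' or 'OR'")


def matches_phrase(document: str, expression: str) -> bool:
    """True only if the keywords appear adjacent and in the same order."""
    doc_tokens = tokenize(document)
    terms = tokenize(expression)
    width = len(terms)
    if width == 0:
        return False
    return any(
        doc_tokens[i:i + width] == terms
        for i in range(len(doc_tokens) - width + 1)
    )


if __name__ == "__main__":
    doc = "learning about machine tools"
    print(matches_keywords(doc, "machine learning", operator="AND"))
    print(matches_phrase(doc, "machine learning"))
```

In this model the sample document satisfies the keyword query because both words occur, but not the phrase query because they are not adjacent in that order.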
Hybrid search
# Full-text + scalar search
results = (
    table.search_text(
        column="content",
        expression="artificial intelligence",
        return_score=True,
        return_score_name="score"
    )
    .where("publish_date > '2023-01-01'")
    .order_by("score", "desc")
    .limit(10)
    .fetchall()
)
# Vector + scalar search
results = (
    table.search_vector(
        vector=query_vector,
        column="vector_column",
        output_name="similarity_score",
        distance_method="Cosine"
    )
    .where("publish_date > '2023-01-01'")
    .order_by("similarity_score", "desc")
    .limit(10)
    .fetchall()
)
Primary key point query
# Query one record by primary key
result = table.get_by_key(
    key_column="id",
    key_value=1,
    return_columns=["id", "content", "vector_column"]  # Optional. Returns all columns if omitted.
).fetchone()
# Query multiple records by primary key list
results = table.get_multi_by_keys(
    key_column="id",
    key_values=[1, 2, 3],
    return_columns=["id", "content"]  # Optional. Returns all columns if omitted.
).fetchall()
Close the connection
# Close the connection
client.disconnect()
FAQ
- Q: You receive the following error when importing holo_search_sdk:
  import holo_search_sdk as holo
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "/usr/local/lib/python3.8/site-packages/holo_search_sdk/__init__.py", line 9, in <module>
      from .client import Client, connect
    File "/usr/local/lib/python3.8/site-packages/holo_search_sdk/client.py", line 9, in <module>
      from psycopg.abc import Query
    File "/usr/local/lib/python3.8/site-packages/psycopg/__init__.py", line 9, in <module>
      from . import pq  # noqa: F401 import early to stabilize side effects
    File "/usr/local/lib/python3.8/site-packages/psycopg/pq/__init__.py", line 116, in <module>
      import_from_libpq()
    File "/usr/local/lib/python3.8/site-packages/psycopg/pq/__init__.py", line 108, in import_from_libpq
      raise ImportError(
  ImportError: no pq wrapper available.
  Attempts made:
  - couldn't import psycopg 'c' implementation: No module named 'psycopg_c'
  - couldn't import psycopg 'binary' implementation: No module named 'psycopg_binary'
  - couldn't import psycopg 'python' implementation: libpq library not found
  A: Install psycopg-binary in your Python environment. Run the following command:
  pip install psycopg-binary
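After installing, you can confirm which libpq wrapper psycopg loaded. The following sketch reads psycopg.pq.__impl__, which psycopg 3 sets to c, binary, or python. The helper name psycopg_implementation is hypothetical, and it returns None when psycopg cannot be imported at all, which is the situation described in the error above.

```python
from typing import Optional


def psycopg_implementation() -> Optional[str]:
    """Return the loaded psycopg pq implementation, or None if psycopg fails to import."""
    try:
        from psycopg import pq
    except ImportError:
        # Either psycopg is not installed or no pq wrapper is available.
        return None
    return pq.__impl__


if __name__ == "__main__":
    print("psycopg implementation:", psycopg_implementation())
```

A result of None means the import still fails, so check that psycopg-binary was installed into the same Python environment that runs your script.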