Hologres V4.0 improves vector search by supporting the HGraph algorithm. This algorithm provides high-performance, high-precision, and low-latency vector search. For more information, see the HGraph index guide (Recommended). This topic describes how to use the Hologres Python Search SDK (holo-search-sdk) to perform full-text and vector searches.
Prerequisites
An AccessKey has been created. For more information, see Create an AccessKey.
Python 3.8 or later has been installed.
The required permissions have been granted to the account. For more information, see Grant permissions to a RAM user.
Install the SDK
You can install the Python SDK using pip. For more information, see holo-search-sdk.
The examples in this topic use holo-search-sdk version 0.3.0. If an error occurs, check your SDK version.
First-time installation
pip install holo-search-sdk
Check the installed version
# Check the current version
pip show holo-search-sdk
# If the version is earlier than 0.3.0, upgrade the SDK
pip install --upgrade holo-search-sdk
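The same version check can also be done from Python, for example at the start of a script. The following is a minimal sketch that uses only the standard library (importlib.metadata, available in Python 3.8 and later). The helper names parse_version, sdk_is_recent_enough, and installed_sdk_version are illustrative and are not part of the SDK.

```python
from importlib import metadata  # standard library in Python 3.8+

MIN_VERSION = (0, 3, 0)  # version used by the examples in this topic


def parse_version(text):
    """Turn '0.3.0' into (0, 3, 0); suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if digits == "":
            break
        parts.append(int(digits))
    while len(parts) < 3:  # pad '0.3' to (0, 3, 0) so comparisons behave
        parts.append(0)
    return tuple(parts)


def sdk_is_recent_enough(installed, minimum=MIN_VERSION):
    """Return True if the installed version string is at least `minimum`."""
    return parse_version(installed) >= minimum


def installed_sdk_version():
    """Return the installed holo-search-sdk version, or None if it is missing."""
    try:
        return metadata.version("holo-search-sdk")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_sdk_version()
    if version is None:
        print("holo-search-sdk is not installed")
    elif sdk_is_recent_enough(version):
        print("holo-search-sdk", version, "is recent enough")
    else:
        print("holo-search-sdk", version, "is older than 0.3.0; upgrade it")
```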
Usage
Connect to Hologres
import holo_search_sdk as holo
# Connect to the database
client = holo.connect(
    host="<HOLO_HOST>",
    port=<HOLO_PORT>,
    database="<HOLO_DBNAME>",
    access_key_id="<ACCESS_KEY_ID>",
    access_key_secret="<ACCESS_KEY_SECRET>",
    schema="public"  # Modify the schema as needed
)
# Establish the connection
client.connect()
Parameter descriptions:
Variable | Description |
HOLO_HOST | The network address of the Hologres instance. To obtain the network address, go to the Hologres console. Choose Hologres > . |
HOLO_PORT | The port of the Hologres instance. Go to the Hologres console. Choose Hologres > to obtain the instance port. |
HOLO_DBNAME | The name of the Hologres database. |
ACCESS_KEY_ID | The AccessKey ID of your Alibaba Cloud account. Go to AccessKey Management to obtain the AccessKey ID. |
ACCESS_KEY_SECRET | The AccessKey secret of your Alibaba Cloud account. |
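To keep the AccessKey pair out of source code, the parameters above can be read from environment variables. The following is a minimal sketch. The environment variable names (including the optional HOLO_SCHEMA) and the helper load_connection_settings are assumptions chosen for illustration, and the sketch assumes that the port is passed as an integer.

```python
import os

# Hypothetical environment variable names; pick whatever fits your deployment.
REQUIRED_VARS = {
    "host": "HOLO_HOST",
    "port": "HOLO_PORT",
    "database": "HOLO_DBNAME",
    "access_key_id": "ACCESS_KEY_ID",
    "access_key_secret": "ACCESS_KEY_SECRET",
}


def load_connection_settings(env=None):
    """Build keyword arguments for holo.connect() from environment variables."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS.values() if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    settings = {key: env[name] for key, name in REQUIRED_VARS.items()}
    settings["port"] = int(settings["port"])  # assumption: port is numeric
    settings["schema"] = env.get("HOLO_SCHEMA", "public")  # optional override
    return settings
```

With the SDK installed, the result can be passed on as client = holo.connect(**load_connection_settings()), followed by client.connect() as in the connection example.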
Create a table
Create a table using a Data Definition Language (DDL) statement. The following code provides an example:
create_table_sql = """
CREATE TABLE IF NOT EXISTS <TABLE_NAME> (
    id BIGINT PRIMARY KEY,
    content TEXT,
    vector_column FLOAT4[] CHECK (array_ndims(vector_column) = 1 AND array_length(vector_column, 1) = 3),
    publish_date TIMESTAMP
);
"""
_ = client.execute(create_table_sql, fetch_result=False)
Replace <TABLE_NAME> with the actual table name.
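The CHECK constraint in the DDL fixes the vector dimension at 3, so the table name and the dimension are the two parts that usually change. The following is a minimal sketch of how the statement could be generated and how the same constraint could be mirrored on the client before inserting data. The helpers build_create_table_sql and check_vector are hypothetical and not part of the SDK.

```python
import re

# Conservative pattern for unquoted SQL identifiers (an assumption of this sketch).
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_create_table_sql(table_name, dim):
    """Return the example DDL for a given table name and vector dimension."""
    if not _IDENTIFIER.fullmatch(table_name):
        raise ValueError("unexpected table name: %r" % table_name)
    if not isinstance(dim, int) or dim <= 0:
        raise ValueError("dim must be a positive integer")
    return (
        f"CREATE TABLE IF NOT EXISTS {table_name} (\n"
        "    id BIGINT PRIMARY KEY,\n"
        "    content TEXT,\n"
        "    vector_column FLOAT4[] CHECK (array_ndims(vector_column) = 1 "
        f"AND array_length(vector_column, 1) = {dim}),\n"
        "    publish_date TIMESTAMP\n"
        ");"
    )


def check_vector(vector, dim):
    """Client-side mirror of the CHECK constraint: exactly `dim` numbers."""
    if len(vector) != dim:
        raise ValueError("expected %d values, got %d" % (dim, len(vector)))
    for value in vector:
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError("vector values must be numbers: %r" % (value,))
    return vector
```

For example, client.execute(build_create_table_sql("docs", 3), fetch_result=False) would create a table named docs, where docs is only a placeholder name.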
Open a table
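# Column layout of the table created above, for reference only; open_table() takes just the table name
columns = {
    "id": ("BIGINT", "PRIMARY KEY"),
    "content": "TEXT",
    "vector_column": "FLOAT4[]",
    "publish_date": "TIMESTAMP"
}
table = client.open_table("<TABLE_NAME>")
Import text and vector data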
data = [
    [1, "Hello world", [0.1, 0.2, 0.3], "2023-01-01"],
    [2, "Python SDK", [0.4, 0.5, 0.6], "2024-01-01"],
    [3, "Vector search", [0.7, 0.8, 0.9], "2025-01-01"]
]
table.insert_multi(data, ["id", "content", "vector_column", "publish_date"])
Set indexes
Set a vector index
table.set_vector_index(
    column="vector_column",
    distance_method="Cosine",
    base_quantization_type="rabitq",
    use_reorder=True,
    max_degree=64,
    ef_construction=400
)
Set a full-text index
# Create a full-text index
table.create_text_index(
    index_name="ft_idx_content",
    column="content",
    tokenizer="jieba"
)
# Modify the full-text index
table.set_text_index(
    index_name="ft_idx_content",
    tokenizer="ik"
)
# Drop the full-text index
table.drop_text_index(index_name="ft_idx_content")
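The vector index in this section is configured with distance_method="Cosine". As a point of reference, the following sketch computes plain cosine similarity in Python for the three sample rows against the query vector [0.2, 0.3, 0.4] used in the vector search examples. It is a textbook calculation, not the SDK's implementation. Whether the scores returned by Hologres match these values exactly is an assumption, because the index is approximate and uses quantization.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Sample rows from the import step: id -> vector
rows = {1: [0.1, 0.2, 0.3], 2: [0.4, 0.5, 0.6], 3: [0.7, 0.8, 0.9]}
query_vector = [0.2, 0.3, 0.4]

scores = {row_id: cosine_similarity(query_vector, vec) for row_id, vec in rows.items()}
ranking = sorted(scores, key=scores.get, reverse=True)  # most similar first
print(ranking)  # [2, 1, 3]
```

All three sample vectors point in nearly the same direction, so the similarities are close to 1 and differ only in the second or third decimal place.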
Query data
Vector search query
# Vector search
query_vector = [0.2, 0.3, 0.4]
# Limit the number of results
results = table.search_vector(
    vector=query_vector,
    column="vector_column",
    distance_method="Cosine"
).limit(10).fetchall()
# Set a minimum distance
results = table.search_vector(
    vector=query_vector,
    column="vector_column",
    distance_method="Cosine"
).min_distance(0.5).fetchall()
# Search with an output alias
results = table.search_vector(
    vector=query_vector,
    column="vector_column",
    output_name="similarity_score",
    distance_method="Cosine"
).fetchall()
Full-text search query
# Basic full-text search
results = table.search_text(
    column="content",
    expression="machine learning",
    return_all_columns=True
).fetchall()
# Full-text search that returns BM25 similarity scores
results = table.search_text(
    column="content",
    expression="deep learning",
    return_score=True,
    return_score_name="relevance_score"
).select(["id", "vector_column", "content"]).fetchall()
# Use different search modes
# Keyword mode (default)
results = table.search_text(
    column="content",
    expression="python programming",
    mode="match",
    operator="AND"  # Requires all keywords to be included
).fetchall()
# Phrase mode
results = table.search_text(
    column="content",
    expression="machine learning",
    mode="phrase"  # Exact phrase match
).fetchall()
# Natural language mode
results = table.search_text(
    column="content",
    expression="+python -java",  # Must include python and exclude java
    mode="natural_language"
).fetchall()
# Term search
results = table.search_text(
    column="content",
    expression="python",
    mode="term"  # Performs an exact match in the index without tokenization or other processing of the expression
).fetchall()
Hybrid search
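The hybrid search examples in this section pass the scalar condition to where() as a raw SQL string. When the cutoff date comes from user input, it is safer to validate it before building that string. The following is a minimal sketch. The helper date_after_filter is hypothetical, and this topic does not cover whether where() supports bound parameters.

```python
from datetime import date


def date_after_filter(column, cutoff):
    """Return a filter such as "publish_date > '2023-01-01'" from checked parts."""
    if not column.isidentifier():  # rough guard against injected SQL
        raise ValueError("unexpected column name: %r" % column)
    parsed = date.fromisoformat(cutoff)  # raises ValueError for an invalid ISO date
    return "%s > '%s'" % (column, parsed.isoformat())
```

The returned string can then be passed on unchanged, for example .where(date_after_filter("publish_date", "2023-01-01")).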
# Full-text search + scalar search
results = (
    table.search_text(
        column="content",
        expression="artificial intelligence",
        return_score=True,
        return_score_name="score"
    )
    .where("publish_date > '2023-01-01'")
    .order_by("score", "desc")
    .limit(10)
    .fetchall()
)
# Vector search + scalar search
results = (
    table.search_vector(
        vector=query_vector,
        column="vector_column",
        output_name="similarity_score",
        distance_method="Cosine"
    )
    .where("publish_date > '2023-01-01'")
    .order_by("similarity_score", "desc")
    .limit(10)
    .fetchall()
)
Primary key point query
# Query a single record by primary key
result = table.get_by_key(
    key_column="id",
    key_value=1,
    return_columns=["id", "content", "vector_column"]  # Optional. If not specified, all columns are returned.
).fetchone()
# Batch query records by a list of primary keys
results = table.get_multi_by_keys(
    key_column="id",
    key_values=[1, 2, 3],
    return_columns=["id", "content"]  # Optional. If not specified, all columns are returned.
).fetchall()
Disconnect
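If a query raises an exception, a disconnect call placed at the end of a script is never reached. One way to guarantee cleanup is to wrap connect() and disconnect() in a context manager. The following is a minimal sketch. The helper open_connection is hypothetical, FakeClient only stands in for the object returned by holo.connect() so that the sketch runs without a database, and whether the SDK client already supports the with statement on its own is not covered here.

```python
from contextlib import contextmanager


@contextmanager
def open_connection(client):
    """Call client.connect() on entry and client.disconnect() on exit, even on errors."""
    client.connect()
    try:
        yield client
    finally:
        client.disconnect()


class FakeClient:
    """Stand-in for the SDK client; records calls for the demonstration."""

    def __init__(self):
        self.calls = []

    def connect(self):
        self.calls.append("connect")

    def disconnect(self):
        self.calls.append("disconnect")


fake = FakeClient()
try:
    with open_connection(fake):
        raise RuntimeError("query failed")  # simulate a failing query
except RuntimeError:
    pass
print(fake.calls)  # ['connect', 'disconnect']
```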
# Disconnect
client.disconnect()
FAQ
Q: I receive the following error when I import holo_search_sdk:
import holo_search_sdk as holo
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/site-packages/holo_search_sdk/__init__.py", line 9, in <module>
    from .client import Client, connect
  File "/usr/local/lib/python3.8/site-packages/holo_search_sdk/client.py", line 9, in <module>
    from psycopg.abc import Query
  File "/usr/local/lib/python3.8/site-packages/psycopg/__init__.py", line 9, in <module>
    from . import pq  # noqa: F401 import early to stabilize side effects
  File "/usr/local/lib/python3.8/site-packages/psycopg/pq/__init__.py", line 116, in <module>
    import_from_libpq()
  File "/usr/local/lib/python3.8/site-packages/psycopg/pq/__init__.py", line 108, in import_from_libpq
    raise ImportError(
ImportError: no pq wrapper available.
Attempts made:
- couldn't import psycopg 'c' implementation: No module named 'psycopg_c'
- couldn't import psycopg 'binary' implementation: No module named 'psycopg_binary'
- couldn't import psycopg 'python' implementation: libpq library not found
A: The psycopg-binary package must be installed in your Python environment. Run the following command:
pip install psycopg-binary
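To confirm the cause before or after installing the package, you can check which of the modules named in the error message can be found in the current environment. The following is a minimal sketch that uses only the standard library. The helper available_pq_implementations is illustrative and is not part of psycopg or the SDK.

```python
import importlib.util

# Module names taken from the error message above.
PQ_IMPLEMENTATIONS = ("psycopg_c", "psycopg_binary")


def available_pq_implementations():
    """Return the bundled psycopg implementations found in this environment."""
    return [
        name
        for name in PQ_IMPLEMENTATIONS
        if importlib.util.find_spec(name) is not None
    ]


if __name__ == "__main__":
    found = available_pq_implementations()
    if found:
        print("found:", ", ".join(found))
    else:
        print("neither psycopg_c nor psycopg_binary is installed; "
              "psycopg will need a system libpq library")
```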