AnalyticDB for PostgreSQL provides API operations for vector-based image search, covering image upload, upload progress tracking, and search by text or image. This guide walks through each operation with Python code examples.
How it works
Vector-based image search represents images as multi-dimensional vectors and finds matches based on similarity, not keywords.
1. Extract visual features (color, shape, texture) from images and convert them into multi-dimensional vectors.
2. Store the vectors in AnalyticDB for PostgreSQL and build an index for fast retrieval.
3. When a query arrives (text or image), convert it into a feature vector and find the closest vectors using Euclidean distance or cosine similarity.
4. Return results ranked by similarity score.
AnalyticDB for PostgreSQL integrates vectorization algorithms and vector search capabilities, so you can focus on building the application rather than the underlying infrastructure.
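To make the similarity step concrete, the following is a minimal, self-contained sketch of cosine-similarity ranking over toy feature vectors. The tiny 4-dimensional vectors and file names are illustrative only; real embeddings produced by the service have far more dimensions, and the service computes similarity for you.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot(a, b) / (|a| * |b|); 1.0 means identical direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "feature vectors" for two stored images and one query.
query = [0.9, 0.1, 0.3, 0.0]
images = {
    "dog.jpg": [0.8, 0.2, 0.4, 0.1],
    "car.png": [0.1, 0.9, 0.0, 0.7],
}

# Rank stored images by similarity to the query, best match first.
ranked = sorted(images, key=lambda name: cosine_similarity(query, images[name]), reverse=True)
print(ranked)  # dog.jpg ranks first
```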
Prerequisites
Before you begin, ensure that you have:
An AnalyticDB for PostgreSQL instance.
Python 3.7 or later with the following packages installed:
The alibabacloud-gpdb20160503 package requires Python 3.5.1 or later.

```shell
pip install alibabacloud-gpdb20160503
pip install alibabacloud-tea-openapi
pip install alibabacloud-tea-util
pip install alibabacloud-openapi-util
```

Your AccessKey ID and AccessKey secret stored as environment variables:
```shell
export ALIBABA_CLOUD_ACCESS_KEY_ID="<YOUR_ALIBABA_CLOUD_ACCESS_KEY_ID>"
export ALIBABA_CLOUD_ACCESS_KEY_SECRET="<YOUR_ALIBABA_CLOUD_ACCESS_KEY_SECRET>"
```

For instructions on creating an AccessKey pair for a Resource Access Management (RAM) user, see Create an AccessKey pair.
Before uploading images
Complete the following setup before uploading any images:
Prepare your image data (cleansed and preprocessed) and create a vector index. See Create a vector index.
Create a namespace for your data, or use an existing one. See Create a namespace.
Create a collection in the namespace, or use an existing one. See CreateDocumentCollection.
When calling CreateDocumentCollection, set the EmbeddingModel parameter to specify the vectorization algorithm for the collection.
Upload images
All upload operations use the UploadDocumentAsync API operation, which is asynchronous. After the call returns, use the job ID to track progress (see Check upload progress).
Supported image formats: .bmp, .jpg, .jpeg, .png, .tiff.
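You can validate file extensions client-side against this list before calling the API; a minimal sketch (the helper name is an assumption, not part of the SDK):

```python
import os

# Formats accepted by the upload operation, per the list above.
SUPPORTED_EXTENSIONS = {".bmp", ".jpg", ".jpeg", ".png", ".tiff"}

def is_supported_image(file_name: str) -> bool:
    # Compare the lowercased extension against the supported set.
    return os.path.splitext(file_name)[1].lower() in SUPPORTED_EXTENSIONS
```

For example, is_supported_image("photo.JPG") returns True, while "scan.gif" is rejected.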
Initialize the client
All examples in this section use the same client initialization. Create the client once and reuse it across operations:
```python
# -*- coding: utf-8 -*-
import os

from alibabacloud_gpdb20160503.client import Client as gpdb20160503Client
from alibabacloud_gpdb20160503 import models as gpdb_20160503_models
from alibabacloud_tea_openapi import models as open_api_models
from alibabacloud_tea_util import models as util_models


def create_client() -> gpdb20160503Client:
    # Credentials are read from environment variables; never hardcode them.
    config = open_api_models.Config(
        access_key_id=os.environ["ALIBABA_CLOUD_ACCESS_KEY_ID"],
        access_key_secret=os.environ["ALIBABA_CLOUD_ACCESS_KEY_SECRET"]
    )
    config.endpoint = "gpdb.aliyuncs.com"
    return gpdb20160503Client(config)
```

Upload a single image
Use UploadDocumentAsyncAdvanceRequest for local files (pass a file object) and UploadDocumentAsyncRequest for remote images (pass a URL string). The remaining parameters are identical.
Upload a local image:
```python
client = create_client()
with open("<image_file_path>", "rb") as f:
    # image_file_path: absolute path to the local image file
    request = gpdb_20160503_models.UploadDocumentAsyncAdvanceRequest(
        region_id="<your-instance-region-id>",
        dbinstance_id="<your-instance-id>",
        namespace="<your-namespace-name>",
        namespace_password="<your-namespace-password>",
        collection="<your-collection-name>",
        file_name="<filename-with-extension>",  # e.g., photo.jpg
        file_url_object=f,
        dry_run=False,
        metadata={"caption": "sample image", "category": "nature"},  # dict format
    )
    runtime = util_models.RuntimeOptions()
    try:
        response = client.upload_document_async_advance(request, runtime)
        print("Job ID:", response.body.job_id)
    except Exception as error:
        print(error)
```

Upload a remote image:
```python
client = create_client()
request = gpdb_20160503_models.UploadDocumentAsyncRequest(
    region_id="<your-instance-region-id>",
    dbinstance_id="<your-instance-id>",
    namespace="<your-namespace-name>",
    namespace_password="<your-namespace-password>",
    collection="<your-collection-name>",
    file_name="<filename-with-extension>",  # e.g., photo.jpg
    file_url="<image_file_url>",  # publicly accessible URL
    dry_run=False,
    metadata={"caption": "sample image", "category": "nature"},  # dict format
)
runtime = util_models.RuntimeOptions()
try:
    response = client.upload_document_async_with_options(request, runtime)
    print("Job ID:", response.body.job_id)
except Exception as error:
    print(error)
```

Parameter reference:
| Parameter | Description |
|---|---|
| region_id | The region ID of the AnalyticDB for PostgreSQL instance |
| dbinstance_id | The ID of the AnalyticDB for PostgreSQL instance |
| namespace | The name of the namespace |
| namespace_password | The password of the namespace |
| collection | The name of the collection |
| file_name | The image file name, including the extension (.bmp, .jpg, .jpeg, .png, or .tiff) |
| file_url_object | (Local) The file object opened in binary read mode |
| file_url | (Remote) The URL of the remote image |
| metadata | Metadata for the image, in dict format. Fields you add here (e.g., caption, category) are returned in search results. |
Upload multiple images
To upload multiple images at once, pack them into a compressed archive and upload the archive. The UploadDocumentAsync operation extracts and processes each image in the archive.
```python
client = create_client()
with open("<compress_file_path>", "rb") as f:
    # compress_file_path: absolute path to the local archive (.tar, .gz, or .zip)
    request = gpdb_20160503_models.UploadDocumentAsyncAdvanceRequest(
        region_id="<your-instance-region-id>",
        dbinstance_id="<your-instance-id>",
        namespace="<your-namespace-name>",
        namespace_password="<your-namespace-password>",
        collection="<your-collection-name>",
        file_name="<archive-filename-with-extension>",  # e.g., images.zip
        file_url_object=f,
        dry_run=False,
        metadata={"batch": "upload-batch-1"},
    )
    runtime = util_models.RuntimeOptions()
    try:
        response = client.upload_document_async_advance(request, runtime)
        print("Job ID:", response.body.job_id)
    except Exception as error:
        print(error)
```

Each compressed archive can contain up to 100 images. Supported compression formats: TAR, GZ, and ZIP.
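The 100-image cap can be enforced client-side when you build the archive. A sketch using only the standard library (the helper and constant names are illustrative, not part of the SDK):

```python
import os
import zipfile

SUPPORTED_EXTENSIONS = {".bmp", ".jpg", ".jpeg", ".png", ".tiff"}
MAX_IMAGES_PER_ARCHIVE = 100  # documented per-archive limit

def pack_images(image_dir: str, archive_path: str) -> int:
    # Collect supported images, truncate at the 100-image limit, and zip them.
    images = [
        name for name in sorted(os.listdir(image_dir))
        if os.path.splitext(name)[1].lower() in SUPPORTED_EXTENSIONS
    ][:MAX_IMAGES_PER_ARCHIVE]
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in images:
            zf.write(os.path.join(image_dir, name), arcname=name)
    return len(images)
```

The returned count lets you verify that nothing was silently dropped before uploading the archive.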
Check upload progress
UploadDocumentAsync is asynchronous. Poll GetUploadDocumentJob with the job ID until the status is Success.
```python
client = create_client()
request = gpdb_20160503_models.GetUploadDocumentJobRequest(
    region_id="<your-instance-region-id>",
    dbinstance_id="<your-instance-id>",
    namespace="<your-namespace-name>",
    namespace_password="<your-namespace-password>",
    collection="<your-collection-name>",
    job_id="<job_id>",  # job_id returned by UploadDocumentAsync
)
runtime = util_models.RuntimeOptions()
try:
    response = client.get_upload_document_job_with_options(request, runtime)
    print("Status:", response.body.job.status)
except Exception as error:
    print(error)
```

When job.status returns Success, all images in the upload job are indexed and ready for search. For more information, see GetUploadDocumentJob.
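In practice you would wrap the status check in a polling loop with a timeout. A minimal sketch; the wait_for_job helper and the "Failed" terminal status are assumptions (this guide only documents Success):

```python
import time

def wait_for_job(get_status, timeout_s=600, interval_s=5):
    # get_status: zero-argument callable returning the current job status string,
    # e.g. lambda: client.get_upload_document_job_with_options(request, runtime).body.job.status
    deadline = time.time() + timeout_s
    while True:
        status = get_status()
        if status in ("Success", "Failed"):  # terminal states ("Failed" is assumed)
            return status
        if time.time() >= deadline:
            raise TimeoutError(f"upload job did not finish within {timeout_s}s")
        time.sleep(interval_s)
```

Passing the status fetch as a callable keeps the loop testable without a live instance.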
Search for images
Search by text
QueryContent accepts a text string, converts it into a feature vector, and returns the top-k most similar images.
```python
# -*- coding: utf-8 -*-
import os
from urllib.request import urlopen

from PIL import Image
from alibabacloud_gpdb20160503.client import Client as gpdb20160503Client
from alibabacloud_gpdb20160503 import models as gpdb_20160503_models
from alibabacloud_tea_openapi import models as open_api_models
from alibabacloud_tea_util import models as util_models

client = create_client()  # create_client() is defined in the "Initialize the client" section
request = gpdb_20160503_models.QueryContentRequest(
    region_id="<your-instance-region-id>",
    dbinstance_id="<your-instance-id>",
    namespace="<your-namespace-name>",
    namespace_password="<your-namespace-password>",
    collection="<your-collection-name>",
    content="Dog",  # the text query
    top_k=3,
)
runtime = util_models.RuntimeOptions()
try:
    response = client.query_content_with_options(request, runtime)
    if response.status_code != 200:
        raise Exception(f"QueryContent failed: {response.body}")
    for match in response.body.matches.match_list:
        url = match.file_url
        caption = match.metadata.get("caption")
        print(f"URL: {url}, Caption: {caption}")
        Image.open(urlopen(url)).show()
except Exception as error:
    print(error)
```

Each item in match_list has a file_url pointing to the matched image and a metadata dict containing fields you set during upload (such as caption or category).
Parameter reference:
| Parameter | Description |
|---|---|
| region_id | The region ID of the AnalyticDB for PostgreSQL instance |
| dbinstance_id | The ID of the AnalyticDB for PostgreSQL instance |
| namespace | The name of the namespace |
| namespace_password | The password of the namespace |
| collection | The name of the collection |
| content | The text query string |
| top_k | The number of results to return |
Sample output (querying "Dog"): the three most similar images are displayed. Results vary based on the images in your collection.
Search by image
To search using a local image as the query, use QueryContentAdvanceRequest with a file object and filename.
```python
# -*- coding: utf-8 -*-
import os
from urllib.request import urlopen

from PIL import Image
from alibabacloud_gpdb20160503.client import Client as gpdb20160503Client
from alibabacloud_gpdb20160503 import models as gpdb_20160503_models
from alibabacloud_tea_openapi import models as open_api_models
from alibabacloud_tea_util import models as util_models

client = create_client()  # create_client() is defined in the "Initialize the client" section
query_file_path = "<image_file_path>"  # absolute path to the query image
with open(query_file_path, "rb") as f:
    request = gpdb_20160503_models.QueryContentAdvanceRequest(
        region_id="<your-instance-region-id>",
        dbinstance_id="<your-instance-id>",
        namespace="<your-namespace-name>",
        namespace_password="<your-namespace-password>",
        collection="<your-collection-name>",
        file_url_object=f,
        file_name=os.path.basename(query_file_path),
        top_k=3,
    )
    runtime = util_models.RuntimeOptions()
    try:
        response = client.query_content_advance(request, runtime)
        if response.status_code != 200:
            raise Exception(f"QueryContent failed: {response.body}")
        for match in response.body.matches.match_list:
            url = match.file_url
            caption = match.metadata.get("caption")
            print(f"URL: {url}, Caption: {caption}")
            Image.open(urlopen(url)).show()
    except Exception as error:
        print(error)
```

Parameter reference:
| Parameter | Description |
|---|---|
| region_id | The region ID of the AnalyticDB for PostgreSQL instance |
| dbinstance_id | The ID of the AnalyticDB for PostgreSQL instance |
| namespace | The name of the namespace |
| namespace_password | The password of the namespace |
| collection | The name of the collection |
| file_url_object | The query image file object, opened in binary read mode |
| file_name | The filename of the query image, including the extension |
| top_k | The number of results to return |
Sample output (querying with a bicycle image): the three most similar images are displayed. Results vary based on the images in your collection.
Build a web UI with Streamlit
Streamlit is a Python framework for machine learning and data visualization that turns data scripts into web applications. Use it to add a search interface on top of your AnalyticDB for PostgreSQL image search backend.
Quick start: Streamlit tutorial
Install Streamlit:
```shell
pip install streamlit
```

The following example builds a text-to-image search UI. Run the script with streamlit run <script_name>.py.
```python
# -*- coding: utf-8 -*-
import os

import streamlit as st
from alibabacloud_gpdb20160503.client import Client as gpdb20160503Client
from alibabacloud_gpdb20160503 import models as gpdb_20160503_models
from alibabacloud_tea_openapi import models as open_api_models
from alibabacloud_tea_util import models as util_models


def create_client() -> gpdb20160503Client:
    config = open_api_models.Config(
        access_key_id=os.environ["ALIBABA_CLOUD_ACCESS_KEY_ID"],
        access_key_secret=os.environ["ALIBABA_CLOUD_ACCESS_KEY_SECRET"]
    )
    config.endpoint = "gpdb.aliyuncs.com"
    return gpdb20160503Client(config)


def search_by_text(content: str) -> list:
    # The client is initialized with credentials from environment variables.
    client = create_client()
    request = gpdb_20160503_models.QueryContentRequest(
        region_id="{your-instance-region-id}",
        dbinstance_id="{your-instance-id}",
        namespace="{your-namespace-name}",
        namespace_password="{your-namespace-password}",
        collection="{your-collection-name}",
        content=content,
        top_k=3,
    )
    runtime = util_models.RuntimeOptions()
    try:
        response = client.query_content_with_options(request, runtime)
        if response.status_code != 200:
            raise Exception(f"QueryContent failed: {response.body}")
        return [(m.file_url, m.metadata.get("caption")) for m in response.body.matches.match_list]
    except Exception as error:
        print(error)
        return []


# Streamlit UI
st.header('Demo for searching for images by text')
text_query = st.chat_input("Enter a keyword")
st.text(f"Keyword: {text_query}" if text_query else "Keyword: ")
if text_query:
    for url, caption in search_by_text(text_query):
        st.image(url)
        st.text(f"Description: {caption}")
```

Parameter reference:
| Parameter | Description |
|---|---|
| region_id | The region ID of the AnalyticDB for PostgreSQL instance |
| dbinstance_id | The ID of the AnalyticDB for PostgreSQL instance |
| namespace | The name of the namespace |
| namespace_password | The password of the namespace |
| collection | The name of the collection |
Sample output: the Streamlit page displays the entered keyword followed by the matched images and their descriptions.