AnalyticDB for PostgreSQL provides a vector engine optimization feature for vector analysis scenarios, such as AIGC and vector retrieval. This tutorial guides you through the entire process, from creating an AnalyticDB for PostgreSQL instance to performing vector analysis.
Preparations
An Alibaba Cloud account is required. If you do not have an account, you can register one on the Alibaba Cloud official website.
Authorize a service-linked role. The first time you use AnalyticDB for PostgreSQL, you must authorize the creation of a service-linked role in the console. To do this, perform the following steps:
- Log on to the AnalyticDB for PostgreSQL console.
In the upper-right corner of the page, click Create Instance.
In the Create Service Linked Role dialog box that appears, click OK.
Billing
When you create an instance, you are charged for its compute and storage resources. For more information, see Pricing.
Free trial
Alibaba Cloud offers a free trial for Storage-elastic Mode instances. If you are a new user of AnalyticDB for PostgreSQL, you can visit Alibaba Cloud Free Trial to apply for a free trial. If you are not eligible for a free trial, follow the steps in this topic to create an instance in the console.
Procedure
Create an instance
- Log on to the AnalyticDB for PostgreSQL console.
In the upper-right corner of the page, click Create Instance to open the buy page.
On the instance purchase page, configure the core parameters to quickly select an instance type. You can keep the default values for other parameters. For more information about the parameters, see Create an instance.
Configuration Item
Description
Example for This Tutorial
Product Type
Subscription: This is a subscription billing method. You pay upfront when you create the instance. This method is suitable for long-term use and is more cost-effective than pay-as-you-go. The longer the subscription duration, the higher the discount.
Pay-as-you-go: This is a post-paid billing method. You are billed hourly. This method is suitable for short-term use. You can release the instance immediately after use to save costs.
Pay-as-you-go
Region and Zone
The geographic location of the instance.
You cannot change the region and zone after the instance is created. Create the instance in the same region as the ECS instances that you want to connect to. This enables service interconnection over the internal network.
China (Hangzhou): Zone J
Instance Resource Type
Elastic Storage Mode: Supports independent disk scale-out and smooth online scale-out.
Serverless Pro: Specify only the required computing resources. You do not need to reserve storage resources.
Elastic Storage Mode
Database Engine Version
Select 7.0 Standard Edition for a richer feature experience. 6.0 Standard Edition is also supported.
7.0 Standard Edition
Instance Edition
High-performance (Basic Edition): Suitable for most business analysis scenarios.
High-availability Edition: Recommended for core business services.
High-performance (Basic Edition)
Vector Engine Optimization
Select Enable.
Enable
Virtual Private Cloud
Select the ID of the VPC.
To interconnect with an ECS instance in the same region over the internal network, select the same VPC as the ECS instance. You can select an existing VPC or create a VPC and vSwitch as prompted on the page.
vpc-xxxx
vSwitch
Select a vSwitch in the VPC. If no vSwitches are available, no vSwitch resources are available in the zone. You can switch to another zone or create a vSwitch in the current zone as prompted on the page.
vsw-xxxx
Click Buy Now, confirm the order information, and click Activate Now.
After you complete the payment, click Management Console to go to the instance list and view the new instance.
NoteThe initialization of an AnalyticDB for PostgreSQL instance takes some time. Wait for the instance status to change to Running before you proceed with the next steps.
Create an initial account
AnalyticDB for PostgreSQL provides two types of users:
Privileged user: The initial account is a privileged user with the RDS_SUPERUSER role. This role grants all operational permissions on the database.
Regular user: By default, a regular user has no permissions. A privileged user, or another user with the GRANT permission, must explicitly grant permissions on database objects to the regular user. For more information about how to create a regular user, see Create and manage users.
In the navigation pane on the left, click Account Management.
Click Create Account. In the Create Account window, enter an account name, set a password, and then click OK.
Parameter
Description
Account
The name of the initial account.
Contain lowercase letters, digits, and underscores (_).
Must start with a lowercase letter and end with a lowercase letter or a digit.
Cannot start with gp.
Must be 2 to 16 characters in length.
New Password and Confirm Password
The password of the initial account.
Must contain at least three of the following character types: uppercase letters, lowercase letters, digits, and special characters.
Special characters include
! @ # $ % ^ & * ( ) _ + - =.Must be 8 to 32 characters in length.
ImportantFor security, change your password regularly and avoid reusing old ones.
Configure a whitelist
If you use only Data Management (DMS) to access the instance, you can skip this step. If you access the database from a local integrated development environment (IDE) or an ECS instance, you must configure an IP address whitelist. To obtain the client IP address, see Preparations.
In the navigation pane on the left of the instance details page, click Security Controls.
Click Create Whitelist and configure the following parameters:
Configuration
Description
Whitelist Name
The name of the new whitelist group. The name must meet the following requirements:
Consist of lowercase letters, digits, and underscores (_).
Start with a lowercase letter and end with a lowercase letter or a digit.
Be 2 to 32 characters in length.
IP Addresses
The IP addresses to add to the whitelist. Note the following:
Separate multiple IP addresses with commas (,). You can add up to 999 unique IP addresses.
The supported formats are specific IP addresses, such as 10.23.12.24, and CIDR blocks, such as 10.23.12.24/24. CIDR stands for Classless Inter-Domain Routing. The number after the slash (/) indicates the prefix length, which can be from 1 to 32.
If you set the prefix length to 0, such as 0.0.0.0/0 or 127.0.0.1/0, all IP addresses can access the instance. This poses a high security risk. Use this setting with caution.
The IP address 127.0.0.1 blocks access from all external IP addresses.
Click OK.
Connect to the database
This section uses the psql tool as an example to demonstrate how to log on to the database. For information about how to connect using other tools, see Client connection.
Install psql.
Download the psql tool.
wget https://docs-aliyun.cn-hangzhou.oss.aliyun-inc.com/assets/attach/181125/cn_zh/1598426198114/adbpg_client_package.el7.x86_64.tar.gzThe preceding download command applies only if the client operating system is RHEL 7 or CentOS 7. To use other versions of the psql tool, see Client connection.
Decompress the psql tool.
tar -xzvf adbpg_client_package.el7.x86_64.tar.gzGo to the directory where the psql tool is located.
cd adbpg_client_package/bin
Log on to the database.
./psql -h <Endpoint of the AnalyticDB for PostgreSQL instance> -p 5432 -d <Database to connect to> -U <Account of the AnalyticDB for PostgreSQL instance>To obtain the endpoint:
If the client is deployed on an ECS instance that is in the same region and has the same network type as the AnalyticDB for PostgreSQL instance, you can use the internal endpoint to connect. Log on to the AnalyticDB for PostgreSQL console, and click the ID of the target instance to open the instance details page. In the navigation pane on the left, click Basic Information. In the Database Connection Information section, you can view the Internal Endpoint.
If the client and the AnalyticDB for PostgreSQL instance are deployed in different regions, on ECS instances with different network types, or on systems outside of Alibaba Cloud, you need to request a public endpoint and use it to connect. For more information, see Manage public endpoints.
The following is an example of a connection command:
psql -h gp-bp13zq652yy4p****-master.gpdb.rds.aliyuncs.com -p 5432 -d postgres -U testuserPress Enter and enter the password. If the system displays
postgres=>, the connection is successful.
Import vector data using SQL
Confirm that the FastANN vector retrieval extension is installed.
You can run the
\dx fastanncommand to check whether the extension is installed. If information about the FastANN extension is returned, the extension is installed. If no information is returned, submit a ticket to request the installation.You can download the test data.
For testing purposes, AnalyticDB for PostgreSQL provides sample data. To download the data, click vector_sample_data.csv.
The schema of the sample data table is as follows.
Field
Type
Description
id
bigint
ID.
market_time
timestamp
The time when the car was launched.
color
varchar(10)
The color of the car.
price
int
The price of the car.
feature
float4[]
The feature vector of the car photo.
The command to download the sample data on a Linux server is as follows.
wget https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20230606/uzkx/vector_sample_data.csvImport the data.
Create and switch to a database.
-- Create a database named testdb. CREATE DATABASE testdb; -- Switch to the database. \c testdbCreate a table that contains a vector field based on the sample data.
CREATE SCHEMA IF NOT EXISTS vector_test; CREATE TABLE IF NOT EXISTS vector_test.car_info ( id bigint NOT NULL, market_time timestamp, color varchar(10), price int, feature float4[], PRIMARY KEY(id) ) USING heap -- If vector search engine optimization is enabled when you create the instance, the default engine is heap. DISTRIBUTED BY(id);Create indexes.
In this tutorial, a structured index is created for fields such as car launch time, color, and price, and a vector index is created for the feature vector of the car photo.
-- Change the storage format of the vector field to PLAIN. ALTER TABLE vector_test.car_info ALTER COLUMN feature SET STORAGE PLAIN; -- Create structured indexes. CREATE INDEX ON vector_test.car_info(market_time); CREATE INDEX ON vector_test.car_info(color); CREATE INDEX ON vector_test.car_info(price); -- Create a vector index. CREATE INDEX ON vector_test.car_info USING ann(feature) WITH (dim='10', pq_enable='0');Load the sample data into the data table.
\COPY vector_test.car_info FROM '/DATA_PATH/vector_sample_data.csv';Replace
/DATA_PATH/vector_sample_data.csvin the example with the path where the sample data file is stored. For example, if the sample data is downloaded to the/homedirectory, the path should be/home/vector_sample_data.csv.After the data is imported, the system returns
COPY 10000.
Vector analysis
This tutorial demonstrates vector analysis by providing examples of how to obtain the squared Euclidean distance, dot product distance, and cosine similarity.
Get the Euclidean distance
Perform vector analysis and obtain the squared Euclidean distance.
SELECT id, l2_squared_distance(feature, array[0.495181661387,0.108697291209,0.181728549067,0.109680543346,0.19713082404,0.0197809514512,0.534227452778,0.442411970815,0.409909873031,0.0975687394505]::float4[]) AS distance
FROM vector_test.car_info
ORDER BY feature <-> array[0.495181661387,0.108697291209,0.181728549067,0.109680543346,0.19713082404,0.0197809514512,0.534227452778,0.442411970815,0.409909873031,0.0975687394505]::float4[]
LIMIT 10;The following is a sample output.
id | distance
------+--------------------
2 | 0
1331 | 0.0677967891097069
1543 | 0.079616591334343
5606 | 0.0892329216003418
6423 | 0.0894578248262405
1667 | 0.0903968289494514
8215 | 0.0936210229992867
7801 | 0.0952572822570801
2581 | 0.0965127795934677
2645 | 0.0987173467874527
(10 rows)Get the dot product distance (cosine similarity)
Perform vector analysis and obtain the dot product distance. When the data is normalized, the dot product distance is equal to the cosine similarity.
SELECT id, dp_distance(feature, array[0.495181661387,0.108697291209,0.181728549067,0.109680543346,0.19713082404,0.0197809514512,0.534227452778,0.442411970815,0.409909873031,0.0975687394505]::float4[]) AS similarity
FROM vector_test.car_info
ORDER BY feature <-> array[0.495181661387,0.108697291209,0.181728549067,0.109680543346,0.19713082404,0.0197809514512,0.534227452778,0.442411970815,0.409909873031,0.0975687394505]::float4[]
LIMIT 10;The following is a sample output.
id | similarity
------+-------------------
2 | 1
1331 | 0.966101586818695
1543 | 0.960191607475281
5606 | 0.955383539199829
6423 | 0.955271065235138
1667 | 0.954801559448242
8215 | 0.953189492225647
7801 | 0.95237135887146
2581 | 0.951743602752686
2645 | 0.950641334056854
(10 rows)Hybrid search query
To perform a hybrid search of structured and unstructured data, you can use the following SQL statement.
SELECT id, dp_distance(feature, array[0.495181661387,0.108697291209,0.181728549067,0.109680543346,0.19713082404,0.0197809514512,0.534227452778,0.442411970815,0.409909873031,0.0975687394505]::float4[]) AS similarity
FROM vector_test.car_info
WHERE market_time >= '2020-10-30 00:00:00'
AND market_time < '2021-01-01 00:00:00'
AND color in ('red', 'white', 'blue')
AND price < 100
ORDER BY feature <-> array[0.495181661387,0.108697291209,0.181728549067,0.109680543346,0.19713082404,0.0197809514512,0.534227452778,0.442411970815,0.409909873031,0.0975687394505]::float4[]
LIMIT 10;The following is a sample output.
id | similarity
------+-------------------
7645 | 0.922723233699799
8956 | 0.920517802238464
8219 | 0.91210675239563
8503 | 0.895939946174622
5113 | 0.895431876182556
7680 | 0.893448948860168
8433 | 0.893425941467285
3604 | 0.89293098449707
3945 | 0.891274154186249
7153 | 0.891128540039062
(10 rows)References
For more information about vector analysis, see Vector analysis.