×
Community Blog PostgreSQL Application in Image Search and Video and Image Deduplication

PostgreSQL Application in Image Search and Video and Image Deduplication

This post describes how PostgreSQL Image Search Plug-in helps to accelerate image-based searches and also highlights how PostgreSQL helps to screen out duplicate videos.

Background

Multiple image processing business scenarios, such as image search, video deduplication, image beautification, and image deduplication, encounter the common challenge of managing the humongous volume of duplicate images and videos stored on the server.

For instance, users may upload many videos where specific video has different versions with different resolutions, audio tracks, or compression ratios. This leads to a large number of duplicate videos saved on the server. Another example is the identification of porn videos or images. It is crucial to identify such content but the challenge is how to identify porn videos and images.

On the other hand, image search is the second most common search engine after text search. Image search engines, such as Alibaba Cloud Image Search, allow you to upload a particular picture to the interface and search out a bunch of similar pictures.

This article describes how to screen out duplicate videos and how to implement an efficient image search using PostgreSQL. With the omnipotent API of PostgreSQL, it is easy to extend its image search function.

PostgreSQL Image Search Plug-in: Background Technology

The PostgreSQL image search plug-in uses the mainstream Haar wavelet technology to store images after conversion. Take a look at its Wikipedia page to learn more about this technology.

You can also refer to this document to know more about how to search similar images in PostgreSQL.

Following are some examples to illustrate the Haar wavelet technology application.

1

2

3

4

Introduction to the PostgreSQL Image Search Plug-in

Follow the steps below to install the PostgreSQL Image Search Plug-in:

Step 1. Add dependency on gd.h.

yum install -y gd-devel

Step 2. Download and install imgsmlr.

$ git clone https://github.com/postgrespro/imgsmlr
$ cd imgsmlr
$ export PGHOME=/home/digoal/pgsql9.5
$ export PATH=$PGHOME/bin:$PATH:.

$ make USE_PGXS=1
$ make USE_PGXS=1 install

Step 3. Install the plug-in.

$ psql
psql (9.5.3)
Type "help" for help.
postgres=# create extension imgsmlr;
CREATE EXTENSION

The following are two newly added data types in imgsmlr.

Datatype Storage length Description
pattern 16388 bytes Result of Haar wavelet transform on the image
signature 64 bytes Short representation of pattern for fast search using GiST indexes

The KNN operator and gist index method that supports the pattern and signature types can be used for similar image search.

Operator Left type Right type Return type Description
<-> pattern pattern float8 Eucledian distance between two patterns
<-> signature signature float8 Eucledian distance between two signatures

The following table shows several newly added functions. Now, you can convert binary images to the pattern type and convert the data stored in the pattern into the signature type.

Function Return type Description
jpeg2pattern(bytea) pattern Convert jpeg image into pattern
png2pattern(bytea) pattern Convert png image into pattern
gif2pattern(bytea) pattern Convert gif image into pattern
pattern2signature(pattern) signature Create signature from pattern
shuffle_pattern(pattern) pattern Shuffle pattern for less sensitivity to image shift

PostgreSQL Image Search Plug-in: Testing

Follow the steps listed below to test the PostgreSQL Image Search Plug-in:

Step 1. Import some images as shown below (the more the better).

5

Step 2. Now, create the image table.

create table image (id serial, data bytea);

Step 3. Import the images to the database.

insert into image(data) select pg_read_binary_file('文件路径');

Step 4. Next, convert the images to the pattern and signature types.

CREATE TABLE pat AS (
SELECT
id,
    shuffle_pattern(pattern) AS pattern, 
    pattern2signature(pattern) AS signature 
FROM (
    SELECT 
    id, 
    jpeg2pattern(data) AS pattern 
    FROM 
image
    ) x 
);

Step 5. Create indexes as shown below.

ALTER TABLE pat ADD PRIMARY KEY (id);

CREATE INDEX pat_signature_idx ON pat USING gist (signature);

Step 6. Now you can run an approximation query, such as querying images that are similar to id = : id images and retrieve the top 10 items on the similarity ranking list.

SELECT
id,
smlr
FROM
(
SELECT
id,
    pattern <-> (SELECT pattern FROM pat WHERE id = :id) AS smlr
FROM pat
WHERE id <> :id
ORDER BY
    signature <-> (SELECT signature FROM pat WHERE id = :id)
LIMIT 100
) x
ORDER BY x.smlr ASC 
LIMIT 10

Here, you can use KNN indexing to quickly get output based on the similarity rankings.

Let's take a look at the following examples.

6

7

8

9

10

11

12

13

14

Video Deduplication

For video deduplication, extract keyframes in a video to generate the Cartesian product through self-correlation. Calculate the similarity between any two images of different videos, and when the similarity reaches a certain threshold value, the two videos are deemed the same.

Refer to the following example for better understanding.

创建图片表,并将所有视频的关键帧导入表中
create table image (id serial8 primary key, movie_id int, data bytea);

导入图片,假设为jpeg格式
... 略 ...

生成patten 和 signature
CREATE TABLE pat AS (
SELECT
id, movie_id,
    shuffle_pattern(pattern) AS pattern, 
    pattern2signature(pattern) AS signature 
FROM (
    SELECT 
id, movie_id,
    jpeg2pattern(data) AS pattern 
    FROM 
image
    ) x 
);

计算不同视频的相似度
select t1.movie_id, t1.id, t1.signature<->t2.signature from 
pat t1 join pat t2 on (t1.movie_id<>t2.movie_id) 
order by t1.signature<->t2.signature desc 

or 

select t1.movie_id, t1.id, t1.signature<->t2.signature from 
pat t1 join pat t2 on (t1.movie_id<>t2.movie_id) 
where t1.signature<->t2.signature > 0.9 
order by t1.signature<->t2.signature desc 

Conclusion

In short, PostgreSQL is a very powerful database with highly customizable functions and does not require you to make any changes to the PostgreSQL kernel. Therefore, it is safe and reliable. The image search technology is an example of PostgreSQL feature extension and offers extremely high speed.

Build your own PostgreSQL solution on Alibaba Cloud with ApsaraDB for RDS PostgreSQL.

0 0 0
Share on

digoal

281 posts | 24 followers

You may also like

Comments

digoal

281 posts | 24 followers

Related Products