Image processing technology, such as image search, has many real-world applications. For example, Internet users may upload multiple versions of the same video or image, each with different formatting, audio tracks, or compression ratios. As a result, services end up storing a large number of duplicate videos. Data de-duplication solves this problem, but how is it normally done?
When you use a search engine to look for images, it processes both the images and the tags associated with them. For example, when I search for "snowman" images, a search engine may return results such as the following.
Pretty accurate, right? Image search like this can also be implemented in PostgreSQL, whose extension API lets plug-ins add new data types, operators, and functions, including image search.
PostgreSQL's image search plug-in, imgsmlr, uses the widely adopted Haar wavelet transform to convert an image into a form that can be stored and compared. The following figures briefly describe the Haar wavelet. For more details, see Wikipedia: https://en.wikipedia.org/wiki/Haar_wavelet
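To make the idea concrete, here is a minimal sketch of the textbook Haar wavelet transform in its simple "average and half-difference" form. It is not imgsmlr's internal code; the function names `haar_1d` and `haar_2d` are hypothetical. The 1-D version repeatedly replaces pairs of values with their average and half-difference, and the 2-D "standard decomposition" applies it to every row and then every column of a pixel matrix. The top-left coefficient ends up as the mean brightness, and the remaining coefficients capture detail at progressively finer scales.

```python
def haar_1d(values):
    """Full 1-D Haar decomposition using averages and half-differences.

    The number of values must be a power of two.
    """
    data = list(values)
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    length = n
    while length > 1:
        half = length // 2
        # Coarse part: pairwise averages; detail part: pairwise half-differences.
        averages = [(data[2 * i] + data[2 * i + 1]) / 2 for i in range(half)]
        details = [(data[2 * i] - data[2 * i + 1]) / 2 for i in range(half)]
        data[:length] = averages + details
        length = half  # recurse on the averages only
    return data


def haar_2d(matrix):
    """Standard 2-D Haar decomposition: transform every row, then every column."""
    rows = [haar_1d(row) for row in matrix]
    columns = [haar_1d(col) for col in zip(*rows)]
    return [list(row) for row in zip(*columns)]  # transpose back to rows


print(haar_1d([9, 7, 3, 5]))  # [6.0, 2.0, 1.0, -1.0]
```

Because most of an image's energy lands in a few coarse coefficients, a transformed image can be compared and summarized far more compactly than raw pixels, which is what makes it useful for similarity search.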
Follow the steps below to install the PostgreSQL image search plug-in (imgsmlr):
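```bash
# Install the GD graphics library headers (as root)
yum install -y gd-devel

# Build and install imgsmlr for PostgreSQL 9.5 (as a regular user);
# adjust PGHOME to point at your own PostgreSQL installation
git clone https://github.com/postgrespro/imgsmlr
cd imgsmlr
export PGHOME=/home/digoal/pgsql9.5
export PATH=$PGHOME/bin:$PATH:.
make USE_PGXS=1
make USE_PGXS=1 install
```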
```
$ psql
psql (9.5.3)
Type "help" for help.

postgres=# create extension imgsmlr;
CREATE EXTENSION
```
The extension adds two data types:

| Data Type | Storage Length | Description |
|---|---|---|
| `pattern` | 16388 bytes | Result of the Haar wavelet transform on the image |
| `signature` | 64 bytes | Short representation of a pattern for fast search using GiST indexes |
It also adds two operators:

| Operator | Left Type | Right Type | Return Type | Description |
|---|---|---|---|---|
| `<->` | `pattern` | `pattern` | `float8` | Euclidean distance between two patterns |
| `<->` | `signature` | `signature` | `float8` | Euclidean distance between two signatures |
The extension also adds the following functions:

| Function | Return Type | Description |
|---|---|---|
| `jpeg2pattern(bytea)` | `pattern` | Convert a JPEG image to a pattern |
| `png2pattern(bytea)` | `pattern` | Convert a PNG image to a pattern |
| `gif2pattern(bytea)` | `pattern` | Convert a GIF image to a pattern |
| `pattern2signature(pattern)` | `signature` | Create a signature from a pattern |
| `shuffle_pattern(pattern)` | `pattern` | Shuffle a pattern to make it less sensitive to image shifts |
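The `<->` operator returns a Euclidean distance, so smaller values mean more similar images and 0 means identical. As a rough illustration, assuming a pattern or signature can be viewed as a flat list of floats (an assumption about its internal layout), the distance is computed as below. The function name is hypothetical; this is a sketch of the math, not the extension's C code.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length vectors of floats."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Square root of the sum of squared component differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
```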
After installing the plug-in, run the following steps to test it:
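```sql
-- Assumes a table image(id, data) where data holds JPEG bytes (bytea).
-- Store a shuffled pattern (less sensitive to shifts) and a signature
-- derived from the original pattern for each image.
CREATE TABLE pat AS (
    SELECT
        id,
        shuffle_pattern(pattern) AS pattern,
        pattern2signature(pattern) AS signature
    FROM (
        SELECT id, jpeg2pattern(data) AS pattern
        FROM image
    ) x
);

-- Add a primary key and a GiST index on the signature for fast search.
ALTER TABLE pat ADD PRIMARY KEY (id);
CREATE INDEX pat_signature_idx ON pat USING gist (signature);

-- Find the 10 images most similar to image :id (a psql variable, e.g. \set id 1):
-- pre-select 100 candidates by signature distance (served by the GiST index),
-- then re-rank them by the more precise pattern distance.
SELECT id, smlr
FROM (
    SELECT
        id,
        pattern <-> (SELECT pattern FROM pat WHERE id = :id) AS smlr
    FROM pat
    WHERE id <> :id
    ORDER BY signature <-> (SELECT signature FROM pat WHERE id = :id)
    LIMIT 100
) x
ORDER BY x.smlr ASC
LIMIT 10;
```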
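The similarity query above works in two stages: a cheap ranking by the short signature narrows the table down to 100 candidates, and only those are re-ranked by the full pattern. The following sketch reproduces that logic in memory so it can be tried without a database. The dictionary standing in for the `pat` table and the function `two_stage_search` are hypothetical, and patterns and signatures are assumed to be plain lists of floats.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length float vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def two_stage_search(pat, query_id, prefilter=100, limit=10):
    """Return up to `limit` (id, distance) pairs most similar to `query_id`.

    `pat` is a hypothetical in-memory stand-in for the pat table: it maps
    id -> (pattern, signature), where both are lists of floats.
    """
    query_pattern, query_signature = pat[query_id]
    candidates = [i for i in pat if i != query_id]  # WHERE id <> :id
    # Stage 1: cheap ranking by the short signature (what the GiST index
    # accelerates in PostgreSQL), keeping only the first `prefilter` ids.
    candidates.sort(key=lambda i: euclidean(pat[i][1], query_signature))
    shortlist = candidates[:prefilter]
    # Stage 2: precise re-ranking of the shortlist by the full pattern.
    scored = [(i, euclidean(pat[i][0], query_pattern)) for i in shortlist]
    scored.sort(key=lambda item: item[1])
    return scored[:limit]
```

The pre-filter is a speed/accuracy trade-off. With toy data where image 2 has the closest pattern to image 1 but a distant signature, `prefilter=2` drops image 2 before the re-ranking step ever sees it, while a larger pre-filter keeps it. The `LIMIT 100` in the SQL plays the same role.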
For the most part, our search engine works as expected.
However, the image search sometimes returns poor matches.
This is because a computer "sees" images differently from humans. It processes an image as a 2D matrix of pixel values and transforms it into a signature that it can compare numerically.
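To see why, here is a deliberately naive sketch that compares two images pixel by pixel. It is not how imgsmlr compares patterns; the helper names are hypothetical. It only shows that a one-pixel shift, which a human would barely notice, can make an image look farther from its original than a completely blank image does. Reducing this kind of sensitivity is the stated purpose of `shuffle_pattern`.

```python
import math


def shift_right(matrix, fill=0):
    """Shift every row one pixel to the right, padding the left edge with `fill`."""
    return [[fill] + row[:-1] for row in matrix]


def pixel_distance(a, b):
    """Naive Euclidean distance over raw pixel values of two same-size images."""
    return math.sqrt(
        sum((x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))
    )


# A 4x4 grayscale "image" with a bright vertical bar in column 1.
image = [[0, 255, 0, 0] for _ in range(4)]
shifted = shift_right(image)         # the same bar, one pixel to the right
blank = [[0] * 4 for _ in range(4)]  # an all-black image

print(pixel_distance(image, blank))    # 510.0
print(pixel_distance(image, shifted))  # ~721.2, farther than the blank image
```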
For video de-duplication, extract key frames from each video and join the frame table with itself (a Cartesian product, or self-join) so that every pair of frames can be compared. Only compare frames that belong to different videos. When two frames are similar enough, that is, when their distance falls below a chosen threshold, the service treats the two videos as duplicates.
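The pairwise comparison described above can be sketched in Python as follows. The frame tuples, the `duplicate_movies` function, and the threshold value are hypothetical; signatures are assumed to be lists of floats compared by Euclidean distance, as with the `<->` operator. Using `itertools.combinations` visits each unordered pair of frames once, which avoids comparing A with B and then B with A.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length float vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def duplicate_movies(frames, max_distance):
    """Return the set of (movie, movie) pairs that share a near-identical key frame.

    `frames` is a list of (movie_id, frame_id, signature) tuples. Every pair of
    frames from *different* movies is compared; a pair whose distance is at most
    `max_distance` marks the two movies as duplicates.
    """
    duplicates = set()
    for (m1, _, s1), (m2, _, s2) in combinations(frames, 2):
        if m1 != m2 and euclidean(s1, s2) <= max_distance:
            duplicates.add((min(m1, m2), max(m1, m2)))  # normalized pair order
    return duplicates
```

Frames from the same movie are skipped even when they are nearly identical, since a video naturally repeats similar shots.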
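```sql
-- Assumes image(id, movie_id, data) holds the JPEG key frames of each video.
-- Same as the earlier table, but each frame keeps the movie_id it came from.
CREATE TABLE pat AS (
    SELECT
        id,
        movie_id,
        shuffle_pattern(pattern) AS pattern,
        pattern2signature(pattern) AS signature
    FROM (
        SELECT id, movie_id, jpeg2pattern(data) AS pattern
        FROM image
    ) x
);
```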
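```sql
-- List frame pairs from different videos, most similar (smallest distance) first.
-- t1.movie_id < t2.movie_id reports each pair of videos once instead of twice.
SELECT t1.movie_id AS movie1, t1.id AS frame1,
       t2.movie_id AS movie2, t2.id AS frame2,
       t1.signature <-> t2.signature AS dist
FROM pat t1
JOIN pat t2 ON (t1.movie_id < t2.movie_id)
ORDER BY dist ASC;

-- Or keep only pairs under a threshold. <-> returns a distance, so smaller
-- means more similar; 0.9 is a value to tune for your data.
SELECT t1.movie_id AS movie1, t1.id AS frame1,
       t2.movie_id AS movie2, t2.id AS frame2,
       t1.signature <-> t2.signature AS dist
FROM pat t1
JOIN pat t2 ON (t1.movie_id < t2.movie_id)
WHERE t1.signature <-> t2.signature < 0.9
ORDER BY dist ASC;
```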
This article showed how to implement image de-duplication in PostgreSQL with the imgsmlr extension, which builds on PostgreSQL's extension API. PostgreSQL is a powerful database that lets you add custom types, operators, and functions, and it de-duplicates images effectively while remaining safe and reliable. Video de-duplication is an additional use case built on the same technique. The Haar wavelet transform is what makes this kind of similarity search over images possible, and installing and using the plug-in takes only a few steps.
More Posts by Alibaba Clouder