How to compress an image by a factor of 10?

background

When a user takes a photo of a piece of clothing on a mobile phone, there is an obvious need to find similar pictures. The image search project aims to meet this need.

The project initially covers 5 categories, with 5 million images per category available for retrieval. After feature extraction, each picture is a point in a 30976-dimensional space, i.e. it is represented by 30976 float values. To simplify processing, we multiply each feature value by 100000 and, without losing the precision of the float value, store image features as int32_t.

Image retrieval requires calculating the Euclidean distance between the query image and each index image. One such calculation takes 50 microseconds; after CPU instruction set optimization (SSE), it is reduced to 13 microseconds.

Before compression, the full image set occupies 3T (4 bytes * 30976 dimensions * 25,000,000 images). Each machine has 30G of memory for storing pictures, so 100 machines are required to hold the full set.

We need to compress the images without reducing the efficiency of the Euclidean distance calculation, so as to cut the number of machines substantially.
Our goal is to compress the data to about 500G, i.e. about 20k per picture, with the Euclidean distance calculation taking about 15 microseconds.

Since the Euclidean distance must be computed at the microsecond level, existing compression methods such as p4delta and varint do not meet the performance requirements. We need to customize the compression method to the characteristics of the data.

achievement
The current compression method compresses a single picture from 120k to an average of 13k.
The Euclidean distance calculation takes an average of 9 microseconds.
How was such a result achieved?

initial attempt

The bitmap method
Observing the data, we found that each picture has on average about 7000 non-zero values; all other values are 0. The first idea is to use a bitmap to indicate whether each of the 30976 positions is 0. The bitmap requires 30976 / 8 = 4k bytes; then only the non-zero values are stored, requiring 7k * 4 bytes, so the average compressed size is about 32k. The compression code is roughly as follows:
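The original snippet did not survive extraction; below is a minimal C++ sketch of the bitmap scheme just described, with illustrative names (BitmapCompress, kDim) that are our assumptions:

```cpp
#include <cstdint>
#include <vector>

constexpr int kDim = 30976;  // feature dimension from the text

// Sketch: a 30976-bit bitmap marks which positions are non-zero, followed
// by the non-zero int32_t values in order.
std::vector<uint8_t> BitmapCompress(const int32_t* feat /* kDim values */) {
    std::vector<uint8_t> out(kDim / 8, 0);  // 3872-byte (~4k) bitmap
    std::vector<uint8_t> values;
    for (int i = 0; i < kDim; ++i) {
        if (feat[i] != 0) {
            out[i / 8] |= static_cast<uint8_t>(1u << (i % 8));  // mark position
            const uint8_t* p = reinterpret_cast<const uint8_t*>(&feat[i]);
            values.insert(values.end(), p, p + sizeof(int32_t));  // append value
        }
    }
    out.insert(out.end(), values.begin(), values.end());
    return out;  // ~4k bitmap + 4 bytes per non-zero value, ~32k on average
}
```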

However, when calculating the Euclidean distance between two compressed pictures, the bitmap must be traversed 30976 times to test whether each bit is 0, i.e. an AND operation between the bitmap and a mask array at every position. This is expensive: the calculation takes more than 100 microseconds. The code to compute the Euclidean distance of two compressed images:
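Again the original code is not shown; this sketch reconstructs the loop the text describes, testing each of the 30976 bits of both bitmaps with a mask (names are illustrative):

```cpp
#include <cmath>
#include <cstdint>

constexpr int kDim = 30976;

// Sketch: walk all 30976 positions; an AND with the mask decides whether the
// next stored value belongs to this position or the position holds a 0.
double BitmapDistance(const uint8_t* a, const uint8_t* b) {
    const int32_t* va = reinterpret_cast<const int32_t*>(a + kDim / 8);
    const int32_t* vb = reinterpret_cast<const int32_t*>(b + kDim / 8);
    int64_t sum = 0;
    for (int i = 0; i < kDim; ++i) {
        const uint8_t mask = static_cast<uint8_t>(1u << (i % 8));
        const int32_t x = (a[i / 8] & mask) ? *va++ : 0;
        const int32_t y = (b[i / 8] & mask) ? *vb++ : 0;
        const int64_t d = x - y;
        sum += d * d;
    }
    return std::sqrt(static_cast<double>(sum));  // 30976 bit tests: >100 us
}
```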

The off_set compression method
We save only the off_set and value of each non-zero entry. The maximum off_set is 30975, which requires an int16_t; values range from 0 to 1 million, which requires an int32_t. With this method each picture occupies 42k (7k * (2 + 4) bytes).

When calculating the Euclidean distance of two compressed images, we merge the two arrays by off_set:
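A sketch of this merge, assuming each compressed image is a sorted off_set array plus a parallel value array (the signature and names are ours, not the post's):

```cpp
#include <cmath>
#include <cstdint>

// Sketch: advance through both sorted off_set arrays like a merge step;
// positions present in only one image contribute that value squared.
double MergeDistance(int n1, const int16_t* off1, const int32_t* val1,
                     int n2, const int16_t* off2, const int32_t* val2) {
    int64_t sum = 0;
    int i = 0, j = 0;
    while (i < n1 && j < n2) {
        if (off1[i] == off2[j]) {        // both non-zero at this position
            const int64_t d = val1[i] - val2[j];
            sum += d * d; ++i; ++j;
        } else if (off1[i] < off2[j]) {  // only image 1 is non-zero here
            sum += static_cast<int64_t>(val1[i]) * val1[i]; ++i;
        } else {                         // only image 2 is non-zero here
            sum += static_cast<int64_t>(val2[j]) * val2[j]; ++j;
        }
    }
    for (; i < n1; ++i) sum += static_cast<int64_t>(val1[i]) * val1[i];
    for (; j < n2; ++j) sum += static_cast<int64_t>(val2[j]) * val2[j];
    return std::sqrt(static_cast<double>(sum));
}
```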


This method takes 55 microseconds per Euclidean distance. Suspecting that the if branches were the bottleneck, we tried replacing the if with a bit mask:
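Our reconstruction of the branchless idea: the two comparisons become 0/-1 masks that select values with AND instead of an if. The post's actual code is not shown, so take this as one plausible form:

```cpp
#include <cmath>
#include <cstdint>

// Sketch: le/ge are 0 or 0xFFFFFFFF depending on the off_set comparison;
// they select which value participates and which pointer advances.
double MaskedMergeDistance(int n1, const int16_t* off1, const int32_t* val1,
                           int n2, const int16_t* off2, const int32_t* val2) {
    int64_t sum = 0;
    int i = 0, j = 0;
    while (i < n1 && j < n2) {
        const int32_t le = -static_cast<int32_t>(off1[i] <= off2[j]);
        const int32_t ge = -static_cast<int32_t>(off1[i] >= off2[j]);
        const int64_t d = (val1[i] & le) - (val2[j] & ge);  // 0 for lagging side
        sum += d * d;
        i += le & 1;  // advance whichever side(s) matched
        j += ge & 1;
    }
    // tails handled as in the if-based version above
    return std::sqrt(static_cast<double>(sum));
}
```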

This eliminates the if branches, but the calculation takes 70 microseconds; performance drops instead, because the number of CPU instructions increases.

performance optimization
scene analysis

From two compressed images to one compressed and one uncompressed
The optimization had hit a bottleneck: how do we reach the 10-microsecond level? Let's step back and reconsider the application scenario. Online, we calculate the distance between one query image and a series of index images; offline, we calculate the distance between one cluster-center image and a series of other images. Both cases are one image against many. That one image does not need to be compressed at all, and even if it is stored compressed, it can be decompressed once up front. The problem therefore reduces to computing the Euclidean distance between one uncompressed image and many compressed images, and we can reconsider efficiency for exactly this case.

Confirming the off_set compression method
For the distance between one compressed and one uncompressed image, compare the bitmap method with the off_set method: the off_set method has a clear advantage. With the bitmap method, the most expensive part is still accessing the bitmap 30976 times and ANDing with a mask to recover each value, which alone takes at least 50 microseconds. With the off_set method, the roughly 7000 non-zero entries carry their own off_sets, so we only need to traverse that array 7000 times to compute the Euclidean distance.

one compressed and one uncompressed

practice
First compute the cumulative sum of squares of the uncompressed image. Each query computes many Euclidean distances, but this cumulative sum is computed only once.

Traverse the compressed image's array and, for each entry, accumulate the squared difference between its value and the uncompressed image's value at the corresponding off_set.

For the positions where the compressed image is 0, the Euclidean distance only needs the sum of squares of the uncompressed image's values there, which can be read directly from the cumulative sums.

Example:
a. Uncompressed picture: [0 2 3 0 4 0 0 5 6 0 0], compressed picture: [0 0 0 6 6 6 0 0 0 0 0]
b. Compute in advance the cumulative sum of squares of the uncompressed picture up to each position: sqrt[0 4 13 13 29 29 29 54 90 90 90] (despite the name, sqrt here holds prefix sums of squares, not square roots)
c. The compressed arrays are off[3 4 5] and data[6 6 6]
d. Traverse the off and data arrays:

olength += (6 - 0) * (6 - 0);  olength += sqrt[2] - sqrt[0]
olength += (6 - 4) * (6 - 4);  olength += sqrt[3] - sqrt[3]
olength += (6 - 0) * (6 - 0);  olength += sqrt[4] - sqrt[4]
olength += sqrt[10] - sqrt[5]   (the trailing zeros of the compressed picture)
Efficiency: 20 microseconds
This method traverses the array only 7000 times, performing 7000 subtract-and-square operations and 7000 accesses to the sqrt array, which greatly reduces the complexity.

The code is roughly as follows:
data1 is the compressed image, data2 the uncompressed one:
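A sketch of this loop, assuming the off_set layout above; prefix[] plays the role of the example's sqrt array, i.e. prefix[i] is the sum of squares of data2[0..i], and the final tail addition matches the trailing-zeros step in the example (names are illustrative):

```cpp
#include <cmath>
#include <cstdint>

constexpr int kDim = 30976;

// Sketch: for each non-zero entry of the compressed image, add the squared
// difference; for the zero gaps in between, add data2's squares via prefix[].
double Distance(int n1, const int16_t* off1, const int32_t* val1,  // compressed
                const int32_t* data2,                              // uncompressed
                const int64_t* prefix /* kDim prefix sums of squares */) {
    int64_t sum = 0;
    int prev = -1;  // last position already accounted for
    for (int k = 0; k < n1; ++k) {
        const int pos = off1[k];
        if (pos - 1 > prev)  // zero gap in the compressed image
            sum += prefix[pos - 1] - (prev >= 0 ? prefix[prev] : 0);
        const int64_t d = val1[k] - data2[pos];
        sum += d * d;
        prev = pos;
    }
    if (prev < kDim - 1)  // trailing zeros of the compressed image
        sum += prefix[kDim - 1] - (prev >= 0 ? prefix[prev] : 0);
    return std::sqrt(static_cast<double>(sum));
}
```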

square difference expansion

Is there a more optimized way?
The formula for the (squared) Euclidean distance is (a[1]-b[1])*(a[1]-b[1]) + (a[2]-b[2])*(a[2]-b[2]) + ... + (a[n]-b[n])*(a[n]-b[n]).

After expansion this becomes a[1]*a[1] + a[2]*a[2] + ... + a[n]*a[n] + b[1]*b[1] + b[2]*b[2] + ... + b[n]*b[n] - 2*(a[1]*b[1] + a[2]*b[2] + ... + a[n]*b[n]).

The terms a[1]*a[1] + ... + a[n]*a[n] and b[1]*b[1] + ... + b[n]*b[n] can be computed and saved at compression time, since compression is allowed to take more time. To compute the Euclidean distance we then only need a[1]*b[1] + a[2]*b[2] + ... + a[n]*b[n].

Take the example above again:
a. Uncompressed picture: [0 2 3 0 4 0 0 5 6 0 0], compressed picture: [0 0 0 6 6 6 0 0 0 0 0]
b. Compute the sums of squares of both images: sqrt1 = 90 (uncompressed), sqrt2 = 108 (compressed).
c. The query image's sum of squares is computed once per query; the compressed image's sum of squares is computed and stored at compression time.
d. Traverse the compressed array

multi += 6 * 0
multi += 6 * 4
multi += 6 * 0
e. Squared Euclidean distance = 90 + 108 - 2 * multi = 90 + 108 - 48 = 150
Efficiency: 11 microseconds
This method performs only 7000 multiplications, nearly halving the time compared with the previous method.
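A sketch of the expanded form, with sq1 stored alongside the compressed image at compression time and sq2 computed once per query (illustrative names):

```cpp
#include <cmath>
#include <cstdint>

// Sketch: dist^2 = sum(a^2) + sum(b^2) - 2 * sum(a*b); only the ~7000
// products of the compressed image's non-zero values remain in the hot loop.
double ExpandedDistance(int n1, const int16_t* off1, const int32_t* val1,
                        int64_t sq1,           // sum of squares, compressed image
                        const int32_t* data2,  // uncompressed query image
                        int64_t sq2) {         // sum of squares, query image
    int64_t multi = 0;
    for (int k = 0; k < n1; ++k)
        multi += static_cast<int64_t>(val1[k]) * data2[off1[k]];
    return std::sqrt(static_cast<double>(sq1 + sq2 - 2 * multi));
}
```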

Compression Ratio Optimization
Basic off_set compression

method
The basic off_set compression has been mentioned earlier.

Effect
Each picture is compressed to 42k.

example
Picture: [5 5 8 0 0 ... 0 7 6 3 0 0 0 0] (500 zeros between 8 and 7). In the compressed layout, 208 is the sum of squares and 6 is the number of non-zero elements.

off_set optimization

method
The basic method stores each non-zero off_set as an int16_t, i.e. 2 bytes per non-zero value. How can we reduce the space the off_sets occupy?

Naturally, we think of storing the difference between each entry's off_set and the previous non-zero entry's off_set. The differences are much smaller, generally below 255, so a uint8_t suffices.

But what if the difference of off_set is greater than 255?

We store some zeros as if they were non-zero values. If the off_set gap between two non-zero values exceeds 255, we insert a placeholder entry with value 0 at the position 255 past the first non-zero value.

If the gap from that placeholder to the next non-zero value is less than 255, we store the next non-zero value directly; otherwise we insert another placeholder.

According to statistics, each picture has on average only 2 gaps greater than 255, so this has essentially no impact on performance.
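A sketch of the delta encoding with placeholder zeros (the Entry layout and the names are assumptions based on the description; a placeholder's value 0 contributes nothing to the distance, so the distance code needs no special case):

```cpp
#include <cstdint>
#include <vector>

struct Entry { uint8_t delta; int32_t value; };  // off_set difference + value

// Sketch: off_sets become uint8_t differences from the previous stored entry;
// whenever a gap exceeds 255, emit a placeholder entry with value 0.
std::vector<Entry> DeltaEncode(const int32_t* feat, int dim) {
    std::vector<Entry> out;
    int prev = 0;  // position of the previously stored entry
    for (int i = 0; i < dim; ++i) {
        if (feat[i] == 0) continue;
        while (i - prev > 255) {      // gap too wide: insert placeholder zero
            out.push_back({255, 0});
            prev += 255;
        }
        out.push_back({static_cast<uint8_t>(i - prev), feat[i]});
        prev = i;
    }
    return out;  // ~7000 real entries + ~2 placeholders per picture on average
}
```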

Effect
With this method the off_set part occupies one byte per entry, and the compressed image size is 7k * (1 + 4) = 35k.

Example:
Image : [5 5 8 0 0 ... 0 7 6 3 0] (500 0s between 8 and 7).

As shown in the figure below, the number of stored non-zero elements becomes 7: since the off_set gap between 7 and 8 exceeds 255, an extra 0 is inserted as a non-zero value so that every stored difference fits in a byte. The extra 0 has no effect when computing Euclidean distances.

value optimization

method
Values were stored as int32_t. Statistics show each picture has on average only 6 values greater than 65535, so we store values as uint16_t; values exceeding 65535 are placed at the end of the compressed buffer and stored as int32_t.

Effect
After compression, the size of each picture is 7k*(2+1) = 21k.

example
Image: [5 5 8 0 0 0 0 7 67676 66666 0 0 0].

As shown in the figure below, 9024396695 is the sum of squares; 4 is the number of uint16_t values and 2 the number of values exceeding uint16_t. The leading non-zero values are stored as uint16_t, two bytes each, while 67676 and 66666 exceed 65535 and are stored as int32_t, four bytes each. Since on average only 6 values per picture exceed 65535, the space saving is substantial.
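The post does not spell out how oversized values are flagged during traversal; the sketch below assumes a 0xFFFF sentinel in the uint16_t stream that redirects the reader to the int32_t tail, which is purely an illustrative choice:

```cpp
#include <cstdint>
#include <vector>

// Sketch of the split layout. `small` holds values that fit in 16 bits;
// the ~6 oversized values per picture go to `large` at the end of the buffer.
struct Compressed {
    int64_t sum_sq;                // sum of squares, stored at compression time
    std::vector<uint8_t>  deltas;  // off_set differences
    std::vector<uint16_t> small;   // 16-bit values; 0xFFFF acts as a sentinel
    std::vector<int32_t>  large;   // values that do not fit in 16 bits
};

void AppendValue(Compressed& c, int32_t v) {
    if (v < 0xFFFF) {               // the sentinel steals one code point
        c.small.push_back(static_cast<uint16_t>(v));
    } else {
        c.small.push_back(0xFFFF);  // real value lives in the int32_t tail
        c.large.push_back(v);
    }
}
```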

Remove duplicate values

Is there any possibility of further compression?

method
Analyzing the data, we found that many adjacent non-zero values in a picture are identical. On average, each picture has 2400 non-zero values that differ from their neighbors and 4300 values that are part of a run of identical neighbors, of which roughly 700 lie in runs of two and 3600 in runs of three.

Example: 0 1 1 1 0 0 3 3 3 2 2 0 0.

If the runs 1 1 1, 3 3 3, and 2 2 above are collapsed so that only the value and its repeat count are stored, the space can be compressed further. Where do we store the count? In the off_set array: each off_set currently uses 8 bits, storing differences up to 255. We split the byte: 6 bits store the off_set difference, with a maximum of 63, and the remaining 2 bits store the repeat count, which can be 1, 2, or 3 (0 is unused).

According to statistics, each query image has on average 250 off_set differences greater than 63, i.e. a bit over 200 extra placeholder zeros must be stored, which does not add much to the space usage.

Effect
This method compresses each picture to 13k, approximately 2.4k * 3 + (1.2 + 0.3)k * 3 bytes, where 2.4k is the number of non-repeated values, 1.2k the number of runs of three identical values, 0.3k the number of runs of two, and 3 bytes is one off_set byte plus a two-byte value.

Off_set differences greater than 63 and values greater than 65535 have little impact on space.

However, the Euclidean distance calculation now takes 35 microseconds; the main cause of the slowdown is splitting each uint8_t into 6-bit and 2-bit fields, which must be unpacked at every step.

example
Image : [0 1 1 1 0 0 3 3 3 2 2 0 0 ]

Here 38 is the sum of squares, 3 means there are 3 stored values, and 0 means there are 0 values greater than 65535. (1:3) is a single uint8_t: the first 6 bits store 1, the off_set difference, and the last 2 bits store 3, the repeat count. (5:3) likewise encodes an off_set difference of 5 with 3 identical values, and (3:2) an off_set difference of 3 with 2 identical values. The following 1 3 2 are the non-zero values; the difference from the previous method is that adjacent identical values are stored only once.
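A sketch of the 6+2 bit packing; the exact bit order is an assumption, since the post only says the byte is split into 6 bits of off_set difference and 2 bits of repeat count:

```cpp
#include <cstdint>

// Sketch: high 6 bits = off_set difference (max 63), low 2 bits = repeat
// count (1, 2 or 3; 0 unused). In the example, (1:3) packs as (1 << 2) | 3.
inline uint8_t Pack(uint8_t delta, uint8_t run) {  // delta <= 63, run <= 3
    return static_cast<uint8_t>((delta << 2) | run);
}
inline uint8_t UnpackDelta(uint8_t b) { return b >> 2; }   // off_set difference
inline uint8_t UnpackRun(uint8_t b)   { return b & 0x3; }  // repeat count
```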

Remove Duplicate Value Optimization

The above method achieves a high compression ratio but reduces efficiency. Can the efficiency be recovered?

method

In the method above, efficiency drops because each off_set byte is split into 6-bit and 2-bit fields; the distance calculation must perform AND and shift operations at every step, which adds complexity.

Instead, when storing, we group entries by run length: non-repeated values are stored together, runs of two together, and runs of three together. When the distance calculation reaches the runs-of-three group, each stored value simply contributes its three multiplications, and no repeat count needs to be stored at all.

The meaning of the off_set difference changes: it is no longer the gap between consecutive non-zero values overall, but the gap between consecutive entries within the same group, so the differences grow. Statistics show, per picture, 4 differences greater than 255 among non-repeated values, 25 among runs of two, and 5 among runs of three. Runs of two overflow more often because they are rare (about 300 values), so consecutive runs of two sit far apart and their off_set differences are large; we therefore treat values in runs of two as ordinary non-repeated values.

Effect
This method compresses each picture to about 14k: (2.4k + 0.7k) * 3 + 1.3k * 3 bytes, where 2.4k is the number of non-repeated values, 0.7k the number of values from runs of two (now stored individually), 1.3k the number of runs of three, and 3 bytes is one off_set byte plus a two-byte value. The other overheads have a negligible impact on total space.

Performance: 9 microseconds, 2 microseconds less than the earlier 11, because a run of three is visited once instead of three times: the array is traversed not 7000 times but about 3000 + 1300 = 4300 times.

example
Image: [0 1 1 1 0 0 3 3 3 2 2 0 0 ].

Here 38 is the sum of squares; the first 2 means there are 2 non-repeated values, the following 2 means there are 2 runs of three, and 0 means there are 0 values greater than 65535. 9 and 1 are the off_set differences of the non-repeated values, and 2 and 2 are those values; 1 and 5 are the off_set differences of the runs of three, and 1 and 3 are the values of those runs.
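A sketch of the grouped layout's hot loop under the dot-product formulation; the field names and the handling shown are assumptions, and oversized values are omitted for brevity:

```cpp
#include <cmath>
#include <cstdint>

// Sketch: non-repeated values first, then runs of three (runs of two were
// folded into the non-repeated group). A run of three is stored once but
// multiplies three consecutive query positions.
double GroupedDistance(int n_single, const uint8_t* d1, const uint16_t* v1,
                       int n_triple, const uint8_t* d3, const uint16_t* v3,
                       int64_t sq1, const int32_t* data2, int64_t sq2) {
    int64_t multi = 0;
    int pos = 0;
    for (int k = 0; k < n_single; ++k) {  // ~3000 single values
        pos += d1[k];
        multi += static_cast<int64_t>(v1[k]) * data2[pos];
    }
    pos = 0;
    for (int k = 0; k < n_triple; ++k) {  // ~1300 runs of three
        pos += d3[k];
        multi += static_cast<int64_t>(v3[k]) *
                 (data2[pos] + data2[pos + 1] + data2[pos + 2]);
    }
    return std::sqrt(static_cast<double>(sq1 + sq2 - 2 * multi));
}
```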
