Machine-learning algorithms can be used to identify key areas of an input image, e.g. “clothes”, “skin”, and “background”. With that information it’s possible to process each area independently, producing images that give better visual results under compression and encoding.
For fashion product images, this can be used, for example, to identify cloth and skin texture, so that we can dedicate more bits to those areas than to the background. The goal? Smaller images with better quality.
Figure 1: Identifying key areas to improve compression output. Input image with three identified key areas: clothes and apparel (red), skin (blue), and background (black).
Broadly, computers represent images in two ways: either as vectors or as bitmaps. Clothing product images are generally of the latter type, so in this exposition we focus on bitmaps only. To the computer, a bitmap image is a two-dimensional array of pixels. Image compression is the process of reproducing a very similar copy of a master image in fewer bits. It works by doing one or both of two things. The first is exploiting redundancy in the input image: pixels in an image tend to have similar colors and values, which lets the image be encoded using fewer bits.
The second thing compression can do is exploit peculiarities of the human visual system. It turns out humans do not see in pixels (unless we zoom into an image a lot, which image-processing specialists do often), but in colors, light intensities, and features. Image compression can thus produce an image which is not entirely faithful to the original, but whose defects the human eye and brain barely notice. Table 1 shows how different image compression formats exploit both things. Note that all formats exploit redundancy, but lossless image formats do not try to trick the brain by “slipping in” artifacts that it won’t notice.
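The effect of redundancy is easy to demonstrate with a general-purpose compressor. The sketch below is not a real image codec; it uses Python’s zlib as a stand-in, compressing two synthetic “images” of identical pixel count: one flat (highly redundant) and one of pure random noise (almost no redundancy):

```python
import zlib

import numpy as np

rng = np.random.default_rng(0)

# A highly redundant "image": every pixel has the same gray value.
smooth = np.full((256, 256), 128, dtype=np.uint8)
# An "image" with almost no redundancy: independent random pixels.
noisy = rng.integers(0, 256, size=(256, 256), dtype=np.uint8)

smooth_size = len(zlib.compress(smooth.tobytes(), level=9))
noisy_size = len(zlib.compress(noisy.tobytes(), level=9))

# Same number of pixels, wildly different compressed sizes.
print(f"smooth: {smooth_size} bytes, noisy: {noisy_size} bytes")
```

The flat image shrinks to a tiny fraction of its raw size, while the noise barely compresses at all; real codecs like PNG and WebP exploit the same redundancy, just with smarter, image-specific predictors.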
Table 1: Image formats and what they are based on.
|Redundancy in input image|Peculiarities of the human visual system: No|Peculiarities of the human visual system: Yes|
|---|---|---|
|No|???|???|
|Yes|Lossless image formats: PNG, Lossless WebP, FLIF...|JPEG, JPEG 2000, Lossy WebP, AVIF...|
A third factor that influences image compression is processing time and complexity. Simply put, a compression/decompression operation must run quickly, often on the time scales relevant for in-browser rendering or on-the-fly encoding when serving a web page. Rather than aiming for the best possible result in every case, algorithms must often make “practical choices” that suit the broadest variety of use cases.
Tuning the algorithms to compress clothing pictures better
What if, instead of relying on the broad assumptions that algorithms like WebP compression make, we could rely on the specifics of, say, clothing pictures?
To understand the potential of this technique, we show below a concept experiment with Gimp, a free program. You can download Gimp and repeat these experiments yourself with any image of your choosing. But first, a warning: this is a proof of concept, nowhere near the output quality and requirements a production system would be subject to. Consider it a fun way to see the concepts at play with readily available tools. After the experiment, we explain some of the adjustments a real-world, production-level implementation would need.
The idea is the following: what happens when Gaussian blur is applied to an image before it is compressed? Figure 2 shows an image with two different backgrounds, and their sizes under lossless compression. The first background is a rug; the second is similar to the backgrounds art directors approve for use in their websites.
Figure 2: Image background and impact in compression cost. Panel (a) represents an image with a non-trivial background. Panel (b) shows the same image with Gaussian blur applied. Panel (c) is a composition of (a) and (b) where only the background has been blurred. Panel (d), (e) and (f) mirror (a), (b) and (c) but with a different background.
Table 2: Size in kilobytes of the images in Figure 2, compressed with both PNG and WebP lossless. Both compression formats are, by definition, able to reproduce pixel by pixel the image they are given as input, so the sizes in this table reflect the power of the compression algorithms.
|Image|Lossless PNG size (kilobytes)|Lossless WebP size (kilobytes)|
|---|---|---|
|Input image: (a) Girl with rug as background, (d) Girl with photographic background|548 (a), 259 (d)|363 (a), 159 (d)|
|Experiment 1: full blur, panels (b) and (e)|276 (b), 143 (e)|189 (b), 97 (e)|
|Experiment 2: blur in background only, panels (c) and (f)|280 (c), 193 (f)|185 (c), 126 (f)|
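The blur-then-compress effect can be reproduced without Gimp. The sketch below is a toy stand-in, not the experiment above: it builds a synthetic “photo” (a flat studio background plus per-pixel sensor noise), uses a crude box blur as a substitute for Gaussian blur, and uses zlib in place of PNG/WebP lossless:

```python
import zlib

import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for a product photo: a flat mid-gray background
# plus per-pixel sensor noise (a real test would load an image file).
h, w = 128, 128
image = np.clip(128 + rng.normal(0, 20, (h, w)), 0, 255).astype(np.uint8)

def box_blur(img, k=5):
    """Crude k-by-k box blur, a simple substitute for Gaussian blur."""
    padded = np.pad(img.astype(np.float64), k // 2, mode="edge")
    acc = np.zeros(img.shape, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            acc += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return (acc / (k * k)).astype(np.uint8)

raw_size = len(zlib.compress(image.tobytes(), level=9))
blurred_size = len(zlib.compress(box_blur(image).tobytes(), level=9))

# Blurring averages out pixel noise, so the blurred copy compresses better.
print(f"raw: {raw_size} bytes, blurred: {blurred_size} bytes")
```

Averaging over a 5×5 window shrinks the noise standard deviation by a factor of five, concentrating the pixel-value histogram and letting the entropy coder use fewer bits per pixel, mirroring the size reductions in Table 2.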
In both cases, Gaussian blur reduces the amount of information in the image. That loss of information is plainly visible in panels (b), (c), and (e) of the figure. Unsurprisingly, all the blurred images compress to smaller sizes under lossless compression, as shown in Table 2. The size gains are significant, and while in the top sequence of panels the background blurring is clearly visible when comparing (a) and (c), there is practically no visual difference between (d) and (f), even though the resulting image is still 23% smaller.
Can one use these results for clothing products? Yes: blurring the background has no visual impact on some images, as the second sequence of images shows. Better still, even for non-trivial backgrounds like the one in the first image sequence, we are convinced that smart trade-offs are possible.
Another way to look at these results is that blurring reduces the amount of “surprise” (information entropy) that compression algorithms find in the image. Since conventional compression algorithms are tuned for general images, they treat things like clothing texture and photographic pixel noise equally. However, one can improve the results just by separating the two.
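The entropy intuition can be made concrete with Shannon entropy over the value histogram. The sketch below is illustrative only: it measures synthetic noise (standing in for photographic pixel noise) before and after a 1-D moving average, which plays the role of the Gaussian blur:

```python
import numpy as np

def histogram_entropy(values):
    """Shannon entropy (bits per sample) of an 8-bit value histogram."""
    counts = np.bincount(values.ravel(), minlength=256)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(2)

# Stand-in for photographic pixel noise around a mid-gray background.
noise = np.clip(rng.normal(128, 20, size=100_000), 0, 255).astype(np.uint8)

# A moving average plays the role of the Gaussian blur.
smoothed = np.convolve(noise.astype(np.float64), np.ones(9) / 9,
                       mode="same").astype(np.uint8)

print(f"raw:      {histogram_entropy(noise):.2f} bits/sample")
print(f"smoothed: {histogram_entropy(smoothed):.2f} bits/sample")
```

The smoothed signal needs fewer bits per sample, which is the lower bound any entropy coder is ultimately fighting for; this is exactly the “surprise” that blurring removes.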
Challenges and machine learning
Machine learning helps us solve several practical challenges with the approach summarized above:
- We need robust, automated ways to find the parts of the image which are interesting for an e-commerce operator, i.e. cloth and skin texture, so that we can dedicate more bits to those areas than to the background.
- In the toy example above, we reported results using lossless compression algorithms. In practice, however, clothing product images are compressed with lossy formats, where one needs to control for visual distortion; and, unsurprisingly, humans perceive distortion in clothing and skin textures differently from how they perceive distortion in background noise. Machine learning helps us replicate human perception when grading image quality.
- Using Gaussian blur, as we did in the toy example, is not necessarily optimal. Machine learning helps us find filters that perform optimally under a given compression algorithm.
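Once a segmentation model has produced a foreground mask like the one in Figure 1, compositing the sharp and blurred versions is the easy part. A minimal sketch, assuming the mask and blur function are supplied from elsewhere (the hypothetical `blur_background` helper and the trivial placeholder blur below are ours, not from any particular library):

```python
import numpy as np

def blur_background(image, foreground_mask, blur_fn):
    """Keep masked pixels (clothes, skin) sharp; blur everything else.

    `foreground_mask` is a boolean array of the same shape as `image`,
    such as an ML segmentation model would output; how the mask is
    computed is out of scope here.
    """
    blurred = blur_fn(image)
    return np.where(foreground_mask, image, blurred)

# Toy demonstration with a placeholder "blur" that flattens pixels.
image = np.arange(16, dtype=np.uint8).reshape(4, 4)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True  # pretend the model found the garment here

result = blur_background(image, mask, lambda img: np.full_like(img, int(img.mean())))
```

Under the real pipeline, `blur_fn` would be whatever learned filter performs best for the target codec, and the composited result would then be handed to the lossy encoder.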