By Sebastian Bosse, Gitta Kutyniok, and Rafael Reisenhofer
For most image processing and transmission systems, the ultimate receiver is a human viewer. Reliably assessing the perceived quality of an image or video is thus a crucial step in the development and evaluation of many technical applications. However, collecting opinion ratings from human observers is a cumbersome, time-consuming, and expensive task. A simple way of saving the resources required for such experiments is to apply computational models of image quality that aim to reproduce the human perception of image quality.
Computational models of perceptual image quality can also be used to easily validate, benchmark, and compare image processing methods such as image coding or image restoration algorithms. They also play an important role in quality control. Providers of video streaming services like Netflix or Amazon routinely monitor the quality of the videos delivered to their customers by applying automatic quality assessment methods.
In addition, many image processing systems apply computational models of image quality as metrics to optimize their output. A prominent example is modern video compression, where a reduction of the number of bits used for representing a video also decreases the visual quality of the encoded video. A good encoder has to correctly balance the trade-off between the number of bits spent and the resulting spatial and temporal quality of a video. In this task, a good model of perceptual image quality is imperative for the efficient allocation of bits.
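The bit-allocation trade-off can be illustrated with a small sketch: a rate-distortion optimized encoder picks, for each block, the coding mode that minimizes the Lagrangian cost J = D + λR, where D is the distortion predicted by a quality model and R the bit cost. The candidate modes and their (distortion, rate) pairs below are purely hypothetical numbers chosen for illustration.

```python
# Hypothetical rate-distortion mode decision. The modes and their
# (distortion, rate) pairs are made up for illustration; real encoders
# evaluate many more candidates per block.
candidates = [
    {"mode": "skip",  "distortion": 9.0, "rate": 1},
    {"mode": "intra", "distortion": 2.0, "rate": 40},
    {"mode": "inter", "distortion": 3.0, "rate": 12},
]

def best_mode(candidates, lam):
    """Return the candidate with the smallest Lagrangian cost D + lam * R."""
    return min(candidates, key=lambda c: c["distortion"] + lam * c["rate"])

print(best_mode(candidates, lam=0.01)["mode"])  # small lambda: quality matters most
print(best_mode(candidates, lam=1.0)["mode"])   # large lambda: bits are expensive
```

Varying λ sweeps out the trade-off: a small λ favors the low-distortion mode regardless of its bit cost, while a large λ favors the cheapest mode. The quality of the measure used for D directly determines how well the bits are spent.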
These examples of applications already illustrate the main requirements for a good quality measure: It should (a) provide a high reliability in terms of accurately predicting perceived visual quality and (b) be of sufficiently low computational complexity to be applicable in real-time systems.
It is, however, astonishing how difficult many seemingly easy problems can be for machines. When it comes to assessing the perceptual quality of an image, humans are very fast and surprisingly consistent in their judgment, while the task of computationally predicting this quality judgment is very challenging. Indeed, many simple and popular approaches to image quality assessment, such as the mean squared error (MSE) or the peak signal-to-noise ratio (PSNR), fail at accurately predicting visual quality and similarity.
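For reference, both measures can be computed in a few lines. The sketch below is a generic implementation for 8-bit grayscale images, not taken from any particular library; both measures compare pixels pointwise, which is exactly why two distortions with the same MSE can look very different to a human observer.

```python
import numpy as np

def mse(reference, distorted):
    """Mean squared error between two images of equal shape."""
    return np.mean((reference.astype(float) - distorted.astype(float)) ** 2)

def psnr(reference, distorted, max_value=255.0):
    """Peak signal-to-noise ratio in dB for images with peak value max_value."""
    m = mse(reference, distorted)
    return np.inf if m == 0 else 10.0 * np.log10(max_value ** 2 / m)

# Toy example: a random 8-bit image with uniform noise of amplitude 10.
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(64, 64)).astype(np.uint8)
noisy = np.clip(ref.astype(int) + rng.integers(-10, 11, size=ref.shape), 0, 255)
print(round(psnr(ref, noisy), 2))
```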
Providing perceptually accurate image quality assessments is a highly multidisciplinary task and the design of successful image quality measures relies on insights from different fields such as psychophysics, neuroscience, image processing, computer vision, machine learning, and systems design. The domain knowledge leveraged can either be based on explicit models of the human visual system (HVS) or on basic assumptions about some of its general functional properties. Due to the sheer complexity of the HVS and our still poor understanding of many of its subsystems, modern image quality metrics mostly follow the latter approach for identifying image features that drive image quality.
The task of measuring the quality of an image by comparing it to an optimal reference image is called full reference image quality assessment. A prominent framework applied by many full reference image quality metrics is to first compute the similarities of certain local features extracted from both the reference and the distorted image in question. The local values of the resulting similarity maps can then be pooled spatially and across the model features to provide a single scalar quality score for the tested image. The performance of an image quality predictor based on such a concept of feature similarity strongly depends on the choice of features and the applied pooling strategy. While complex features might yield a higher prediction accuracy, high computational complexity may restrict their applicability.
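This framework can be sketched in a few lines. The gradient-magnitude feature, the ratio-style similarity measure, and the constant c below are illustrative placeholders rather than the definition of any particular published metric; they merely show the three steps of feature extraction, local comparison, and pooling.

```python
import numpy as np

def gradient_magnitude(img):
    """Illustrative local feature: gradient magnitude of the image."""
    gy, gx = np.gradient(img.astype(float))
    return np.sqrt(gx ** 2 + gy ** 2)

def similarity_map(f_ref, f_dist, c=100.0):
    """Classic ratio-style similarity; equals 1 wherever the features agree.
    The constant c stabilizes the ratio where both features are small."""
    return (2 * f_ref * f_dist + c) / (f_ref ** 2 + f_dist ** 2 + c)

def quality_score(reference, distorted):
    """Extract features, compare them locally, and pool into one score."""
    s = similarity_map(gradient_magnitude(reference),
                       gradient_magnitude(distorted))
    return float(np.mean(s))  # simplest pooling strategy: a plain average
```

Identical images yield a score of exactly 1, and any feature disagreement pushes the score below 1; the choice of feature and pooling strategy is where published metrics differ.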
In practice, however, the local features that facilitate highly accurate quality predictions can be surprisingly simple. The features used in the recently introduced Haar Wavelet-Based Perceptual Similarity Index (HaarPSI) are obtained by performing only two stages of a discrete Haar wavelet transform. The Haar wavelet system was already proposed in 1910 by Alfred Haar and is not only the oldest but also the simplest and thereby most computationally efficient wavelet transform.
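A single stage of the 2-D discrete Haar wavelet transform amounts to nothing more than averaging and differencing over 2×2 pixel blocks; applying it a second time to the first approximation band yields a two-stage decomposition. The sketch below is an illustrative implementation of this scheme, not the reference HaarPSI code.

```python
import numpy as np

def haar_level(x):
    """One stage of the 2-D Haar transform via 2x2 averaging/differencing."""
    a = x[0::2, 0::2]; b = x[0::2, 1::2]
    c = x[1::2, 0::2]; d = x[1::2, 1::2]
    ll = (a + b + c + d) / 4.0   # approximation (low-low)
    lh = (a + b - c - d) / 4.0   # horizontal details
    hl = (a - b + c - d) / 4.0   # vertical details
    hh = (a - b - c + d) / 4.0   # diagonal details
    return ll, lh, hl, hh

def two_stage_haar(img):
    """Two decomposition stages; the second acts on the approximation band."""
    ll1, lh1, hl1, hh1 = haar_level(img.astype(float))
    ll2, lh2, hl2, hh2 = haar_level(ll1)
    return {"approximation": ll2,
            "level1": (lh1, hl1, hh1),
            "level2": (lh2, hl2, hh2)}
```

The horizontal and vertical detail bands respond selectively to edges of different orientations and scales, which is precisely the rough orientation and frequency selectivity exploited by the HaarPSI.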
While this approach is highly advantageous from a computational point of view, the HaarPSI is also consistent with elementary functional properties of the human visual system. The filter responses of discrete Haar wavelets are a rough yet efficient model of the frequency and orientation selectivity of the first stages in the visual pathway of the human brain. In a second processing step, the HaarPSI applies a simple logistic function to the values of the resulting local similarity maps, which accounts for saturation effects that can be observed in most neural systems; the same mechanism is prominently utilized as an activation function in artificial neural networks. Visual attention is mostly attracted by objects and object boundaries and less so by textures, which causes (dis)similarities in different regions to have a varying impact on the overall perceptual quality of an image. To cope with this, the HaarPSI pools the local similarities into a single global similarity index by a weighted average. Again, a very simple strategy proves to be highly effective: the weights are obtained directly from the filter responses of the third stage of the discrete Haar wavelet transform.
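The two processing steps just described, a saturating logistic applied to local similarities followed by a weighted average, can be sketched as follows. The similarity formula and the constants c and alpha are simplified placeholders; the published HaarPSI definition specifies the exact subband combinations and parameter values.

```python
import numpy as np

def local_similarity(r_ref, r_dist, c=30.0):
    """Ratio-style similarity of local Haar filter magnitudes.
    Equals 1 where the responses agree; c is a placeholder constant."""
    return (2 * r_ref * r_dist + c) / (r_ref ** 2 + r_dist ** 2 + c)

def pooled_index(sim_map, weight_map, alpha=4.2):
    """Apply a saturating logistic to the local similarities, then pool
    them by a weighted average. In the HaarPSI, the weights are derived
    from third-stage Haar responses, so object boundaries count more
    than textured regions; alpha is an illustrative parameter."""
    sat = 1.0 / (1.0 + np.exp(-alpha * sim_map))
    return float(np.sum(sat * weight_map) / np.sum(weight_map))
```

For identical filter responses, the similarity map is identically 1 and the pooled index reaches its maximum value of 1/(1 + exp(-alpha)); dissimilar regions with large weights pull the index down the most.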
The resulting image quality prediction is highly consistent with human perception and outperforms most other state-of-the-art approaches while at the same time being computationally very inexpensive. The scatter plots in Figure 1 visualize the quality prediction performance of the HaarPSI on the four popular image quality databases LIVE, TID2008, TID2013, and CSIQ.
A comparison of the prediction performances of the HaarPSI and state-of-the-art image quality assessment methods is shown in Table 1. A comparison of computational complexity in terms of execution time of a MATLAB implementation is presented in Table 2.
The HaarPSI of three images distorted by different levels of the JPEG compression algorithm with respect to an undistorted reference image is shown in Figure 2.
More technical and detailed information on the HaarPSI, alongside MATLAB and Python implementations, is available online.
Gitta Kutyniok is a professor of Applied Functional Analysis at TU Berlin. Her research focuses on the areas of applied harmonic analysis, compressed sensing, deep learning, imaging science, inverse problems, and numerical analysis of PDEs. Sebastian Bosse is a research associate in the Machine Learning group at Fraunhofer HHI in Berlin, Germany. His research interests include perceptual multimedia quality, video compression, and machine learning. Rafael Reisenhofer is a Ph.D. student in the Department of Mathematics at the University of Bremen. His research interests include applied harmonic analysis, mathematical image processing, and computer vision.