Deep Learning Improves Image Reconstruction in Single-Pixel Cameras

By Lina Sorg

Single-pixel imaging has garnered much attention in recent years due to its ability to produce images at non-visible wavelengths and in low-light conditions. A single-pixel camera obtains an image using a single-pixel detector together with a digital mirror device. A lens focuses light from the image subject onto the digital mirror device, an array of 1024 × 768 tiny mirrors that can each be switched on or off to display a binary pattern, and the light reflected by that pattern is collected onto the single-pixel detector. Each pattern therefore yields one real measurement: the inner product between the pixel-sampled scene and a binary mask. Filtering the scene with a sequence of masks produces a vector of measurements. However, single-pixel cameras are plagued by an undersampling problem: users are limited in the number of patterns they can apply and hence in the amount of information they can obtain. This limitation results in low reconstruction quality and lengthy data acquisition times, and recovering an image from too few measurements makes single-pixel reconstruction a classic inverse problem.
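The measurement model just described can be sketched in a few lines of NumPy. This is a minimal illustration assuming a small synthetic 32 × 32 scene and random 0/1 masks; both are hypothetical choices, not the patterns or resolution of Higham's camera. Each row of `masks` stands for one mirror pattern, each entry of `measurements` is the single number the detector records for that pattern, and using 256 patterns for 1,024 pixels mimics the undersampled regime.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 32 x 32 scene, flattened to a vector of N pixel intensities.
side = 32
n_pixels = side * side
x = rng.random((side, side)).ravel()

# M binary masks, one per row: 1 = mirror "on" (light sent to the detector), 0 = "off".
# Choosing M < N models the undersampled regime.
n_patterns = 256
masks = rng.integers(0, 2, size=(n_patterns, n_pixels)).astype(float)

# Each mask yields one detector reading, the inner product <mask, x>.
measurements = masks @ x

print(measurements.shape)                              # (256,)
print(f"sampling ratio: {n_patterns / n_pixels:.0%}")  # sampling ratio: 25%
```

Stacking the masks as rows of a matrix turns the whole acquisition into one matrix-vector product, which is the form most reconstruction methods start from.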

During a minisymposium at the 2017 SIAM Annual Meeting, currently taking place in Pittsburgh, Pa., Catherine Higham of the University of Glasgow described how deep learning can address the computational challenges of image reconstruction with a single-pixel camera. Higham spoke fondly of her own single-pixel camera, which operates in the visible range. However, she noted that other single-pixel cameras can sense gases or even see through objects, meaning that many applications fall outside the visible range.

Single-pixel cameras obtain an image via a single-pixel detector with a digital mirror device. Image credit: Catherine Higham, AN17 presentation.

Higham first introduced the Hadamard basis and examined the signal-to-noise ratio (SNR), the level of the desired signal relative to the background noise, as a function of pixel resolution. Unfortunately, under this method the image deteriorates with every fourfold increase in resolution. Another approach involved subsampling the Hadamard basis, keeping only the patterns that carry the largest average signal over a large database of images. These techniques served as points of comparison for deep learning.
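Both Hadamard approaches can be illustrated with a short NumPy sketch. It rests on several assumptions: a Sylvester-ordered Hadamard matrix, 16 × 16 images, and a hypothetical "database" of 200 smooth random images standing in for real training data. With the full basis, the image is recovered exactly because `H.T @ H = n * I`; the subsampled variant keeps the quarter of the patterns with the largest average coefficient magnitude over the database and treats the remaining coefficients as zero. A physical camera cannot display -1 entries directly, so ±1 patterns are typically realized as the difference of two complementary binary masks.

```python
import numpy as np

def sylvester_hadamard(n):
    """Return an n x n Hadamard matrix (n a power of 2) via Sylvester's
    construction H_2k = [[H_k, H_k], [H_k, -H_k]]."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of 2")
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.kron(np.array([[1.0, 1.0], [1.0, -1.0]]), h)
    return h

def smooth_image(rng, side):
    # Hypothetical stand-in for a natural image: a 2-D cumulative sum of noise.
    noise = rng.standard_normal((side, side))
    return np.cumsum(np.cumsum(noise, axis=0), axis=1).ravel()

side = 16
n = side * side
H = sylvester_hadamard(n)         # rows are +/-1 patterns; H @ H.T == n * I
rng = np.random.default_rng(1)

# Full basis: n measurements and an exact inverse.
x = smooth_image(rng, side)
y_full = H @ x
x_full = H.T @ y_full / n

# Subsampling: rank patterns by their average |coefficient| over the "database".
database = np.stack([smooth_image(rng, side) for _ in range(200)])
mean_abs_coeff = np.abs(database @ H.T).mean(axis=0)
k = n // 4
keep = np.argsort(mean_abs_coeff)[::-1][:k]

y_sub = H[keep] @ x               # only k measurements are acquired
x_sub = H[keep].T @ y_sub / n     # unmeasured coefficients are treated as zero

rel_err = np.linalg.norm(x - x_sub) / np.linalg.norm(x)
print(f"full basis, max error: {np.abs(x - x_full).max():.1e}")
print(f"{k} of {n} patterns, relative error: {rel_err:.3f}")
```

Because the Hadamard rows are mutually orthogonal, the subsampled reconstruction is simply the projection of the image onto the kept patterns, which is also why the fast Hadamard transform can invert it cheaply.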

Higham then explained how deep learning applies to single-pixel cameras. Her approach uses a convolutional autoencoder, a network that takes an input and tries to replicate it as output; the goal is to reconstruct an image from the measured signal while simultaneously learning the measurement basis. The first layers mimic the camera, while subsequent layers take the resulting measurements and reconstruct the image from them. “It’s basically a series of maps that map you from one feature to another set of features,” Higham said. “The maps that we have here are fully connected, in that there’s a weight associated with each map and a weight associated with each pixel. These types of maps are convolutional maps that pass over the whole image, share the weight, try to provide context in the reconstruction.”
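The layer structure described above can be illustrated with an untrained forward pass. This NumPy sketch is hypothetical throughout: the sizes, the random weights, and the single 3 × 3 refinement kernel are illustrative choices, not Higham's architecture. Layer 1 is a fully connected binary layer playing the role of the camera, layer 2 is a fully connected map from measurements back to pixels, and layer 3 is a convolution whose one kernel is shared across every pixel position.

```python
import numpy as np

rng = np.random.default_rng(2)
side, n_meas = 32, 256
n_pix = side * side

# Layer 1 ("camera"): fully connected, one binary weight per (pattern, pixel) pair.
encoder = rng.integers(0, 2, size=(n_meas, n_pix)).astype(float)
# Layer 2: fully connected map from the measurements back to a full-size image.
decoder = rng.standard_normal((n_pix, n_meas)) / n_meas
# Layer 3: a single 3 x 3 kernel whose weights are shared across all pixel positions.
kernel = rng.standard_normal((3, 3)) * 0.1

def conv2d_same(img, k):
    """Slide the 3 x 3 kernel k over img (cross-correlation) with zero padding,
    so the output has the same shape as the input."""
    padded = np.pad(img, 1)
    out = np.zeros_like(img)
    for di in range(3):
        for dj in range(3):
            out += k[di, dj] * padded[di:di + img.shape[0], dj:dj + img.shape[1]]
    return out

def forward(image):
    y = encoder @ image.ravel()                           # simulated camera measurements
    coarse = (decoder @ y).reshape(side, side)            # fully connected reconstruction
    return np.maximum(conv2d_same(coarse, kernel), 0.0)  # shared-weight conv + ReLU

reconstruction = forward(rng.random((side, side)))
print(reconstruction.shape)  # (32, 32)
```

The contrast between the layers mirrors the quote: the fully connected layers carry a separate weight for every pattern-pixel pair, while the convolutional layer reuses nine weights everywhere in the image.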

Once the network has been designed to mimic a camera and reconstruct an image, the next task is to optimize the weights of all the mappings. Higham trained the network on a large dataset of images, passing each image forward through the network. After evaluating the cost, she backpropagated the error, adjusted the weights, and repeated the process many times until the weights were optimized. “The novelty is that we actually require these weights in the first coding layer to be binary,” Higham said. Because the first layer plays the role of the camera, training it in this way optimizes the camera’s measurement basis. Modeling additive measurement noise and incorporating batch normalization further improve the technique.
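The training loop can be sketched as follows. This is a minimal NumPy illustration, not Higham's implementation: it uses a two-layer linear network, random stand-in images, an assumed noise level, and a straight-through estimator (a common technique for training binary weights, assumed here rather than confirmed by the talk) so that the first layer stays binary in the forward pass while gradients update an underlying real-valued copy. Batch normalization is omitted for brevity, and the measurements are scaled by `1 / n_pix` to keep the step sizes stable.

```python
import numpy as np

rng = np.random.default_rng(3)
n_pix, n_meas, batch = 64, 16, 32  # hypothetical: 8 x 8 images, 16 patterns
noise_std = 0.01                    # assumed additive measurement noise
lr_dec, lr_enc = 0.2, 0.05          # assumed step sizes

w_enc = rng.standard_normal((n_meas, n_pix))  # real-valued weights behind the binary layer
w_dec = np.zeros((n_pix, n_meas))             # linear decoder, starts at zero

def binarize(w):
    # Mirror patterns are on/off, so the forward pass uses {0, 1} weights.
    return (w > 0).astype(float)

def loss_and_grads(x):
    """Forward pass with noisy binary measurements, then manual backpropagation."""
    b = binarize(w_enc)
    y = x @ b.T / n_pix + noise_std * rng.standard_normal((x.shape[0], n_meas))
    err = y @ w_dec.T - x                  # reconstruction error
    loss = np.sum(err ** 2) / x.shape[0]   # squared error per image
    g_xhat = 2.0 * err / x.shape[0]        # dL/d(x_hat)
    g_dec = g_xhat.T @ y                   # dL/d(w_dec)
    g_y = g_xhat @ w_dec                   # dL/dy
    g_b = g_y.T @ x / n_pix                # dL/db
    # Straight-through estimator: treat binarize() as the identity when
    # backpropagating, so dL/d(w_enc) is approximated by dL/db.
    return loss, g_dec, g_b

losses = []
for _ in range(500):
    x = rng.random((batch, n_pix))         # stand-in training images
    loss, g_dec, g_enc = loss_and_grads(x)
    w_dec -= lr_dec * g_dec
    w_enc -= lr_enc * g_enc
    losses.append(loss)

print(f"loss: {losses[0]:.2f} -> {np.mean(losses[-50:]):.2f}")
```

After training, `binarize(w_enc)` gives the learned 0/1 patterns that would be loaded onto the mirror device; injecting noise during training encourages a decoder that tolerates noisy real measurements.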

After outlining the deep learning approach to image reconstruction, Higham compared its effectiveness with L1 compressed sensing and with reconstruction via the fast Hadamard transform. The comparison used 100 simulated images that were chosen at random and not used in training. Higham assessed the peak SNR and the structural similarity index, along with their standard deviations and the reconstruction rate, at three different resolutions. While the three techniques differed little at low resolutions, deep learning offered vast improvements at higher levels of compression. “In terms of reconstruction, deep learning achieves video rates and also offers some improvement in the quality of the image,” Higham said. “That wasn’t possible with compressed sensing.”

When compared with L1 compressed sensing and the fast Hadamard transform, deep learning improved image quality and achieved video rates. Image credit: Catherine Higham, AN17 presentation.
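The comparison rests on image-quality metrics, which can be sketched as below. The sketch assumes that the "structural index" mentioned in the talk is the structural similarity index (SSIM); for brevity it computes SSIM over a single global window, whereas the standard definition averages the statistic over local windows. Images are assumed to be scaled to [0, 1], and the 100 "reconstructions" are hypothetical noisy copies, not Higham's data.

```python
import numpy as np

def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, peak]."""
    diff = np.asarray(reference, float) - np.asarray(estimate, float)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)

def global_ssim(reference, estimate, peak=1.0):
    """SSIM computed over one global window (a simplification: the standard
    index averages this statistic over local Gaussian-weighted windows)."""
    x = np.asarray(reference, float).ravel()
    y = np.asarray(estimate, float).ravel()
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2  # usual stabilizing constants
    mx, my = x.mean(), y.mean()
    cov = np.mean((x - mx) * (y - my))
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx ** 2 + my ** 2 + c1) * (x.var() + y.var() + c2)
    return num / den

# Hypothetical test set: 100 random images and noisy "reconstructions" of them.
rng = np.random.default_rng(5)
refs = [rng.random((32, 32)) for _ in range(100)]
recons = [np.clip(r + 0.05 * rng.standard_normal(r.shape), 0.0, 1.0) for r in refs]
scores = np.array([psnr(r, e) for r, e in zip(refs, recons)])
ssims = np.array([global_ssim(r, e) for r, e in zip(refs, recons)])
print(f"PSNR {scores.mean():.1f} +/- {scores.std():.1f} dB, "
      f"SSIM {ssims.mean():.3f} +/- {ssims.std():.3f}")
```

PSNR measures pixel-wise error only, while SSIM also compares means, contrast, and correlation, so reporting both (with their spread over the test images) gives a fuller picture of reconstruction quality.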

Following this comparison, Higham evaluated real data collected with an actual camera. She began with an image acquired using over 16,000 Hadamard patterns. Reconstruction with the Hadamard basis yielded a blocky, pixelated result, as did the use of a Gaussian filter. Once again, deep learning generated the cleanest reconstructed image. Next, she showed the audience two video reconstructions, one using the Hadamard basis and one using deep learning. The blockiness was especially prominent in the Hadamard video, while facial features were much more discernible with the learned method.
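To illustrate the Gaussian-filter baseline, the sketch below upsamples a hypothetical 16 × 16 low-resolution reconstruction to 128 × 128 by pixel replication, which produces the blocky look, and then smooths it with a separable Gaussian filter. The sizes and the value of `sigma` are assumptions; for scale, a complete Hadamard basis for a 128 × 128 image has 128² = 16,384 patterns.

```python
import numpy as np

def gaussian_kernel_1d(sigma):
    """Normalized 1-D Gaussian kernel truncated at 3 standard deviations."""
    radius = max(1, int(3 * sigma))
    t = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (t / sigma) ** 2)
    return k / k.sum()

def gaussian_filter(img, sigma):
    """Separable Gaussian blur with edge-replicating padding; output shape == input shape."""
    k = gaussian_kernel_1d(sigma)
    r = len(k) // 2
    padded = np.pad(img, r, mode="edge")

    def blur(v):
        # "valid" mode trims the padding away, restoring the original length.
        return np.convolve(v, k, mode="valid")

    rows = np.apply_along_axis(blur, 1, padded)  # blur along each row
    return np.apply_along_axis(blur, 0, rows)    # then along each column

rng = np.random.default_rng(6)
low_res = rng.random((16, 16))              # hypothetical low-resolution reconstruction
blocky = np.kron(low_res, np.ones((8, 8)))  # 128 x 128 by pixel replication
smoothed = gaussian_filter(blocky, sigma=2.0)
print(blocky.shape, smoothed.shape)         # (128, 128) (128, 128)
```

Blurring softens the block edges but cannot restore detail that was never measured, which is consistent with the filtered result still looking coarse.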

Ultimately, Higham’s analysis demonstrated that one can obtain video rates from a single-pixel camera using deep learning techniques. “This work represents a significant step towards real-time applications of computational imagers and opens up possibilities for task-specific adaptation,” Higham said. Future applications include gas sensing, 3D imaging, and metrology.

Acknowledgments: This work is funded by an EPSRC UK Quantum Technology Programme grant to the QuantIC project, led by the University of Glasgow.


Lina Sorg is the associate editor of SIAM News.