
Computational and Mobile Photography: The History of the Smartphone Camera

By Lina Sorg

In 2017, 85 percent of the world’s photos were captured by smartphones. From an economic standpoint, the trillion-dollar smartphone economy has drastically altered people’s lives, habits, and behaviors in recent years. “The smartphone has really changed the world,” Peyman Milanfar of Google Research said. “It’s really more of a camera than a phone nowadays. It has also fundamentally changed the way we interact with the world.” During a minitutorial at the virtual 2020 SIAM Conference on Imaging Science, which took place this July, Milanfar detailed the history of the smartphone camera from a computational imaging perspective.

Figure 1. The disparity between crowds at Saint Peter’s Square in Vatican City when Pope Benedict XVI was announced in 2005 and when Pope Francis was announced in 2013. Image courtesy of NBC News.
Unsurprisingly, smartphone technology has negatively impacted the standalone camera industry. That market peaked in 2010 and has collapsed nearly 85 percent since then (see Figure 2). To document these rapid changes, Milanfar displayed two contrasting photos of the crowds at Saint Peter’s Square in Vatican City when Pope Benedict XVI was announced in 2005, and when Pope Francis was announced in 2013 (see Figure 1). Only one phone is visible amongst the crowd in 2005, whereas countless phone screens illuminate the 2013 photograph. “This is a fundamental transformation in how we capture and remember our world,” Milanfar said.

Milanfar began with a brief overview of the history of photography (see Figure 3). The era of analog (film) photography—which followed the “old school” photography in the 1800s—began in the 1930s. Analog film dominated the scene for nearly 50 years and was largely discontinued in 2005.

In 1975, Steven Sasson created the first digital camera prototype. It consisted of a camera casing on top, a computer underneath that controlled the camera, and a cassette tape that recorded the photograph. The prototype, which needed about 20 seconds to capture a single picture, had a very low resolution (100x100 pixels). To see the photo, the photographer had to remove the cassette tape, insert it into a microcomputer, and view the image on a television screen. When Sasson presented the prototype to Kodak executives, they dismissed his efforts and told him that no one would ever want to look at pictures on a screen; 35 years later, he received the 2009 National Medal of Technology and Innovation for the invention of the first digital camera.

The first commercial digital cameras appeared in 1990. The Logitech Fotoman had a resolution of 376x284, took black-and-white photos, and cost $1,000. In 1992, the first digital single-lens reflex (DSLR) cameras entered the market; they cost up to $20,000 and included a tethered external hard drive that users had to carry over their shoulders. Unsurprisingly, neither of these models sold well.

Figure 2. The impact of smartphones on the camera industry. Image courtesy of the Camera and Imaging Products Association.

The introduction of complementary metal-oxide-semiconductor (CMOS) image sensors in 1993 led to the so-called “camera on a chip.” This revolutionary sensor enabled cameras with a rolling-shutter readout that are far cheaper and more power efficient. At the time, however, it was also noisier and suffered from several other technical hang-ups. “Some of these technical difficulties meant that it would take another 10 years before CMOS systems would enable mass production of mobile and digital cameras,” Milanfar said.

In 1997, Kodak patented the idea of an electronic camera with a programmable transmission capability; this imagined camera looked much like a mobile phone from the early 2000s. However, Kodak did not actively pursue the technology out of concern that it would harm its own analog photography business. Other companies took up the idea instead, and the first commercial camera phone materialized in Japan in 2000. Sharp’s J-Phone cost $500 but took low-quality photos. Because traditional cameras cost roughly the same amount and captured better pictures, the J-Phone did not sell well.

In the early 2000s, digital single-lens reflex and charge-coupled device (CCD) cameras became widely available. The Canon PowerShot point-and-shoot camera had a 1.5-megapixel resolution and cost approximately $500 in 2000. The Nikon D1 appeared around the same time with a 2- to 3-megapixel resolution and a price tag of $3,000-$5,000. Twenty years later, the prices of these two camera types are relatively unchanged, which makes the current price per pixel a much better value for the consumer. This improvement is largely due to better sensors and optics, and to a much lesser degree to technical advances in software. Even today, standalone cameras still (somewhat surprisingly) have relatively naïve software-based pipelines; algorithmic and software innovations have instead thrived in mobile devices, whose sleek form factors restrict their optics and sensors.

The introduction of the iPhone in 2007 changed the course of phone and camera technology. Apple had reinvented the mobile phone’s display and user interface, but not necessarily the camera; in fact, the two-megapixel camera on the first iPhone was nearly an afterthought. At this point, mobile cameras were not yet commonplace or of high enough quality to compete with traditional cameras.

Figure 3. The history of digital photography from 1975 to the present. Image courtesy of Peyman Milanfar.

In 2010, a transition to both 4G networks and 300-dots-per-inch (dpi) displays enabled users to enjoy photographs on the screens of their mobile devices. “People felt that their phone screens were rich enough for the consumption of photos,” Milanfar said. “2010 was a really critical point in time where this transition happened from small devices with bad cameras and small screens to larger devices with very good cameras and very high-resolution screens.” Faster wireless network speeds also meant that individuals could view and share their photos much more easily. As a result, manufacturers began to focus on increased light collection, better dynamic range, and higher resolution for camera phones so that consumers did not have to carry multiple devices.

Mobile cameras face huge physical limitations. For instance, a typical smartphone collects several hundred times less light than a traditional camera because of its small aperture. The use of flash is limited to a range of only several meters, and bracketed pairs of exposures can introduce motion blur and are difficult to align. In response, the industry turned to burst photography as a solution. “Burst photography has become the workhorse of mobile computational photography in the last few years,” Milanfar said. This technique rapidly captures a burst of five, 10, or even 15 photos and merges the shots into one high-quality image. A control algorithm decides how to slice the total exposure time into smaller components; the subsequent step aligns and merges the images into a single frame, after which single-frame image processing enhances and refines the result.
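As a rough illustration of the align-and-merge step, the sketch below splits a desired exposure into several short frames, estimates a single global shift per frame via phase correlation, and averages the aligned frames. This is a minimal toy version only; the function names are illustrative, and real burst pipelines align per tile, handle sub-pixel motion, and merge with robust weights rather than a plain average.

```python
import numpy as np

def slice_exposure(total_exposure_s, max_frame_s=0.1):
    """Toy stand-in for the exposure-control logic: split a desired total
    exposure into several shorter frames."""
    n_frames = int(np.ceil(total_exposure_s / max_frame_s))
    return n_frames, total_exposure_s / n_frames

def align_and_merge(frames):
    """Align each frame to the first with global phase correlation, then
    average; real pipelines use tile-based alignment and robust merging."""
    ref = frames[0].astype(np.float64)
    h, w = ref.shape
    F_ref = np.fft.fft2(ref)
    acc = ref.copy()
    for frame in frames[1:]:
        f = frame.astype(np.float64)
        # Normalized cross-power spectrum; its inverse FFT peaks at the shift.
        cross = F_ref * np.conj(np.fft.fft2(f))
        corr = np.fft.ifft2(cross / (np.abs(cross) + 1e-12)).real
        dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
        dy = dy - h if dy > h // 2 else dy   # wrap large shifts to negative values
        dx = dx - w if dx > w // 2 else dx
        acc += np.roll(f, (dy, dx), axis=(0, 1))
    return acc / len(frames)                 # averaging N frames cuts noise by ~sqrt(N)

# Example: a 0.5-second exposure captured as five short, jittered, noisy frames.
rng = np.random.default_rng(0)
scene = rng.random((64, 64))
n, _ = slice_exposure(0.5)
burst = [np.roll(scene, (rng.integers(-2, 3), rng.integers(-2, 3)), axis=(0, 1))
         + 0.05 * rng.standard_normal(scene.shape) for _ in range(n)]
merged = align_and_merge(burst)
```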

Figure 4. Pixel shifting can improve both color reproduction and pixel resolution. Image courtesy of Peyman Milanfar.
The process of demosaicing—a fundamental step in single-frame imaging pipelines—“kills” details and produces visual artifacts. In fact, two-thirds of the colors in every photo are imagined (estimated) because each sensor pixel inherently measures only one color at a time; demosaicing fills in the missing colors. A method called pixel shifting, which relies on multi-frame capture, can simultaneously improve color reproduction and pixel resolution (see Figure 4). The process begins with one exposure, then shifts the sensor to the right by exactly one pixel, which introduces new color samples at positions that were previously inaccessible. The sensor next shifts down by one pixel, then down and to the right by one pixel, thus completing a sweep of the missing pixels and filling in the respective values.
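The sketch below mimics this idea on an idealized sensor with an RGGB Bayer pattern: four exposures at one-pixel offsets place every color filter over every pixel position, so no colors need to be estimated. The setup is purely illustrative (a hypothetical noiseless sensor shifted by exact pixel amounts), not a description of any specific camera.

```python
import numpy as np

def bayer_mask(h, w, offset=(0, 0)):
    """Boolean masks for an RGGB Bayer pattern, optionally shifted by
    (dy, dx) to mimic moving the sensor by whole pixels."""
    dy, dx = offset
    yy, xx = np.meshgrid(np.arange(h) + dy, np.arange(w) + dx, indexing="ij")
    r = (yy % 2 == 0) & (xx % 2 == 0)
    b = (yy % 2 == 1) & (xx % 2 == 1)
    g = ~(r | b)
    return r, g, b

def pixel_shift_capture(rgb):
    """Idealized pixel-shift capture: four exposures with the sensor at
    offsets (0,0), (0,1), (1,0), (1,1) sweep every color over every pixel,
    so no demosaicing (color estimation) is needed."""
    h, w, _ = rgb.shape
    out = np.zeros_like(rgb, dtype=np.float64)
    counts = np.zeros_like(out)
    for offset in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        r, g, b = bayer_mask(h, w, offset)
        for c, mask in enumerate([r, g, b]):
            out[..., c] += np.where(mask, rgb[..., c], 0.0)
            counts[..., c] += mask
    return out / counts  # every channel is measured at every pixel

rgb = np.random.default_rng(1).random((8, 8, 3))
reconstructed = pixel_shift_capture(rgb)
assert np.allclose(reconstructed, rgb)  # exact recovery in this idealized model
```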

Unfortunately, pixel shifting is not straightforward on a mobile device because it is impractical to physically move the sensor precisely enough to sweep all colors into their correct locations. Instead, the only consistent source of motion in mobile photography is the random motion of the handheld device itself. As such, measurements of color are not distributed uniformly across the image sensor’s canvas; some pixel positions receive no samples of a given color, some receive only one, and others may see multiple. These variations yield an overall multi-dimensional, non-uniform interpolation problem.

Multiple sources of motion are present in mobile imaging, including hand tremors that move the camera and objects that move within the scene. The optical image stabilizer, which exists in nearly all cameras, does not correct the subtle motions that stem from natural physiological tremors (i.e., shaky hands). However, a clever feature in Google's Pixel devices uses the stabilizer to "simulate" motion when the device is immobilized (e.g., on a tripod).

Next, Milanfar spoke about non-uniform coverage and a nonparametric technique called nonlinear kernel regression. This method assigns kernel functions that adapt in size and shape to the measurements and colors in the immediate vicinity. “Kernels adapt their shape to what might be the underlying structure,” Milanfar said. Upon achieving a continuous interpolation, researchers can resample on any grid; they can take images at a certain resolution and merge them onto a grid at the sensor’s native resolution or at a higher resolution. The limits of this process depend on noise, the amount of available aliasing, and pixel/lens spot size tradeoffs. For typical mobile sensors, the limit to super resolution is roughly 2x.
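A minimal sketch of the underlying idea follows: scattered, irregularly placed samples are interpolated onto a regular grid by weighting each sample with a Gaussian kernel centered at the output location (Nadaraya-Watson regression). This simplified version uses isotropic kernels of fixed width; the adaptive method described in the talk instead steers each kernel's size and shape to the local image structure. All names and parameter values here are illustrative.

```python
import numpy as np

def kernel_regression(sample_xy, sample_vals, grid_h, grid_w, h=0.75):
    """Nadaraya-Watson kernel regression: estimate a value at every output
    grid point as a kernel-weighted average of nearby scattered samples.
    A simplified, isotropic stand-in for structure-adaptive kernels."""
    ys, xs = np.mgrid[0:grid_h, 0:grid_w]
    grid = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(np.float64)
    # Pairwise squared distances between output grid points and sample locations.
    d2 = ((grid[:, None, :] - sample_xy[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-0.5 * d2 / h**2)                     # Gaussian kernel weights
    est = (w @ sample_vals) / (w.sum(axis=1) + 1e-12)
    return est.reshape(grid_h, grid_w)

# Irregular samples of a smooth test signal, interpolated onto an 8x8 grid.
rng = np.random.default_rng(2)
xy = rng.uniform(0, 8, size=(40, 2))
vals = np.sin(xy[:, 0] / 3.0) + np.cos(xy[:, 1] / 3.0)
image = kernel_regression(xy, vals, 8, 8)
```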

One can implement all of these image processing procedures via a gather process that aligns each image with a grid, estimates its contribution, and accumulates the samples into a final image; the process repeats for every input frame.
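In code, such a gather step can be pictured as accumulating a weighted numerator and denominator per output pixel and dividing at the end, as in the hedged sketch below. The offsets and per-pixel weight maps are assumed to come from alignment and robustness stages that are not shown; the structure, not the specific function names, is the point.

```python
import numpy as np

def gather_merge(frames, offsets, weight_maps):
    """Accumulate each aligned frame's weighted contribution into a running
    numerator/denominator; the final image is their ratio. Offsets are whole-
    pixel (dy, dx) shifts and weight_maps are per-pixel confidence maps, both
    assumed to be produced by earlier alignment/robustness stages."""
    num = np.zeros_like(frames[0], dtype=np.float64)
    den = np.zeros_like(num)
    for frame, (dy, dx), w in zip(frames, offsets, weight_maps):
        aligned = np.roll(frame.astype(np.float64), (dy, dx), axis=(0, 1))
        num += w * aligned
        den += w
    return num / np.maximum(den, 1e-12)   # avoid division by zero where no samples land
```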

Milanfar offered examples of multiframe demosaic/super-resolution in which the handheld multi-frame super-resolution algorithm recovered previously absent details. He examined several use cases of this algorithm. One such implementation is a night setting that takes very long exposures by capturing multiple frames and merging them together to illuminate dark scenes; merge stability and robustness avoid the motion blur and visual artifacts that are typically present with extremely long exposures. Milanfar then explored a zoom use case in combination with Rapid and Accurate Image Super Resolution (RAISR), an upscaling algorithm based on machine learning that learns pixel-wise upscaling filters from pairs of high- and low-resolution images. This method is applied to crops of the output generated by the multiframe demosaic/super-resolution algorithm to achieve higher zoom factors and produce high-quality enlarged versions of base-resolution photos.
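To give a flavor of learned upscaling filters, the toy sketch below hashes each patch by its dominant gradient orientation, learns one least-squares filter per hash bucket from a matched low-/high-quality image pair, and then applies the selected filter pixel by pixel. This is only a loose, simplified illustration of the filter-learning idea (RAISR also hashes on gradient strength and coherence and operates on properly upsampled images); all function names and parameters are invented for the example.

```python
import numpy as np

def patch_bucket(patch, n_angle=4):
    """Hash a patch by its dominant gradient orientation (a simplified hash;
    learned-filter methods also use gradient strength and coherence)."""
    gy, gx = np.gradient(patch)
    angle = np.arctan2(gy.sum(), gx.sum()) % np.pi
    return int(angle / np.pi * n_angle) % n_angle

def learn_filters(lo, hi, patch=5, n_angle=4):
    """Learn one least-squares patch->pixel filter per bucket from an aligned
    pair: 'lo' is a degraded (or upsampled) image, 'hi' is the target."""
    r = patch // 2
    A = [[] for _ in range(n_angle)]
    b = [[] for _ in range(n_angle)]
    for y in range(r, lo.shape[0] - r):
        for x in range(r, lo.shape[1] - r):
            p = lo[y - r:y + r + 1, x - r:x + r + 1]
            k = patch_bucket(p, n_angle)
            A[k].append(p.ravel())
            b[k].append(hi[y, x])
    return [np.linalg.lstsq(np.array(Ak), np.array(bk), rcond=None)[0]
            if Ak else np.zeros(patch * patch) for Ak, bk in zip(A, b)]

def apply_filters(lo, filters, patch=5, n_angle=4):
    """Enhancement step: pick and apply the learned filter for each pixel's bucket."""
    r = patch // 2
    out = lo.astype(np.float64)
    for y in range(r, lo.shape[0] - r):
        for x in range(r, lo.shape[1] - r):
            p = lo[y - r:y + r + 1, x - r:x + r + 1]
            out[y, x] = p.ravel() @ filters[patch_bucket(p, n_angle)]
    return out

# Toy training pair: 'lo' is a noisy stand-in for an upsampled low-quality image.
rng = np.random.default_rng(3)
hi = rng.random((32, 32))
lo = hi + 0.1 * rng.standard_normal(hi.shape)
filters = learn_filters(lo, hi)
enhanced = apply_filters(lo, filters)
```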

Milanfar concluded with a discussion of other computational imaging challenges, including curation. Although people now take more photos than ever (often of the same subject), typically only a few turn out well. In response, he and his colleagues have developed neural network models that are trained on many images that humans rated according to aesthetic and technical quality. The model ranks various images of the same scene based on aesthetics; of course, this is an aggregate judgment and not necessarily meant to cater to personal taste. The same model can also rank photos based on their technical quality, such as whether the subject is well lit, focused, and centered.

Ultimately, the technology behind computational photography—the science and engineering techniques that generate high-quality images from small mobile cameras—has advanced significantly over the past decade and will likely continue to progress in the future.


The slides from Peyman Milanfar’s presentation are freely available online.

Lina Sorg is the managing editor of SIAM News.