October 27, 2016

Exemplar-Based Texture Synthesis: Are Neural Networks the Solution?

High-budget film sets and computer games often rely on computer-generated scenes. Yet designing objects and imitating their appearance requires extensive hands-on effort. For this reason, researchers are exploring a more indirect technique of emulating reality: exemplar-based synthesis. This technique uses learning algorithms to generate arbitrarily many variants of an image from a small number of real examples.

Investigations of exemplar-based synthesis focus on textures. While a wooden floor, foliage, fabric, marble, gravel, sand, a brick wall, a page of text, fur, a box full of cherries, and a crowd of people are all complex 3D objects, they often appear to the eye as a flat, homogeneous surface: a texture. In fact, any repetitive object appears as a texture if seen from a certain distance.

Rather than attempting to provide an exact physical model of the 3D objects they represent, computer graphics methods simply create a convincing visual illusion of each texture. The conviction that there might be a simple and universal mathematical texture model relates to Béla Julesz’s conjecture that two textures with the same \(N\)th-order spatial statistics (the joint statistics of gray levels at \(N\) points) would be visually indistinguishable. Julesz proposed \(N=2\), which works well on, say, sand or fabric. Yet \(N=2\) is far from sufficient for more complex textures.
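For \(N=2\), the statistics in Julesz’s conjecture can be made concrete as co-occurrence histograms: for each displacement \((dx, dy)\), the joint frequency of gray-level pairs at two points separated by that displacement. The sketch below is illustrative only; it assumes an image already quantized to a few integer gray levels and considers nonnegative shifts only, and the function names are hypothetical.

```python
import numpy as np

def cooccurrence(img, dx, dy, levels):
    """Normalized joint histogram of the pairs (img[y, x], img[y + dy, x + dx]).

    Assumes integer gray levels in [0, levels) and dx, dy >= 0.
    """
    h, w = img.shape
    a = img[:h - dy, :w - dx]        # first point of each pair
    b = img[dy:, dx:]                # second point, displaced by (dx, dy)
    counts = np.zeros((levels, levels))
    np.add.at(counts, (a.ravel(), b.ravel()), 1)   # accumulate pair counts
    return counts / counts.sum()

def second_order_stats(img, max_shift, levels):
    """Co-occurrence histograms for all shifts 0 <= dx, dy <= max_shift."""
    return {(dx, dy): cooccurrence(img, dx, dy, levels)
            for dx in range(max_shift + 1) for dy in range(max_shift + 1)}

# Vertical stripes versus a random shuffle of the same pixels.
stripes = np.tile(np.array([0, 1]), (8, 4))                     # 8 x 8 image
shuffled = np.random.default_rng(0).permutation(stripes.ravel()).reshape(8, 8)
s_stripes = second_order_stats(stripes, 1, 2)
s_shuffled = second_order_stats(shuffled, 1, 2)
```

The two images have identical first-order statistics (the zero-shift histograms agree), but they differ at second order: in the stripes, horizontally adjacent pixels never share a gray level, while in the shuffled image they often do.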

The breakthrough in exemplar-based texture synthesis is arguably attributable to Javier Portilla and Eero P. Simoncelli’s universal texture sampler [3]. By enforcing about 700 statistical parameters measured on the texture example, their sampler often achieves satisfying texture synthesis (see Figure 1). But it also fails in many cases, delivering blurry and jumbled results.

Figure 1. Original texture and its imitation using the Portilla-Simoncelli model, based on the estimation of 700 texture parameters.
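The Portilla-Simoncelli sampler starts from noise and repeatedly adjusts the image until it satisfies the statistical constraints measured on the example. Its roughly 700 constraints act on complex wavelet coefficients, which is too much for a short sketch. The toy version below is a deliberate simplification, not the Portilla-Simoncelli algorithm: it alternates between imposing only two statistics of the exemplar, the Fourier modulus (equivalently, the autocorrelation) and the gray-level histogram. The function names and the striped exemplar are hypothetical.

```python
import numpy as np

def match_histogram(img, ref):
    """Exact histogram specification: give img the gray levels of ref, keeping img's rank order."""
    flat = img.ravel()
    out = np.empty_like(flat)
    out[np.argsort(flat)] = np.sort(ref.ravel())
    return out.reshape(img.shape)

def impose_spectrum(img, ref):
    """Keep the Fourier phase of img but replace its modulus by that of ref (same shape)."""
    phase = np.angle(np.fft.fft2(img))
    modulus = np.abs(np.fft.fft2(ref))
    return np.real(np.fft.ifft2(modulus * np.exp(1j * phase)))

def toy_synthesize(ref, n_iter=20, seed=0):
    """Alternate the two projections, starting from Gaussian white noise."""
    x = np.random.default_rng(seed).standard_normal(ref.shape)
    for _ in range(n_iter):
        x = impose_spectrum(x, ref)
        x = match_histogram(x, ref)
    return x

rng = np.random.default_rng(1)
# Hypothetical exemplar: sinusoidal stripes plus noise.
ref = np.sin(np.arange(32) / 2.0)[None, :] + 0.3 * rng.standard_normal((32, 32))
out = toy_synthesize(ref)
```

Because the loop ends with the histogram step, the output has exactly the exemplar’s gray-level histogram; its spectrum matches only approximately, since the two constraints are enforced in turn rather than jointly.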

A straightforward and often impressive competitor is Alexei Efros and William Freeman’s texture sampler [1]. It builds new textures by cleverly copying and pasting small blocks of the target texture sample (see Figure 2). The results can be striking. However, this Frankenstein-like method often creates conspicuous verbatim copies. Even worse, the generated textures do not respect even simple statistics of the target texture sample, which contradicts Julesz’s principle and is often quite visible.

Figure 2. Original texture and its Efros-Freeman emulation, obtained by rearranging small image patches of the original. Small artefacts are visible at junctions between patches.
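To reduce the artefacts at junctions, image quilting places blocks with a small overlap and cuts each overlap along the path of minimum squared difference between the old and new pixels, found by dynamic programming. The sketch below covers only this boundary cut for a vertical overlap strip; block selection and horizontal overlaps are omitted, and the function names and the convention that the cut pixel comes from the new block are assumptions for illustration.

```python
import numpy as np

def min_error_boundary_cut(old, new):
    """For each row of a vertical overlap strip, return the column where the cut passes.

    old and new are the overlapping strips (same shape) of the already-placed
    texture and the candidate block.
    """
    e = (old - new) ** 2                    # overlap error surface
    if e.ndim == 3:                          # sum over color channels if present
        e = e.sum(axis=2)
    h, w = e.shape
    cost = e.copy()                          # cumulative minimal path cost
    for i in range(1, h):
        for j in range(w):
            lo, hi = max(j - 1, 0), min(j + 2, w)
            cost[i, j] += cost[i - 1, lo:hi].min()
    # Backtrack from the cheapest cell of the last row.
    path = np.empty(h, dtype=int)
    path[-1] = int(np.argmin(cost[-1]))
    for i in range(h - 2, -1, -1):
        j = path[i + 1]
        lo, hi = max(j - 1, 0), min(j + 2, w)
        path[i] = lo + int(np.argmin(cost[i, lo:hi]))
    return path

def stitch(old, new, path):
    """Keep old pixels left of the cut and new pixels from the cut onward, row by row."""
    out = new.copy()
    for i, j in enumerate(path):
        out[i, :j] = old[i, :j]
    return out

old = np.zeros((3, 3))
new = np.full((3, 3), 5.0)
new[np.arange(3), np.arange(3)] = 0.0       # zero-error path along the diagonal
path = min_error_boundary_cut(old, new)
print(path)  # [0 1 2]
seam = stitch(old, new, path)
```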

The newest trend is to employ convolutional neural networks (CNNs). The approach of Leon Gatys, Alexander Ecker, and Matthias Bethge [2] uses a CNN trained on a huge image database. The algorithm computes statistics of the outputs of several low-level layers of the CNN applied to the target texture sample. Starting from random noise, it then generates a new texture by forcing the new image’s own CNN features to have similar statistics.

As the authors acknowledge, their method derives directly from Portilla and Simoncelli’s approach. The difference is that the enforced statistics are no longer hand-crafted, but instead learned by a CNN. The final result can be quite impressive (see Figure 3).

Figure 3. Original texture and its remarkable emulation by Gatys, Ecker, and Bethge; it involves as many parameters as the image itself.
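In [2], the statistics are Gram matrices: for each chosen layer, with feature maps arranged as a matrix \(F\) (one row per channel, one column per spatial position), \(G = F F^\top\), and synthesis minimizes a weighted sum over layers of squared differences between the Gram matrices of the synthesized image and of the exemplar. The sketch below shows only these statistics and the loss. Random arrays stand in for CNN features, the gradient descent through the network (which needs an automatic differentiation framework) is omitted, and the \(1/(4N^2M^2)\) normalization reflects our reading of [2] and should be treated as an assumption.

```python
import numpy as np

def gram(features):
    """Gram matrix of feature maps with shape (C, H, W): G[i, j] = sum_k F[i, k] F[j, k]."""
    c = features.shape[0]
    f = features.reshape(c, -1)            # C x M, with M = H * W spatial positions
    return f @ f.T

def texture_loss(feats_synth, feats_target, weights):
    """Weighted sum over layers of normalized squared Gram differences."""
    total = 0.0
    for fs, ft, w in zip(feats_synth, feats_target, weights):
        n, m = fs.shape[0], fs.shape[1] * fs.shape[2]   # channels, positions
        diff = gram(fs) - gram(ft)
        total += w * (diff ** 2).sum() / (4 * n**2 * m**2)
    return total

rng = np.random.default_rng(0)
# Random arrays stand in for the feature maps of two CNN layers (no network here).
target = [rng.standard_normal((4, 8, 8)), rng.standard_normal((8, 4, 4))]
noise = [rng.standard_normal(t.shape) for t in target]
# Shuffle the spatial positions of the first layer's features.
perm = rng.permutation(64)
shuffled0 = target[0].reshape(4, -1)[:, perm].reshape(4, 8, 8)
print(np.allclose(gram(target[0]), gram(shuffled0)))  # True
```

Since each Gram entry is a sum over all spatial positions, shuffling the positions leaves it unchanged: spatial arrangement enters these statistics only through the receptive fields of the features themselves.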

However, it should be noted that the number of statistics extracted from the CNN is on the order of hundreds of thousands, which is often greater than the number of pixels in the target texture sample. Such statistics are likely to be highly redundant: with so many constraints, the only image with exactly the same statistics as the original might be the original itself. If the synthesized image differs from it, this may simply be because the optimization got stuck in a local minimum.
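A rough count shows how such numbers arise. A Gram matrix over \(C\) channels is symmetric, so it carries \(C(C+1)/2\) distinct entries. The sketch below assumes, purely for illustration, statistics taken at five layers with VGG-19-like channel widths of 64, 64, 128, 256, and 512 (our assumption about the configuration, not a figure quoted from [2]) and compares the total with the pixel count of a hypothetical 256×256 sample.

```python
def gram_entries(channels):
    """Distinct entries summed over layers: each symmetric C x C Gram matrix has C * (C + 1) / 2."""
    return sum(c * (c + 1) // 2 for c in channels)

# Assumed channel widths of the layers whose Gram matrices are matched (illustrative).
layers = [64, 64, 128, 256, 512]
n_stats = gram_entries(layers)
n_pixels = 256 * 256
print(n_stats, n_pixels, n_stats > n_pixels)  # 176640 65536 True
```

The 512-channel layer alone contributes 131,328 entries, so the count is dominated by the deepest layer used.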

From the viewpoint of texture perception theory, the remarkable results of the Gatys, Ecker, and Bethge method can be deemed a Pyrrhic victory. Indeed, the model has more parameters than the original object! The method is also too computationally expensive for everyday computer graphics. As a result, recent work on neural networks has focused on reducing the size of these computational mastodons. Figure 4 compares the three methods on a brick wall. Each fails in its own way, which indicates that the chase for good texture samplers is far from over.

Figure 4. A brick wall and its automatic imitations by Efros-Freeman, Portilla-Simoncelli, and Gatys-Ecker-Bethge, respectively. Each fails in its own way.

References
[1] Efros, A.A., & Freeman, W.T. (2001). Image Quilting for Texture Synthesis and Transfer. In Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques (pp. 341-346). Association for Computing Machinery.

[2] Gatys, L., Ecker, A.S., & Bethge, M. (2015). Texture Synthesis Using Convolutional Neural Networks. In Advances in Neural Information Processing Systems (pp. 262-270).

[3] Portilla, J., & Simoncelli, E.P. (2000). A Parametric Texture Model Based on Joint Statistics of Complex Wavelet Coefficients. International Journal of Computer Vision, 40(1), 49-70.

	Axel Davy is a Ph.D. student in applied mathematics at Ecole Normale Supérieure de Cachan. His interests are in background modelling and anomaly detection.
	Jean-Michel Morel is currently a professor of mathematics at Ecole Normale Supérieure de Cachan, Université Paris-Saclay and a visiting professor at Duke University. He leads a team of 20 researchers on the mathematical analysis of image processing and founded Image Processing On Line in 2011, the first journal publishing reproducible algorithms, software, and online executable articles.