Big Data Techniques Yield More Accurate Reconstructions of the Human Face

By Lina Sorg

BIGMATH is an EU-funded Ph.D. program that equips early-career mathematicians with the theoretical and practical skills necessary to address the computational challenges of the big-data era. Popular research topics include optimization, statistics, and large-scale linear algebra, all of which are relevant to machine learning techniques and the effective creation of data-driven products. During a scientific session at the 9th International Congress on Industrial and Applied Mathematics, which took place this week in Valencia, Spain, Filipa Valdeira, Stevo Racković, and Rongjiao Ji presented projects that used big data techniques to reconstruct the human face. Applications ranged from medical imaging to digital animation.

Valdeira opened the session with a brief discussion about the increased availability of three-dimensional (3D) data in recent years. “In particular, it’s really relevant to be able to develop tools for the human body and face,” she said. 3D facial data finds use in surveillance efforts, identity recognition, human-machine interaction, and the medical field. The medical field is especially significant, as doctors and researchers use 3D data for organ shape segmentation, prosthesis design, and surgical planning. Valdeira uses 3D scans and point clouds of the human face to create a parametric surface model that supports the design of prostheses for detailed facial features like ears and noses.

A primary challenge of Valdeira’s project is registration, as two different point clouds can correspond to the same shape. “We have to determine which points have the same meaning in one shape and which have the same meaning in another shape,” she said. Most situations exploit previously built models, but building one’s own model requires registering the data first. Additionally, missing data, noisy data, and outliers complicate the modeling process, as high-quality data is often difficult to obtain from scans.
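As a rough illustration of the correspondence problem described above, the following sketch matches each point in one toy point cloud to its nearest neighbor in another. This is a naive baseline chosen for illustration, not Valdeira’s method; the function name and the toy coordinates are assumptions, and nearest-neighbor matching is exactly the kind of approach that struggles with missing data, noise, and outliers.

```python
import math

def nearest_neighbor_correspondence(source, target):
    """For each source point, return the index of the closest target point."""
    matches = []
    for p in source:
        best = min(range(len(target)), key=lambda j: math.dist(p, target[j]))
        matches.append(best)
    return matches

# Two toy "scans" of the same three landmarks (hypothetical data):
# scan_b stores them in a different order and with a little noise.
scan_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
scan_b = [(0.02, 0.98, 0.0), (0.01, -0.01, 0.0), (1.03, 0.0, 0.01)]

matches = nearest_neighbor_correspondence(scan_a, scan_b)
print(matches)  # [1, 2, 0]
```

Here point 0 of the first scan has the same meaning as point 1 of the second, and so on. Real registration pipelines typically also estimate a deformation between the shapes rather than relying on raw distances alone.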

Valdeira considered a general human face shape for the purposes of her model. Models of the human face are nearly always accompanied by considerable inter- and intra-person variability. Inter-person variability refers to the faces or identities of different individuals, while intra-person variability occurs when one person changes their facial expression in response to certain circumstances. Although most studies of this nature currently rely on principal component analysis, a linear technique, Valdeira’s shape space is a nonlinear, non-Euclidean manifold. Therefore, she aims to develop tools that are robust, computationally efficient, and better suited to work on non-Euclidean spaces.

Racković’s project was slightly different in nature and centered on the use of distributed optimization to obtain real-time motion capture in animation. He focuses primarily on skinning: the animated application of a skin texture over a skeleton to represent the appearance of a character. The skeleton shows all possible movements of a character’s body, and thus requires coverage by a skin that can deform in a way that both looks realistic and defines the appearance of specific characters. Facilitation of this deformation requires creation of a neutral (resting) expression that in turn lends itself to a number of other facial expressions. Racković began by scanning a subject’s face in a neutral pose as well as in a range of diverse countenances. Alternatively, an animator can obtain different expressions by adjusting the 5,000 to 6,000 vertices that make up the surface of a human face.

One can represent each expression as a matrix that records the positions of points in space. Racković manipulated these points to average existing expressions and produce a new one. For example, averaging the expressions of a fully open and a fully closed mouth can generate a new expression with a half-closed mouth, a compromise between the two. Racković then entered all the possible expressions, including the original, neutral face, into a single matrix. “For each different combination, the resulting shape is unique,” he said. “We can get any physical expression of the face just by changing the parameters.”
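The averaging of two expressions described above can be sketched as a weighted sum of vertex positions. This is a minimal illustration under stated assumptions, not Racković’s implementation: the `blend` function, the two-coordinate "mouth," and the weights are all hypothetical.

```python
def blend(shapes, weights):
    """Weighted sum of shapes; each shape is a flat list of vertex coordinates."""
    n = len(shapes[0])
    return [sum(w * s[i] for w, s in zip(weights, shapes)) for i in range(n)]

# Hypothetical toy mouth: (upper lip height, lower lip height).
closed_mouth = [0.0, 0.0]
open_mouth = [0.0, -2.0]

# Equal weights give the compromise between the two expressions.
half_open = blend([closed_mouth, open_mouth], [0.5, 0.5])
print(half_open)  # [0.0, -1.0]
```

Changing the weights moves the result continuously between the two input expressions, which is one way to read the remark about obtaining expressions by changing the parameters.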

Another technique involves creating a displacement matrix that is similar to the aforementioned matrix. However, in this instance the neutral pose would exist as a vector of zeroes and could thus be excluded. This approach ultimately proves equivalent to the previous one. Racković also mentioned another method that does not adjust the weights of the face but instead relies on manual work by animators, which is expensive and time-consuming. Motion capture functions as an effective alternative, in that it can more easily capture the animation and, in the case of films, can also match an actor’s face to that of the corresponding animated character. This approach requires that an expert select the relevant points on a virtual character’s face to create realistic transitions from one character to another.
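The equivalence between the displacement formulation and the earlier matrix of full expressions can be checked on a toy example. The sketch below assumes that, in the full-expression form, the neutral face receives whatever weight is left over so that all weights sum to 1; that convention, the function names, and the numbers are assumptions made for illustration.

```python
def blend_absolute(neutral, targets, weights):
    """Full-expression form: neutral gets the leftover weight (assumed convention)."""
    w0 = 1.0 - sum(weights)
    return [
        w0 * neutral[i] + sum(w * t[i] for w, t in zip(weights, targets))
        for i in range(len(neutral))
    ]

def blend_delta(neutral, targets, weights):
    """Displacement form: neutral plus weighted offsets (target minus neutral)."""
    return [
        neutral[i] + sum(w * (t[i] - neutral[i]) for w, t in zip(weights, targets))
        for i in range(len(neutral))
    ]

# Hypothetical coordinates for a neutral face and two target expressions.
neutral = [1.0, 2.0, 3.0]
targets = [[1.0, 1.0, 3.0], [2.0, 2.0, 3.5]]
weights = [0.25, 0.5]

a = blend_absolute(neutral, targets, weights)
b = blend_delta(neutral, targets, weights)
print(a, b)
```

In the displacement form, the neutral face’s own offset is a vector of zeroes, so it contributes nothing to the sum and can be left out of the matrix.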

In contrast, Ji’s work pertained to mathematical morphology for the prediction of facial expression transitions. Generation of a neutral human face requires a nonlinear transformation in 3D space to act as a controller. One can scale this controller from 0 to 1; a value of 0 indicates complete disengagement while a value of 1 indicates full engagement. A neutral face therefore has a value of 0, while a face with a dropped jaw and open mouth has a value of 1. Because mouth movement depends on a range of facial muscles, one must build a group of vertices surrounding the jaw. The same is true of the eyes, in that associated vertices control their motion.
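One way to picture a controller scaled from 0 to 1 is as a transformation applied only to the vertex group it governs. The sketch below is not Ji’s model: it assumes a jaw that hinges about the x-axis through the origin, a hypothetical maximum opening angle of 20 degrees, and a two-vertex toy face.

```python
import math

def apply_jaw_controller(vertices, jaw_indices, engagement, max_angle_deg=20.0):
    """Rotate the jaw vertex group about the x-axis by engagement * max angle."""
    if not 0.0 <= engagement <= 1.0:
        raise ValueError("engagement must lie in [0, 1]")
    angle = math.radians(engagement * max_angle_deg)
    c, s = math.cos(angle), math.sin(angle)
    result = list(vertices)
    for i in jaw_indices:
        x, y, z = vertices[i]
        result[i] = (x, c * y - s * z, s * y + c * z)
    return result

# Hypothetical toy face: vertex 0 is the forehead, vertex 1 is the chin.
face = [(0.0, 1.0, 1.0), (0.0, 0.0, 1.0)]
jaw_group = [1]

neutral_face = apply_jaw_controller(face, jaw_group, 0.0)
dropped_jaw = apply_jaw_controller(face, jaw_group, 1.0)
print(neutral_face[1], dropped_jaw[1])
```

An engagement of 0 leaves the face unchanged, while an engagement of 1 lowers the chin and leaves vertices outside the jaw group where they were. Because the motion is a rotation, vertex positions depend nonlinearly on the engagement value.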

It is also possible to build new controllers based on existing ones. For instance, building a controller based on smell would require an overlap of movement of the mouth and both the left and right eyes. Ji used four-dimensional data scanned from real human facial expressions, together with varying engagement values for 180 high-level controllers, to yield different curves based on the semantic controllers’ engagement levels. Ultimately, the goal of her project is to employ classification methods and realistic descriptions of emotions to generate a stochastic geometric model based on the transition of facial expressions.

Valdeira, Racković, and Ji all began their projects with data in the form of images scanned from a real human face in three dimensions. They used this data to generate accurate descriptions of the human face while refining existing mathematical tools to improve facial reconstruction efforts for a range of applications.

Lina Sorg is the associate editor of SIAM News.