# Separating Shape and Intensity Variation in Images

One of the main goals of image analysis is to describe variation between images. Understanding variation in a population of images allows researchers to classify new subjects as potential members of the population distribution. Such classification is especially essential in computational anatomy [2], where locating similarities and differences in imaging data for sick and healthy populations is an important task; recognizing these patterns aids in detecting disease or quantifying its severity in newly-observed subjects. Our recent work presents a novel, flexible class of mixed-effects models that separates variation in images [3]. The framework combines image analysis with theory from functional data analysis and uses statistical methods to simultaneously estimate the template image and the variation effects.

The two largest modes of image variation are *intensity variation* and variation in *point correspondence*. Point correspondence, or warp variation, is the shape variability compared to the template image. Intensity variation is the spatially correlated variation left after compensating for the true warping effect. For example, the spatial intensity variation could describe either systematic error in an image or anatomical variation, such as tissue density or texture.

Intensity variation has previously been treated as a nuisance to be removed by pre-processing the images, so analysis has focused primarily on estimating shape variation/warp effects. This approach can be problematic because the uncertainty of the intensity modifications made during pre-processing is not accounted for in the subsequent analysis. This underestimates the intensity variation, producing overconfident estimates of the precision of the estimated warp effects [9]. Several works have instead treated intensity variation as an integral part of the model, using statistical methods to simultaneously model intensity and warp variation [1, 5]. Compared to other models, the proposed mixed-effects model distinguishes between systematic intensity variation and independent noise, making it useful for denoising images. The model estimates intensity and warp variation simultaneously by alternating maximum-likelihood estimation and prediction; as a result, it chooses the most likely separation of the random effects based on the patterns of variation in the data, which prevents bias in the parameter estimates of the random effects. Figure 1 illustrates the idea behind the model.

**Figure 1.** Fixed and random effects. **Left.** The template *θ* (leftmost) perturbed by a random warp (*θ* ∘ *v*: second from left) and by warp plus spatially correlated intensity (*θ* ∘ *v* + *x*: third from left), together with independent noise *ε* (*y*: fourth from left). **Right.** The warp field *v* brings the observation into spatial correspondence with *θ*, overlaid on the template. Estimation of the template and model hyperparameters is conducted simultaneously with prediction of the random effects, allowing for separation of the different factors in the nonlinear model. Image courtesy of [3].

Images included in the model are considered spatial functional data from \(\mathbb{R}^2\) to \(\mathbb{R}\). Each observation \(y_i\) is a vector of function values on a regular lattice with \(m=m_1m_2\) grid points \((s_j,t_k)\), i.e., \(y_i=(y_i(s_j,t_k))_{j,k}\) for \(j=1,\dots,m_1\), \(k=1,\dots,m_2\). The proposed model is defined by

\[y_i(s_j,t_k)=\theta(v_i(s_j,t_k))+x_i(s_j,t_k)+\varepsilon_{ijk}, \tag1\]

where the template image \(\theta: \mathbb{R}^2\rightarrow \mathbb{R}\) is modeled as a fixed effect. The spatially correlated intensity variation \(x_i\) is assumed to arise from a Gaussian field \((x_i(s_j,t_k))_{j,k} \sim \mathcal{N}(0,S)\), while the last term is Gaussian white noise \(\varepsilon_{ijk} \sim \mathcal{N}(0,\sigma^2)\). The model differentiates spatially correlated noise and independent white noise, as both can be present in the observed images. In this model, warping functions \(v_i: \mathbb{R}^2 \rightarrow \mathbb{R}^2\) are assumed to satisfy

\[v_i(s,t)=v(s,t,w_i)=\binom{s}{t}+\varepsilon_{w_i}(s,t) \tag2 \]

for \(\varepsilon_{w_i}: \mathbb{R}^2 \rightarrow \mathbb{R}^2\) denoting a coordinate-wise bilinear spline interpolation of a displacement vector \(w_i\in \mathbb{R}^{m_w^1\times m_w^2 \times 2}\) on a lattice spanned by \(s_w \in \mathbb{R}^{m_w^1}\), \(t_w \in \mathbb{R}^{m_w^2}\). The displacement vectors \(w_i\) are modeled as random effects following a normal distribution \(w_i \sim \mathcal{N}(0,C)\).

Parameter estimation in \((1)\) is based on a maximum-likelihood approach. The model contains nonlinear transformations of the random warping functions \(v_i(s_j,t_k)\); hence, a closed-form expression for the likelihood function is unavailable. To overcome this, we iteratively linearize the likelihood function around the predicted warp parameters \(w_i\), which at each step enables the use of methodology from linear mixed-effects models. Linearization of nonlinear mixed-effects models is effective in standard situations and is available in many software packages [4, 7, 8]. However, these existing implementations cannot handle the large sizes of image data or the combination of linear and nonlinear random effects in \((1)\). To make the computations feasible, we model the covariance matrix of the spatially correlated intensity effect with a sparse inverse. This modeling choice is equivalent to assuming conditional independence between pixels that are far from each other, given all other pixels, which is often a reasonable assumption.
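A minimal sketch of the sparse-inverse-covariance idea: if the precision matrix \(Q = S^{-1}\) has a local neighborhood structure, the intensity field is a Gaussian Markov random field, and distant pixels are conditionally independent given the rest. The grid Laplacian-type precision below is an assumed form chosen purely for illustration; it is not the paper's concrete parametrization.

```python
import numpy as np
import scipy.sparse as sp

# Precision matrix Q = S^{-1} for an m1 x m2 pixel grid, built from
# 1-D second-difference operators via Kronecker products (5-point stencil):
# each pixel interacts directly with at most its 4 grid neighbours.
m1, m2 = 32, 32
D1 = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(m1, m1))
D2 = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(m2, m2))
I1, I2 = sp.identity(m1), sp.identity(m2)
tau = 0.1                        # diagonal regularization (illustrative)
Q = (sp.kron(D1, I2) + sp.kron(I1, D2) + tau * sp.identity(m1 * m2)).tocsr()

# Q is very sparse even though S = Q^{-1} is dense: storage and linear
# algebra scale with the number of neighbour pairs, not with m^2.
density = Q.nnz / (m1 * m2) ** 2
```

The dense covariance \(S\) would have roughly a million entries on this small grid; its sparse inverse has well under one percent of that, which is what keeps the linearized mixed-effects computations tractable.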

**Figure 2.** Estimates for the fixed effect *θ* using different models. The models used to calculate the estimates are as follows: **2a.** a model assuming no warping effect and Gaussian white noise for the intensity model; **2b.** model 2a with a free warping function based on 16 displacement vectors; **2c.** model 2a with a penalized estimation of warping functions; **2d.** the full model (1). Image courtesy of [3].

To illustrate the model’s applications, we have analyzed data from 10 facial images of the same person [10] and 50 two-dimensional sagittal magnetic resonance imaging (MRI) slices of brains from the Alzheimer’s Disease Neuroimaging Initiative database [6].

Figure 2 compares the estimated template image of the proposed model with those of frequently-used models for the face data; the sharpness and representativeness of the estimates increase from left to right. Upon estimating the template image \(\theta\), the covariance matrix \(S\) of the intensity field, and the covariance matrix \(C\) of the warping effect, we can use the parameters to decompose observations into the different variation components. Figure 3 depicts an example of a brain observation split into the different variation effects.

**Figure 3.**Model predictions of a mid-sagittal brain slice (shown on the far right). From left to right: The estimated template for the proposed model, the warped template from the proposed model, the absolute value of the predicted spatially correlated intensity variation from the proposed model, and the full prediction. Image courtesy of [3].

In conclusion, we have presented a class of models that avoids the classical problem of biased estimation of variation caused by sequential preprocessing in image analysis. We achieved this by simultaneously modeling the major modes of random variation in object shape and recorded intensities. This in turn allowed maximum-likelihood-based estimation of parameters that would otherwise have to be tuned manually, leading to parameter estimates that induce a most likely separation of shape and intensity variation.

**Acknowledgments:** This work was supported by the Centre for Stochastic Geometry and Advanced Bioimaging (CSGB), funded by a grant from the Villum Foundation.

**References**

[1] Allassonniere, S., Durrleman, S., & Kuhn, E. (2015). Bayesian mixed effect atlas estimation with a diffeomorphic deformation model. *SIAM Journal on Imaging Sciences, 8*(3), 1367-1395.

[2] Joshi, S., Davis, B., Jomier, M., & Gerig, G. (2004). Unbiased diffeomorphic atlas construction for computational anatomy. *NeuroImage, 23*, 151-160.

[3] Kühnel, L., Sommer, S., Pai, A., & Raket, L.L. (2017). Most likely separation of intensity and warping effects in image registration. *SIAM Journal on Imaging Sciences, 10*(2), 578-601.

[4] Lindstrom, M.J., & Bates, D.M. (1990). Nonlinear mixed effects models for repeated measures data. *Biometrics, 46*, 673-687.

[5] Ma, J., Miller, M.I., Trouvé, A., & Younes, L. (2008). Bayesian template estimation in computational anatomy. *NeuroImage, 42*, 252-261.

[6] Pai, A., Sommer, S., Sorensen, L., Darkner, S., Sporring, J., & Nielsen, M. (2015). Kernel bundle diffeomorphic image registration using stationary velocity fields and Wendland basis functions. *IEEE Transactions on Medical Imaging, 35*(6), 1369-1380.

[7] Pinheiro, J., & Bates, D. (2006). *Mixed-effects models in S and S-PLUS: Statistics and Computing*. New York, NY: Springer Science & Business Media.

[8] Pinheiro, J., Bates, D., DebRoy, S., & Sarkar, D. (2007). Linear and nonlinear mixed effects models. *R package version, 3*, 57.

[9] Raket, L.L., Sommer, S., & Markussen, B. (2014). A nonlinear mixed-effects model for simultaneous smoothing and registration of functional data. *Pattern Recognition Letters, 38*, 1-7.

[10] Samaria, F.S. (2002). The Database of Faces. AT&T. Retrieved from http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html.