# Interpreting Deep Learning: The Machine Learning Rorschach Test?

Theoretical understanding of deep learning is one of the most important tasks facing the statistics and machine learning communities. While multilayer, or deep, neural networks (DNNs) originated as engineering methods and models of biological networks in neuroscience and psychology, they have quickly become a centerpiece of the machine learning toolbox and are simultaneously one of the simplest and most complex methods. DNNs consist of many interconnected nodes that are grouped into layers (see Figure 1a) with stunningly simple operations. The \(n^{\text{th}}\) node of the network at a given layer \(i\), \(x_i(n)\), is merely a nonlinear function \(f(\cdot)\) (e.g., a saturating nonlinearity) applied to an affine function of the previous layer

\[x_i(n) = f\left( {w}_i(n){x}_{i-1} + b_i(n) \right),\]

where \({x}_{i-1}\in\mathbb{R}^{N_i}\) represents the previous layer’s node values, \({w}_i(n)\in\mathbb{R}^{N_i}\) are the weights that project onto the \(n^{\text{th}}\) node of the current layer, and \(b_i(n)\) is an offset. However, these simple operations introduce complexity due to two factors. First, the sheer number of nodes creates an explosion of parameters (\({w}_i(n)\) and \(b_i(n)\)), amplifying the effects of nonlinearities. Second, the weights and offsets are learned by optimizing a cost function via iterative methods, such as back-propagation. Despite the resulting complexity, researchers have utilized DNNs to great effect in many important applications.
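To make the layer equation above concrete, the following is a minimal sketch in plain Python. The function name `layer_forward`, the choice of `tanh` as the saturating nonlinearity, and all weights, offsets, and inputs are hypothetical values chosen only for illustration; they are not taken from any particular network.

```python
import math


def layer_forward(weights, biases, x_prev, f=math.tanh):
    """Compute x_i(n) = f(w_i(n) . x_{i-1} + b_i(n)) for every node n of layer i.

    weights: one weight vector w_i(n) per node of the current layer
    biases:  one offset b_i(n) per node of the current layer
    x_prev:  node values of the previous layer, x_{i-1}
    f:       pointwise nonlinearity (tanh is an assumed saturating choice)
    """
    return [
        f(sum(w * x for w, x in zip(w_n, x_prev)) + b_n)
        for w_n, b_n in zip(weights, biases)
    ]


# Hypothetical toy network: 3 inputs -> 2 hidden nodes -> 1 output node.
x0 = [0.5, -1.0, 2.0]
W1 = [[0.1, 0.2, 0.3], [-0.4, 0.5, -0.6]]
b1 = [0.0, 0.1]
W2 = [[1.0, -1.0]]
b2 = [0.05]

x1 = layer_forward(W1, b1, x0)  # hidden layer values
x2 = layer_forward(W2, b2, x1)  # output layer value
print(x1, x2)
```

Even this toy network already has 11 free parameters (8 in the first layer and 3 in the second), which hints at how quickly the parameter count grows with the number of nodes.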

A “perfect storm” of large, labeled datasets; improved hardware; clever parameter constraints; advancements in optimization algorithms; and more open sharing of stable, reliable code contributed to the relatively recent success of DNNs in machine learning. DNNs originally provided state-of-the-art results in image classification, e.g., in the now-classic task of handwritten digit classification that powers devices like ATMs. While DNN applications have since spread to many other areas, their well-publicized success in image classification has encouraged continued work and produced other amazing technologies, such as real-time text translation.

Unfortunately, DNN adoption powered by these successes—combined with the open-source nature of the machine learning community—has outpaced our theoretical understanding. We cannot reliably identify when and why DNNs will make mistakes. Though this does admittedly provide comic relief and fun fodder in research talks about applications like text translation, a single error can be very costly in tasks such as medical imaging. Additionally, DNNs have shown susceptibility to so-called adversarial examples, or data specifically designed to fool a DNN. We can generate such examples with imperceptible deviations from an image, causing the system to misclassify an image that is nearly identical to one that is correctly classified. Adversarial examples in audio applications can also exert control over popular systems like Amazon’s Alexa or Apple’s Siri, allowing malicious access to devices containing personal information. As we utilize DNNs in increasingly sensitive applications, a better understanding of their properties thus becomes imperative.

**Figure 1.** What do you see? We can view deep neural networks (DNNs) in many ways.

**1a.** Stylistic example of a DNN with an input layer (red), output layer (blue), and two hidden layers (green). This is a sample “ink blot” for DNN theory. Figure courtesy of Adam Charles.

**1b.** Example of a normalized ink blot from the Rorschach test. Public domain image.

Early DNN theory employed learning and function approximation theory to analyze quantities like the Vapnik-Chervonenkis dimension. Although such quantities characterize DNN complexity with respect to training data, many important questions pertaining to generalization, expressibility, learning rule efficiency, intuition, and adversarial example susceptibility remain. More recent interpretations begin to address these questions and fall into three main analysis styles. First are methods to understand the explicit mathematical functions of DNNs by demonstrating the ways in which specific combinations of nonlinearities and weights recover well-known functions on the data. The second approach analyzes theoretical capabilities and limitations of the sequence of functions present in all DNNs — again, given assumptions on the nonlinearities and weights. These analyses include quantifications of the data-dependent cost-function landscape. Finally, a third class of techniques focuses on learning algorithms that solve the high-dimensional, nonlinear optimization programs required to fit DNNs, and attempts to characterize the way in which these algorithms interact with specific DNN architectures.

Advances in DNN theory include many different sources of intuition, such as learning theory, sparse signal analysis, physics, chemistry, and psychology. For example, researchers have related the iterative affine-plus-threshold structure to algorithms that find sparse representations of data [3]. A generalization of this result temporally unrolls the algorithmic iterations that solve regularized least-squares optimization programs

\[\arg\min_{{x}} \left[ \|{y} - {A}{x}\|_2^2 + \lambda R({x}) \right] \tag{1}\]

via a proximal projection method that iteratively calculates

\[\widehat{{x}}_{t+1} = P_{\lambda}\left(\widehat{{x}}_{t} + {A}^T\left({y} - {A}\widehat{{x}}_{t}\right)\right), \tag{2}\]

where \(P_{\lambda}({z})\) is the nonlinear proximal projection

\[P_{\lambda}({z}) = \arg\min_{{x}} \left[ \|{z} - {x}\|_2^2 + \lambda R({x}) \right].\]

When the regularization function \(R(\cdot)\) is separable, \(R({z}) = \sum_k R(z_k)\), the proximal projection is a pointwise nonlinearity that mimics DNN architectures. If we treat \(\widehat{{x}}_{t}\) as a different vector at each algorithmic iteration, these variables map to the node values at subsequent DNN layers. Because the argument of \(P_{\lambda}\) in (2) expands to \(({I} - {A}^T{A})\widehat{{x}}_{t} + {A}^T{y}\), the weights between layers are \({w} = {I} - {A}^T{A}\), the bias is \({b} = {A}^T{y}\), and the nonlinearity is defined by the proximal projection. This example offers a sense of the intuitions gleaned by mapping the network operations onto well-known algorithms. This single interpretation is just the tip of the iceberg; a larger, non-exhaustive list of additional explanations is available in [1].
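The correspondence between iteration (2) and a layered network can be checked numerically. The sketch below assumes \(R\) is the \(\ell_1\) norm, a choice the text does not fix, so the proximal projection becomes soft thresholding at \(\lambda/2\). The matrix `A`, data `y`, and `lam` are hypothetical toy values, with `A` scaled so that its spectral norm is below one and the step-size-free update in (2) converges. All function names are illustrative.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def soft_threshold(z, lam):
    """Pointwise minimizer of (z - x)^2 + lam * |x| (assumes R is the l1 norm)."""
    t = lam / 2.0  # threshold is lam / 2 because the quadratic has no factor 1/2
    return [(1.0 if v > 0 else -1.0) * max(abs(v) - t, 0.0) for v in z]


def iterate_eq2(A, y, lam, num_iters):
    """Direct iteration of x <- P(x + A^T (y - A x)), starting from zero."""
    cols = [list(c) for c in zip(*A)]  # columns of A, i.e. rows of A^T
    x = [0.0] * len(cols)
    for _ in range(num_iters):
        residual = [yi - dot(row, x) for row, yi in zip(A, y)]
        x = soft_threshold([xj + dot(c, residual) for xj, c in zip(x, cols)], lam)
    return x


def unrolled_network(A, y, lam, num_layers):
    """Same computation written as layers x <- f(W x + b) with shared weights."""
    cols = [list(c) for c in zip(*A)]
    n = len(cols)
    # W = I - A^T A and b = A^T y, obtained by expanding the argument of P in (2).
    W = [[(1.0 if j == k else 0.0) - dot(cols[j], cols[k]) for k in range(n)]
         for j in range(n)]
    b = [dot(c, y) for c in cols]
    x = [0.0] * n
    for _ in range(num_layers):
        x = soft_threshold([dot(W[j], x) + b[j] for j in range(n)], lam)
    return x


# Hypothetical toy problem: y is generated by the sparse vector [1, 0, 0].
A = [[0.6, 0.0, 0.3], [0.0, 0.6, 0.3]]
y = [0.6, 0.0]
lam = 0.1

x_iter = iterate_eq2(A, y, lam, 200)
x_net = unrolled_network(A, y, lam, 200)
print(x_iter, x_net)
```

For this toy problem, both routes should return the same sparse vector, with only the first coordinate nonzero. Unlike a trained DNN, every layer here shares the same weights and bias; letting them differ across layers is where the analogy with learned networks begins.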

The sheer quantity of recent publications on DNN theory demonstrates just how relentless the search for meaning has become. An interesting pattern begins to emerge in the breadth of possible interpretations. The seemingly limitless approaches are mostly constrained by the lens through which we view the mathematical operations. Physics-based interpretations stem from researchers with a physics background. Connections to sparsity and wavelets come from well-known scientists in those fields. Ultimately, the interpretation of DNNs appears to mimic a type of Rorschach test, a psychological test wherein subjects interpret a series of seemingly ambiguous ink blots (see Figure 1b). Rorschach tests depend not only on *what* (the result) a subject sees in the ink blots but also on the *reasoning* (methods used) behind the subject’s perception, thus making the analogy particularly apropos.

On the one hand, these diverse perspectives are unsurprising, given DNNs’ status as arbitrary function approximators. Specific network weights and nonlinearities allow DNNs to easily adapt to various narratives. On the other hand, DNNs are not unique in permitting multiple interpretations. We can likewise view standard, simpler algorithms through many lenses. For example, we can derive the Kalman filter, a time-tested algorithm that tracks a vector over time, from at least three interpretations: the orthogonality principle, Bayesian maximum *a posteriori* estimation, and low-rank updates for least-squares optimization. These three derivations allow people with different mathematical mindsets (e.g., linear algebra versus probability theory) to understand the algorithm. Yet compared to DNNs, the Kalman filter is simple; it consists of only a handful of linear-algebraic operations. Its function is completely understood, allowing for validation of each viewpoint despite the different underlying philosophies.
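As a small illustration of how two of these viewpoints can be validated against each other, the sketch below considers the simplest possible case: a scalar, constant state observed in Gaussian noise. This is an assumed special case with no dynamics. The recursive Kalman update (the Bayesian view) is compared with the closed-form minimizer of a regularized least-squares cost. The measurement values, noise variance `r`, and prior `m0`, `p0` are hypothetical.

```python
def kalman_static(measurements, r, m0, p0):
    """Scalar Kalman filter for a constant state observed with noise variance r."""
    m, p = m0, p0
    for y in measurements:
        k = p / (p + r)      # Kalman gain
        m = m + k * (y - m)  # correct the estimate using the innovation y - m
        p = (1.0 - k) * p    # posterior variance after the update
    return m, p


def batch_least_squares(measurements, r, m0, p0):
    """Closed-form minimizer of (x - m0)^2 / p0 + sum_i (y_i - x)^2 / r."""
    precision = 1.0 / p0 + len(measurements) / r
    estimate = (m0 / p0 + sum(measurements) / r) / precision
    return estimate, 1.0 / precision


# Hypothetical noisy measurements of a constant near 1.0.
ys = [1.2, 0.8, 1.1, 0.9, 1.0]
m_kf, p_kf = kalman_static(ys, r=0.25, m0=0.0, p0=10.0)
m_ls, p_ls = batch_least_squares(ys, r=0.25, m0=0.0, p0=10.0)
print(m_kf, m_ls)
```

The two estimates agree up to floating-point rounding, which is the kind of cross-validation between viewpoints that remains difficult for DNNs.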

Similar validation for DNN theory requires a convergence of the literature. We must distinguish between universal results that are invariant to the analysis perspective and those that are specific to a particular network configuration. A healthy debate is already underway with respect to the information bottleneck interpretation of DNNs [4, 5]. We should also work to better understand the interactions between functions that DNNs perform, their mathematical properties, and the impact of optimization methods. Unfortunately, DNN complexity introduces numerous challenges. Many standard tools, such as those that attempt to comprehend a model’s generalization from training data [6] or empirically assess important network features [2], are difficult to apply to DNNs. Luckily, there is no shortage of excitement, and we continue to enhance our understanding of DNNs with time. The community is also beginning to coalesce, and dedicated meetings, like workshops at the Conference on Neural Information Processing Systems and the recent Mathematical Theory of Deep Neural Network symposium at Princeton University, will further accelerate our pace.

**References**

[1] Charles, A.S. (2018). Interpreting deep learning: The machine learning Rorschach test? Preprint, *arXiv:1806.00148*.

[2] Ghorbani, A., Abid, A., & Zou, J. (2017). Interpretation of neural networks is fragile. Preprint, *arXiv:1710.10547*.

[3] Papyan, V., Romano, Y., & Elad, M. (2016). Convolutional neural networks analyzed via convolutional sparse coding. *J. Mach. Learn. Res., 18*, 1-52.

[4] Saxe, A.M., Bansal, Y., Dapello, J., Advani, M., Kolchinsky, A., Tracey, B.D., & Cox, D.D. (2018). On the information bottleneck theory of deep learning. In *Sixth International Conference on Learning Representations*. Vancouver, Canada.

[5] Tishby, N. & Zaslavsky, N. (2015). Deep learning and the information bottleneck principle. In *2015 IEEE Information Theory Workshop* (pp. 1-5). Jeju, Korea.

[6] Zhang, C., Bengio, S., Hardt, M., Recht, B., & Vinyals, O. (2017). Understanding deep learning requires rethinking generalization. In *Fifth International Conference on Learning Representations*. Toulon, France.