By Tamara G. Kolda
A problem arises when we must decide how to handle data that is not two-way. Imagine the previous scenario (objects and features), except that we now take repeated measurements over time; this yields a three-way data array. We could average the features over time or simply stack all the measurements together, but these workarounds tend to be unsatisfactory because they obscure the relationships that evolve along the time dimension. The alternative is to use tensor decompositions, which extend PCA to higher-order data. In fact, we need not stop at three-way data; we can handle four-way, five-way, and much higher-order data as well. There are a variety of tensor decompositions, but I focus on the canonical polyadic (CP) tensor decomposition, introduced by Frank Lauren Hitchcock in 1927 but nearly lost to history until its rediscovery in 1970 under the name CANDECOMP by J. Douglas Carroll and Jih-Jie Chang and the name PARAFAC by Richard A. Harshman.
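To make this concrete, the CP decomposition approximates a three-way array as a sum of rank-one outer products; in the standard notation (the article does not fix any notation, so the symbols below are illustrative):

\[
\mathcal{X} \approx \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r,
\qquad\text{i.e.,}\qquad
x_{ijk} \approx \sum_{r=1}^{R} a_{ir}\, b_{jr}\, c_{kr},
\]

where \(\circ\) denotes the vector outer product and \(R\) is the number of components. Each component \(r\) pairs a pattern over objects (\(\mathbf{a}_r\)) with a pattern over features (\(\mathbf{b}_r\)) and a profile over time (\(\mathbf{c}_r\)), precisely the time-dependent structure that averaging or stacking would destroy.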
During the SIAM Invited Address at the Joint Mathematics Meetings, to be held January 10-13, 2018, in San Diego, Calif., I will demonstrate the wide-ranging utility of the CP tensor decomposition with examples in neuroscience and chemical detection.
Of course, multiway data usually means more data, which leads not only to more insights but also to more problems. One such problem is simply having too much data to process efficiently; randomized methods offer a solution to this dilemma. I will present a novel randomized method, based on matrix sketching, for fitting the CP decomposition to dense data that is more scalable and robust than standard techniques. I will further consider the modeling assumptions behind fitting tensor decompositions to data and explain alternative strategies for various statistical scenarios, resulting in a generalized CP tensor decomposition that we can fit using a different randomized method, based on stochastic gradient descent.
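As a rough illustration of how randomization enters the computation, the sketch below fits a CP model by alternating least squares (ALS) but solves each least-squares subproblem on a uniform random sample of rows of the Khatri-Rao product rather than the full system. This is a minimal sketch under my own assumptions: the names (`cp_als_sampled`, `unfold`), the parameters, and the naive uniform sampling scheme are illustrative choices, not the specific sketching method presented in the talk.

```python
import numpy as np

def unfold(X, mode):
    """Mode-n matricization of a 3-way array (rows indexed by the given mode)."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def cp_als_sampled(X, rank, n_samples=500, n_iters=50, seed=0):
    """Fit a rank-R CP model to a dense 3-way array by ALS, sketching each
    subproblem with uniformly sampled rows of the implicit Khatri-Rao product.
    Illustrative only; not the algorithm from the talk."""
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((n, rank)) for n in X.shape]
    for _ in range(n_iters):
        for mode in range(3):
            others = [factors[m] for m in range(3) if m != mode]
            d0, d1 = others[0].shape[0], others[1].shape[0]
            Xn = unfold(X, mode)                   # shape (n_mode, d0 * d1)
            # Sample rows of the Khatri-Rao product without forming it.
            i0 = rng.integers(0, d0, n_samples)
            i1 = rng.integers(0, d1, n_samples)
            Zs = others[0][i0] * others[1][i1]     # sampled rows, (n_samples, rank)
            cols = i0 * d1 + i1                    # matching columns of Xn
            # Sketched least squares:  Zs @ F.T  ~=  Xn[:, cols].T
            factors[mode] = np.linalg.lstsq(Zs, Xn[:, cols].T, rcond=None)[0].T
    return factors

# Tiny usage example on a synthetic low-rank tensor.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A, B, C = (rng.standard_normal((n, 3)) for n in (40, 50, 60))
    X = np.einsum("ir,jr,kr->ijk", A, B, C)
    factors = cp_als_sampled(X, rank=3)
    Xhat = np.einsum("ir,jr,kr->ijk", *factors)
    print("relative error:", np.linalg.norm(X - Xhat) / np.linalg.norm(X))
```

The point of the sketch is only the scaling: each ALS subproblem shrinks from thousands of Khatri-Rao rows to a few hundred sampled ones, while the tensor itself is touched only at the matching columns of each unfolding.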