By Tamara G. Kolda
A problem arises when we must decide how to handle data that is not two-way. Imagine the previous scenario (objects and features), except that we now take repeated measurements over time; this yields a three-way data array. We could average the features over time or simply stack all the measurements together, but these workarounds tend to be unsatisfactory because they obscure the relationships that evolve along the time dimension. The alternative is to use tensor decompositions, which extend PCA to higher-order data. In fact, we need not stop at three-way data; we can handle four-way, five-way, and much higher-order data as well. There are a variety of tensor decompositions, but I focus on the canonical polyadic (CP) tensor decomposition, introduced by Frank Lauren Hitchcock in 1927 but nearly lost to history until its rediscovery in 1970 under the name CANDECOMP by J. Douglas Carroll and Jih-Jie Chang and the name PARAFAC by Richard A. Harshman.
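To make this concrete, the CP decomposition approximates a three-way array as a sum of rank-one outer products; in the standard notation (the article does not fix any notation, so the symbols below are illustrative):

\[
\mathcal{X} \approx \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r,
\qquad\text{i.e.,}\qquad
x_{ijk} \approx \sum_{r=1}^{R} a_{ir}\, b_{jr}\, c_{kr},
\]

where \(\circ\) denotes the vector outer product and \(R\) is the number of components. Each component \(r\) pairs a pattern over objects (\(\mathbf{a}_r\)) with a pattern over features (\(\mathbf{b}_r\)) and a profile over time (\(\mathbf{c}_r\)), precisely the time-dependent structure that averaging or stacking would destroy.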
During the SIAM Invited Address at the Joint Mathematics Meetings, to be held January 10-13, 2018, in San Diego, Calif., I will demonstrate the wide-ranging utility of the CP tensor decomposition with examples in neuroscience and chemical detection.
Of course, multiway data usually means more data, which leads not only to more insights but also to more problems. One such problem is simply having too much data to process efficiently; randomized methods offer a solution to this dilemma. I will present a novel randomized method, based on matrix sketching, for fitting the CP decomposition to dense data that is more scalable and robust than standard techniques. I will further consider the modeling assumptions behind fitting tensor decompositions to data and explain alternative strategies for various statistical scenarios, resulting in a generalized CP tensor decomposition that we can fit using a different randomized method, based on stochastic gradient descent.
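As a rough illustration of how randomization enters the computation, the sketch below fits a CP model by alternating least squares (ALS) but solves each least-squares subproblem on a uniform random sample of rows of the Khatri-Rao product rather than the full system. This is a minimal sketch under my own assumptions: the names (`cp_als_sampled`, `unfold`), the parameters, and the naive uniform sampling scheme are illustrative choices, not the specific sketching method presented in the talk.

```python
import numpy as np

def unfold(X, mode):
    """Mode-n matricization of a 3-way array (rows indexed by the given mode)."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def cp_als_sampled(X, rank, n_samples=500, n_iters=50, seed=0):
    """Fit a rank-R CP model to a dense 3-way array by ALS, sketching each
    subproblem with uniformly sampled rows of the implicit Khatri-Rao product.
    Illustrative only; not the algorithm from the talk."""
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((n, rank)) for n in X.shape]
    for _ in range(n_iters):
        for mode in range(3):
            others = [factors[m] for m in range(3) if m != mode]
            d0, d1 = others[0].shape[0], others[1].shape[0]
            Xn = unfold(X, mode)                   # shape (n_mode, d0 * d1)
            # Sample rows of the Khatri-Rao product without forming it.
            i0 = rng.integers(0, d0, n_samples)
            i1 = rng.integers(0, d1, n_samples)
            Zs = others[0][i0] * others[1][i1]     # sampled rows, (n_samples, rank)
            cols = i0 * d1 + i1                    # matching columns of Xn
            # Sketched least squares:  Zs @ F.T  ~=  Xn[:, cols].T
            factors[mode] = np.linalg.lstsq(Zs, Xn[:, cols].T, rcond=None)[0].T
    return factors

# Tiny usage example on a synthetic low-rank tensor.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A, B, C = (rng.standard_normal((n, 3)) for n in (40, 50, 60))
    X = np.einsum("ir,jr,kr->ijk", A, B, C)
    factors = cp_als_sampled(X, rank=3)
    Xhat = np.einsum("ir,jr,kr->ijk", *factors)
    print("relative error:", np.linalg.norm(X - Xhat) / np.linalg.norm(X))
```

The point of the sketch is only the scaling: each ALS subproblem shrinks from thousands of Khatri-Rao rows to a few hundred sampled ones, while the tensor itself is touched only at the matching columns of each unfolding.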