Recent Advances in Dimensionality Reduction with Provable Guarantees

By Jelani Nelson

The following is a short introduction to an invited lecture to be presented at the upcoming 2018 SIAM Annual Meeting (AN18) in Portland, Ore. from July 9-13. Look for feature articles by other AN18 invited speakers introducing the topics of their talks in future issues.

Jelani Nelson, Harvard University.

Researchers have used dimensionality reduction for over a hundred years — first Francis Galton, then Karl Pearson, and later Harold Hotelling in the late 19th and early 20th centuries. Pearson and Hotelling developed principal component analysis (PCA); the formation of a number of methods, including multidimensional scaling, kernel PCA, and isometric feature mapping, soon followed. One can employ these methods in statistics, data mining, machine learning, and many other areas after approximately fitting high-dimensional data to low-dimensional models by extracting a few variables that “explain” most of the data.

Orthogonal but complementary is the study of dimensionality reduction from the perspective of low-distortion embeddings into low-dimensional normed spaces. Researchers utilize such methods in high-dimensional computational geometry, signal processing, algorithms for large-scale linear algebra problems, identification of motifs in computational biology, machine learning, etc. Algorithms used in high-dimensional computational geometry problems typically suffer from the curse of dimensionality, wherein the running time and/or memory usage of the best-known algorithms scale poorly (even exponentially) with dimension. Such problems include nearest neighbor search and various clustering problems. The idea is that, given some input set of data points \(X = \{x_1,\ldots,x_n\}\subset\mathbb{R}^d\), one can quickly compute some \(f:X\rightarrow\mathbb{R}^m\) for \(m\ll d\), such that \(f(X)\) has roughly the same geometry as \(X\) (e.g., preservation of pairwise distances, angles, volumes of simplices defined by subsets of points, etc.). A cornerstone result in this line of work is the so-called Johnson-Lindenstrauss (JL) lemma, proven in 1984:

Lemma 1 (JL lemma). For any \(X\subset \ell_2\) and any \(0<\varepsilon<1\), there exists an embedding \(f:X\rightarrow\mathbb{R}^m\) with \(m = O(\varepsilon^{-2}\log |X|)\), such that

\[ \forall x,y\in X, \ (1-\varepsilon)\|x-y\|_2 \le \: \|f(x) - f(y)\|_2\le (1+\varepsilon)\|x-y\|_2.\]

That is, the JL lemma preserves pairwise Euclidean distances, and the embedded dimension depends only on the number of points (and only logarithmically) and not on their original dimension! Furthermore, all known proofs of the JL lemma provide an \(f\) that is linear, and even chosen at random from an appropriate probability distribution, oblivious of the actual point set \(X\) to be embedded. Thus, one defines \(f\) as the map \(x\mapsto \Pi x\) for a random matrix \(\Pi\in\mathbb{R}^{m\times d}\), chosen from an appropriate distribution. Both facts have proven useful for a variety of applications.

Since the introduction of the JL lemma and its use in computer science applications beginning in the mid-1990s, a variety of questions concerning the lemma have been asked and recently answered, either optimally or near-optimally. These include the optimal dimension \(m\) that can be achieved by a Euclidean dimensionality-reducing map in terms of \(\varepsilon, |X|\) (i.e., is the \(m\) achieved in the JL lemma optimal?), and whether one can choose the aforementioned \(\Pi\) in such a way that the map \(x\mapsto \Pi x\) can be computed quickly. Researchers have also found new generalizations of the JL lemma and its applications, such as speeding up low-rank approximation computations for the PCA method and other large-scale linear algebra problems, like approximate least squares regression. They have used a generalization of the JL lemma called subspace embeddings—pioneered by Tamás Sarlós in 2006—to achieve these.

My talk at the 2018 SIAM Annual Meeting will touch on recent developments in dimensionality reduction.

Jelani Nelson is an associate professor of computer science and the John L. Loeb Associate Professor of Engineering and Applied Sciences in the John A. Paulson School of Engineering and Applied Sciences at Harvard University.