# An Algebraic Geometry Perspective on Topological Data Analysis

Topological data analysis (TDA) is a success story with a wide range of diverse applications. Here I will survey TDA from the point of view of *algebraic geometry*.

Algebra and algebraic geometry present certain immediate applications to TDA. The persistence module—the data structure behind *persistent homology* (PH)—is an inherently algebraic concept, and attempts to extend PH to multiple parameters utilize concepts from commutative algebra [4,10]. However, I would like to discuss some of algebraic geometry’s other roles in TDA — specifically applications of numerical algebraic geometry (NAG) and enumerative algebraic geometry (EAG).

NAG concerns the computation of numerical solutions to a system of \(n\) polynomial equations \(F(x)=(f_1(x),\ldots, f_n(x))=0\) in \(n\) variables \(x=(x_1,\ldots,x_n)\) over the complex numbers. Numerical homotopy continuation is the computational paradigm in NAG. This paradigm involves the generation of a system of equations \(G(x)\) whose solutions are known (a so-called *start system*), and the continuation of the solutions of \(G(x)=0\) along a deformation of \(G(x)\) towards \(F(x)\). Such continuation leads to a *Davidenko differential equation*, an ordinary differential equation (ODE) that is solved by standard numerical methods for ODEs. The state-of-the-art implementations are Bertini, HOM4PS, HomotopyContinuation.jl, NAG4M2, and PHCPack.

On the other hand, EAG counts the number of solutions to a system of polynomial equations. Although they might initially appear different, NAG and EAG are intimately related: NAG’s key benefit is that one can generate initial values *for all isolated solutions* of \(F(x)=0\) in \(\mathbb{C}^n\). For instance, if the degree of the \(i\)th polynomial is \(d_i\), then \(G(x)=(x_i^{d_i}-a_i)_{i=1}^n\)—where \(a_1,\ldots,a_n\in\mathbb C^*\)—can serve as a start system for the homotopy \((1-t)G(x)+tF(x), 0\leq t\leq 1\). \(G(x)=0\) has \(D=d_1\cdots d_n\) isolated solutions, and a theorem from algebraic geometry implies that \(F(x)=0\) has *at most* \(D\) isolated solutions. Continuing the solutions of \(G(x)=0\) towards \(F(x)=0\) produces all isolated solutions of \(F(x)=0\). In practice, however, \(F(x)=0\) has significantly fewer solutions than \(D\), and diverging solutions must be eliminated. EAG helps construct other start systems that are adapted to the structure of \(F(x)\) and improve the algorithm’s efficiency.

Returning to TDA, let us consider the situation in which \(M\subset \mathbb{R}^n\) is the zero set of \(s\) polynomials in \(n\) variables \(F(x)=(f_1(x),\ldots, f_s(x))\). In algebraic geometry, such an \(M\) is called a *real algebraic variety*. The conformation space of the cyclooctane molecule serves as an example. Cyclooctane is comprised of eight carbon atoms \(x_1,\ldots,x_8\in \mathbb{R}^3\) aligned in a ring, such that the distances between neighboring atoms all equal \(c>0\). The energy of configuration \((x_1,\ldots,x_8)\) is minimized when each angle between successive bonds amounts to \(\arccos(-\frac{1}{3}) \approx 109.5^\circ\). The polynomial equations in \(3\cdot 8 = 24\) variables are

\[\Vert x_1 - x_2\Vert^2 = \cdots = \Vert x_7 - x_8\Vert^2 = \: \Vert x_8 - x_1\Vert^2 = c^2 \]

and

\[ \Vert x_1 - x_3 \Vert^2 =\cdots = \:\Vert x_6 - x_8 \Vert^2 =\Vert x_7 - x_1 \Vert^2 =\:\Vert x_8 - x_2 \Vert^2 = \frac{8}{3}c^2. \]

The solution set of these equations—up to simultaneous translation and rotation—is homeomorphic to a union of the Klein bottle and a sphere, which intersect in two rings [6].

NAG can generate a sample of points from \(M\), which may then serve as input for PH; Figure 1 depicts a sample from the cyclooctane variety. One idea involves sampling linear spaces \(L\) of a dimension equal to the codimension of \(M\), and computing the points in the intersection of \(M\) and \(L\). In another approach, researchers sample points \(q\in\mathbb{R}^n\) in the ambient space and determine the point on \(M\) that minimizes the distance to \(q\). One can cast both computational problems as a system of polynomial equations and use NAG to solve them. Investigative directions in NAG include how to sample with respect to a probability distribution on \(M\) [1] and produce samples with the desired level of density in \(M\) [8].

**Figure 1.**A sample from the cyclooctane variety for \(c^2=2\), projected on a three-dimensional space. Due to translational and rotational invariance, \(x_1=(0,0,0)\) and \(x_2=(c,0,0)\) are fixed and the last entry of \(x_3\) is set equal to zero. Code and image courtesy of Paul Breiding and Sascha Timme.

Researchers also use NAG and EAG to study two important numbers for TDA: the homological feature size \(\textrm{hfs}(M)\) [5] and the reach \(\tau(M)\) of \(M\) [12]. Emil Horobet and Madeleine Weinstein showed that if \(M\) is an algebraic manifold (i.e., a real algebraic variety that is also a manifold) defined by polynomials over \(\mathbb{Q}\), then both \(\textrm{hfs}(M)\) and \(\tau(M)\) are algebraic over \(\mathbb{Q}\) [11]. Therefore, one can compute both by means of NAG.

\(M\)'s reach is the distance from \(M\) to its medial axis. An equivalent definition is \(\tau(M)=\min\{\tfrac{1}{\sigma(M)},\tfrac{1}{2}\rho(M)\}\), where \(\sigma(M)\) is the maximal curvature of a geodesic in \(M\) and \(\rho(M)\) is the width of \(M\)'s narrowest *bottleneck*. A bottleneck is a pair \((x,y)\in M^2\), such that \(x-y\) is perpendicular to both tangent spaces \(\mathrm{T}_x M\) and \(\mathrm{T}_y M\). One can formulate this as a system of polynomial equations (solved using NAG) and extract the real solutions from the complex ones. Recall that one computes the *isolated* solutions of a polynomial system in NAG; the trivial solutions for which \(x=y\) are not computed. Scientists have studied bottlenecks intensely, both from the perspective of NAG [9] and EAG in terms of *polar classes* of \(M\) [7].

Equations for \(\sigma(M)\) are less straightforward, but a direct formula for the curvature \(\gamma(x)\) at \(x\in M\) indicates that \(\sigma(M)=\min_{x\in M}\gamma(x)\) for planar curves. In this case, the first-order optimality conditions for \(\sigma(M)\)—that the gradient of \(\gamma(x)\) is perpendicular to the tangent space \(\mathrm{T}_x M\)—generate a system of polynomial equations. Solving this yields \(\sigma(M)\).

In summary, one can compute \(\rho(M)\) and \(\sigma(M)\) separately via NAG, and infer the reach \(\tau(M)\) from those numbers. Figure 2 illustrates an example of this computation for a planar curve.

**Figure 2.**The ingredients for computation of the reach of planar curve \(C=\{(x^3 - xy^2 + y + 1)^2(x^2 + y^2 - 1) + y^2 = 5\}\).

**2a.**All bottlenecks of \(C\). The narrowest bottleneck is red with a width of \(\approx 0.138\).

**2b.**All points of critical curvature of \(C\). The red point in the uppermost left is the point of maximal curvature and is \(\approx 2097.17\). Therefore, \(\tau(C) \approx \{\tfrac{1}{2097.17}, \tfrac{0.138}{2}\} = \tfrac{1}{2097.17}\). Code and image courtesy of Paul Breiding and Sascha Timme.

One can also replace the reach with a lower bound that involves the *real condition number* of a system of polynomials [3]. This lower bound holds for both real algebraic varieties and the more general class of *semialgebraic sets*. Researchers use the condition number to derive a complexity analysis of an algorithm for computing homology.

Finally, I would like to propose three possible future directions for the use of algebraic geometry in TDA. The first is an analysis of \(\sigma(M)\) in the context of NAG and EAG for bottlenecks [7, 9]. This is indispensable when computing the reach beyond planar curves. The second is PH using ellipsoids. Experiments show that this approach can greatly improve the output diagrams’ quality in PH [2], yet it lacks a theoretical explanation. The third direction involves sampling. The standard approach to sampling from nonlinear objects uses *Markov Chain Monte Carlo* methods. Combining this approach with NAG seems promising.

**Acknowledgments:** I would like to acknowledge funding from the European Research Council under the European Union’s Horizon 2020 research and innovation programme (grant agreement no. 787840). I would also like to thank Oliver Gäfvert and Madeleine Weinstein for providing feedback on the article.

**References**

[1] Breiding, P., & Marigliano, O. (2019). Random points on an algebraic manifold. Preprint, *arXiv:1810.06271*.

[2] Breiding, P., Kalisnik, S., Sturmfels, B., & Weinstein, M. (2018). Learning algebraic varieties from samples. *Revis. Matemát. Complut., 31*, 545-593.

[3] Bürgisser, P., Cucker, F., & Lairez, P. (2019). Computing the homology of basic semialgebraic sets in weak exponential time. *J. ACM, 66*(1), 1-30.

[4] Carlsson, G., & Zomorodian, A. (2009). The theory of multidimensional persistence. *Discrete Comput. Geom., 42*, 71-93.

[5] Cohen-Steiner, D., Edelsbrunner, H., & Harer, J. (2007). Stability of Persistence Diagrams. *Discrete Comput. Geom., 37*, 103-120.

[6] Coutsias, E., Martin, S., Thompson, A., & Watson, J. (2010). Topology of cyclo-octane energy landscape. *J. Chem. Phys., 132*, 234115.

[7] Di Rocco, S., Eklund, D., & Weinstein, M. (2019). The bottleneck degree of algebraic varieties. Preprint, *arXiv:1904.04502*.

[8] Dufresne, E., Edwards, P., Harrington, H., & Hauenstein, J. (2018). Sampling real algebraic varieties for topological data analysis. Preprint, *arXiv:1802.07716*.

[9] Eklund, D. (2018). The numerical algebraic geometry of bottlenecks. Preprint, *arXiv:1804.01015*.

[10] Harrington, H., Otter, N., Schenck, S., & Tillmann, U. (2017). Stratifying multiparameter persistent homology. *SIAM J. Appl. Alg. Geom., 3*, 439-471.

[11] Horobet, E., & Weinstein, M. (2019). Offset hypersurfaces and persistent homology of algebraic varieties. *Comput. Aid. Geom. Des., 74*, 101767.

[12] Niyogi, P., Smale, S., & Weinberger, S. (2008). Finding the homology of submanifolds with high confidence from random samples. *Discrete Comput. Geom., 39*, 419-441.