| July 23, 2014

A Smale Challenge Problem Edges Toward Solution

In 1998, Stephen Smale compiled a list of 18 challenge problems for the 21st century. Two of them—the Riemann hypothesis, at #8, and a problem concerning the topology of algebraic curves and surfaces, at #16, were held over from Hilbert’s more famous list of 23 challenge problems for the 20th century. Others were unmistakably 20th century in origin, including #3, P = NP?; #18, concerning the limits of intelligence; and #17, which asks for an algorithm capable of computing approximately, in low-order polynomial average time, a single zero of a system of \(n\) polynomial equations in \(n\) complex unknowns. Michael Shub, whose 1967 UC Berkeley thesis on dynamical systems was directed by Smale, and who is currently an adjunct professor of mathematics at the CUNY Graduate School, surveyed progress toward a solution of Smale’s 17th problem in an invited talk at the 2013 SIAM Annual Meeting.¹

To speak of average computation time, one must specify a class of problems to be solved, an algorithm for solving them, and a measure of the size of any problem in the given class. Smale initiated the investigation of average computational times, setting out first to discover an efficient algorithm for finding the roots of a single complex polynomial. In later work [7], he sought to explain why the simplex algorithm works as well as it does in practice, despite being NP-complete in unfavorable cases. His work on the simplex algorithm was particularly fruitful, as it answered a question many were asking and spawned a flurry of follow-up activity.

The class of problems the simplex algorithm is designed to solve is just the class of linear programs \[\begin{equation}(P)~~~~~~~~~\textrm{minimize} \langle{c,x}\rangle\\\textrm{subject to}~Ax \leq b~\textrm{and}~x \geq 0,\end{equation}\] where \(A\) is an \(m \times n\) matrix of real numbers, and \(b\), \(c\), and \(x\) are conformable real vectors. There is a natural way to turn the set of all such problems into a manifold \(\mathbf{\cal{M}}\), and a natural measure of the “size” \(|P|\) of any problem \(P \in \mathbf{\cal{M}}\), namely \(|P| = min (m,n)\). If \(T(P;\alpha)\) denotes the time required for algorithm \(\alpha\) to solve the problem \(P \in \mathbf{\cal{M}}\), and if there is a natural probability measure on \(\mathbf{\cal{M}}\), it is then possible to compute the expectation \(E(T(P;\alpha))\). If \(\mathbf{\cal{M}}\) consists of all linear programs, if \(\alpha\) is a certain version of the simplex algorithm, and if \(T(P;\alpha)\) is proportional to the number of vertices visited by \(\alpha\), it is known [1] that \[\begin{equation}E(T(P;\alpha)) = O(|P|^{2}).\end{equation}\]

Systems of Equations

The situation with regard to the solution of systems of equations is less transparent. Relying on his experience in solving practical problems at IBM (where he was a member of the research staff from 1985 to 2004), Shub observed that methods which work for systems of polynomial equations tend to work nearly as well for equations of other types. Accordingly, it made sense to confine his attention to the following problem: \[\begin{equation}(Q)~~~~~~~~~\textrm{find}~\zeta = (\zeta_{1},\cdots, \zeta_{n})\\\textrm{such that} f (\zeta) = 0,\end{equation}\] where \(f = (f_{1},\cdots,f_{n})\) is a system of (real or complex) polynomials in \(z_{1},\cdots, z_{n}\).

Whereas it is easy enough to impose a manifold structure on the set of all such systems, and to define a probability measure on the resulting \(\bf\cal{M}\), it is not so clear what algorithm(s) \(\alpha\) to consider, or even how to define \(T(Q;\alpha)\) for a given \(Q \in \mathbf{\cal{M}}\) and \(\alpha\). Unlike the simplex algorithm, which terminates after a finite number of steps, standard algorithms for computing the zeros of even a single polynomial do not. Standard techniques, such as Newton’s method, typically produce an infinite sequence of approximations devised to converge to the desired \(\zeta\). Shub’s response was to consider an algorithm to have terminated on arrival in a neighborhood throughout which Newton’s method is or at least appears to be quadratically convergent. Such neighborhoods typically occupy but a small portion of the surrounding basins of attraction.

Figure 1. Basins of attraction for Newton's method applied to z⁴ - 1.

Basins of attraction are notoriously complex, as shown in Figure 1, which depicts the basins of Newtonian attraction of the distinct zeros of \(z^{4} – 1\). There may even be points of the complex plane that do not belong to any basin of attraction, as is the case for the quadratic polynomial \(z^{2} + 1\). Launching Newton’s method from any real starting point yields a sequence of real “approximations” incapable of approaching either imaginary zero.

In order to take advantage of Bezout’s theorem on the number of zeros of a system of complex homogeneous polynomials, we begin by letting \(d_{1},\cdots, d_{n}\) be the degrees of \(f_{1},\cdots, f_{n}\), and homogenizing each \(f_{i}\) by multiplying each constituent monomial by whatever power of an auxiliary variable \(z_{0}\) is needed to increase its degree to \(d_{i}\). The result will be a system of \(n\) homogeneous polynomials of degrees \(d = (d_{1},\cdots, d_{n})\) in the \(n + 1\) variables \(z_{0},\cdots, z_{n}\). A variant of Newton’s method, known as the projective Newton’s method, applies to such systems.

If \(\mathbb{H}_{(d)}\) is the vector space of all homogenous systems \(f\) of degree \(d\), then \(dim{\mathbb{H}}_{(d)}\) can be shown to be \(N=\sum_{i=1}^{n}C_{n}^{n+d_{i}}\), where \(C_{n}^{m}\) is a binomial coefficient, and the number of solutions \(\zeta = (\zeta_{0},\cdots , \zeta_{n})\) is known by Bezout’s (19th-century) theorem to be \(B = \prod_{i}d_{i}\). In particular, if each \(d_{i}\) is 2, then \(N \sim n^{3}/2\) while \(B = 2^{n}\). The number of solutions \(\zeta\) is thus exponential in \(n\), dashing any hope for a polynomial-time algorithm to produce all solutions.

With only a few equations of high degree, the problem can be reduced via exact symbolic techniques (Gröbner bases, resultants, and the like) to that of solving a univariate polynomial of degree \(B\) in polynomial time. But that fact alone does not lead to a solution of Smale’s 17th problem, even in combination with the homotopy methods employed by Smale in his early work on single polynomials, much of it [5,6] in collaboration with Shub.

Homotopy Methods

In his talk, Shub described recent progress toward a complete solution of Smale’s 17th problem via homotopy methods. Such methods operate by embedding a trivial system \(f_{0}\) and a target system \(f_{1}\) in a continuum of systems \(f_{t};0 \leq t \leq 1\) in the hope that, for some partition \(0 = t_{0}< t_{1} < \cdots < t_{m} = 1\) of the interval [0,1], it will prove possible to solve each intermediate system \(f_{t_i} = 0;i = 0,1, . . . , m\) in turn, with each intermediate approximation \(r_{t_i}\) serving as the starting point in a search for the next. Thus, Shub’s algorithms \(\alpha\) all proceed by discrete steps from an initial approximate solution \(r_{0}\) of \(f_{0} = 0\) to . . . to a solution \(r_{i}r_{t_i}\) of \(f_{t_i} = 0\) . . . to the desired approximate solution \(r_{1}\) of \(f_{1}\) = 0.

A homotopy \(\Gamma_{t} = (f_{t},\zeta_{t})\) is a curve in the space of all pairs \((f,\zeta)\), and an approximate homotopy is a surrounding tube so slender that Newton’s method converges quadratically to \(\zeta_{\tau}\) from any starting point \(r_{\tau}\) within the section \(t = \tau\) of that tube. Nothing prevents such tubes from merging and/or bifurcating as \(t\) increases from 0 to 1.

To analyze homotopy algorithms, Shub considered the “solution variety” V of all pairs \((f,\zeta)\) such that \(f(\zeta) = 0\). When appropriate norms are imposed on the (factor) spaces of systems \(f\) and arguments \(\zeta\), V becomes a smooth differentiable manifold \(\bf\cal{M}\) on which one can define—among other things—the length \(\Lambda\) of a homotopy \(\Gamma_{t} = (f_{t},\zeta_{t})\) to be \[\begin{equation}\Lambda(\Gamma_{t}) = \int_{0}^{1} \mu(f_{t},\zeta_{t}) ||(\dot{f}_{t},\dot{\zeta}_{t})|| dt,\end{equation}\] \(\mu(f_{t},\zeta_{t})\) being a generalization of the usual “condition number” for a system of linear equations, defined wherever \(\bigtriangledown f_{t}(\zeta_{t}) \neq 0\). The inclusion of \(\mu(f_{t},\zeta_{t})\) within the arc-length integral causes geodesics in \(\bf\cal{M}\) to deviate from the straight and narrow toward regions favorable to computation, much as light rays in outer space bend toward concentrations of mass.

Shub mentioned a proof by Carlos Beltrán and Luis Miguel Pardo [2,3] that the problem \(Q\) can be solved with high probability in polynomial time, leading some in the field to consider Smale’s problem #17 solved. After all, if such a method fails to produce a solution within a reasonable amount of time, another try can always be made from a different starting point, thereby squaring the (already small) failure probability.

Shub does not consider Smale’s problem #17 solved––he is certain that Smale had in mind an algorithm \(\alpha\) that would work for sure. After mentioning progress in this direction by Felipe Cucker and Peter Bürgisser [4], Shub ended his presentation by urging listeners to help “finish solving Smale’s 17th problem.”

¹Slides and audio for many sessions from AN13, including Shub’s AMS invited lecture, can be found here.

References
[1] I. Adler, N. Megiddo, and M.J. Todd, New results on the average behavior of simplex algorithms, Bull. Amer. Math. Soc., (N.S.), 11:2 (October 1984).
[2] C. Beltrán and L.M. Pardo, On Smale’s 17th problem: A probabilistic positive answer, Found. Comput. Math., 8:1 (2008), 1–43; doi:10.1007/s10208-005-0211-0.
[3] C. Beltrán and L.M. Pardo, Smale’s 17th problem: Average polynomial time to compute affine and projective solutions, J. Amer. Math. Soc., 22 (2009), 363–385.
[4] F. Cucker and P. Bürgisser, On a problem posed by Steve Smale, Ann. of Math., 174:3 (2011), 1785–1836; doi:10.4007/annals.2011.174.3.8.
[5] M. Shub and S. Smale, Computational complexity: On the geometry of polynomials and a theory of cost: Part I, Ann. Sci. Ec. Norm. Supér., 4:18 (1985), 107–142.
[6] M. Shub and S. Smale, On the geometry of polynomials and a theory of cost: Part II, SIAM J. Comput., 15 (1986), 145–161.
[7] S. Smale, On the average number of steps of the simplex method of linear programming, Math. Program., 27 (1983), 241–262.

James Case writes from Baltimore, Maryland.