
Some Historical Notes on the Computation of the Singular Value Decomposition

By John C. Nash

In 2018, Jack Dongarra and his collaborators considered the evolution of methods for computing the singular value decomposition (SVD) from the perspective of extreme-scale problems [3]. This article is motivated by the same problem at the other extreme: severe limits on computational resources, particularly memory for program and data. I include some personal recollections to provide context for the computational environments of the times.

Background

One can define the SVD of matrix \(A\) as the matrix product

\[A=USV'\]

where \(U\) and \(V\) are orthonormal matrices and \(S\) is essentially a diagonal matrix of nonnegative elements. Most practical problems work with more restrictive versions of this equation. Interest in the practical computation of the SVD grew from the 1965 work of Gene Golub and William Kahan [5].
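
The definition is easy to check numerically. Below is a small NumPy illustration (a modern convenience, obviously not the code of the era) of the factorization and of the common "economy" restriction in which \(U\) has only as many columns as \(A\):

```python
import numpy as np

# Illustrative check of A = U S V' with modern NumPy.
A = np.array([[2.0, 0.0, 1.0],
              [0.0, 1.0, 0.0],
              [1.0, 0.0, 2.0],
              [0.0, 3.0, 0.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)  # "economy" SVD: U is 4 x 3

assert np.allclose(A, U @ np.diag(s) @ Vt)        # A = U S V'
assert np.allclose(U.T @ U, np.eye(3))            # columns of U orthonormal
assert np.allclose(Vt @ Vt.T, np.eye(3))          # V orthonormal
print(s)                                          # nonnegative, descending
```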

The One-sided Jacobi SVD

One can also find an SVD by implicitly applying the Jacobi eigenvalue calculation to the product \(A'A\). Magnus Hestenes initially suggested this approach in 1958 [6], and Bruce Chartres discussed it further in 1962 [2]. Over a decade later, I tried the approach on computers that are extremely small by today's standards [8]. A conversation between Chartres and me suggested that he was unable to create a code that one could share with others. But the ability to easily share programs is an important part of computing's evolution. From this perspective, my initial attempt [8] still failed to provide even a recipe, though its simplicity allowed a fairly straightforward implementation. Len Lefkovitch and I described distributable programs in 1976 [7]. I then included step and description codes that were easily adaptable to many different computing environments in my 1979 book, Compact Numerical Methods for Computers [9].
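
The method is compact enough to sketch. The Python below is my illustrative reconstruction of the Hestenes idea, not the published BASIC, Fortran, or Pascal codes: plane rotations applied from the right orthogonalize pairs of columns of a working matrix \(B\), which implicitly diagonalizes \(A'A\) without ever forming that product.

```python
import numpy as np

def one_sided_jacobi_svd(A, tol=1e-12, max_sweeps=30):
    """Hestenes-style one-sided Jacobi SVD (illustrative sketch).

    Sweeps of plane rotations orthogonalize pairs of columns of the
    working matrix B. On exit B = U*S, so A = U S V'.
    """
    B = np.array(A, dtype=float)
    m, n = B.shape
    V = np.eye(n)
    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                # Entries of B'B for the (p, q) plane, via inner products.
                alpha = B[:, p] @ B[:, p]
                beta = B[:, q] @ B[:, q]
                gamma = B[:, p] @ B[:, q]
                if gamma * gamma <= tol * tol * alpha * beta:
                    continue                    # pair already orthogonal
                # Classical Jacobi choice of rotation angle.
                zeta = (beta - alpha) / (2.0 * gamma)
                sgn = 1.0 if zeta >= 0.0 else -1.0
                t = sgn / (abs(zeta) + np.sqrt(1.0 + zeta * zeta))
                c = 1.0 / np.sqrt(1.0 + t * t)
                s = c * t
                for M in (B, V):                # same rotation on B and V
                    Mp = M[:, p].copy()
                    M[:, p] = c * Mp - s * M[:, q]
                    M[:, q] = s * Mp + c * M[:, q]
                rotated = True
        if not rotated:
            break
    sing = np.linalg.norm(B, axis=0)            # singular values = col norms
    order = np.argsort(sing)[::-1]              # a separate sort; the compact
    sing = sing[order]                          # codes avoided this step (see
    U = B[:, order] / np.where(sing > 0, sing, 1.0)  # "Size Matters" below)
    return U, sing, V[:, order]
```

A quick check is that `U @ np.diag(sing) @ V.T` reproduces \(A\) to rounding error.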

The Computational Targets

Figure 1. John Nash used a similar teletype to access the Data General NOVA and ECLIPSE. Note the paper tape reader and punch. The noise was such that he requested (and received) ear protectors. This image is under a Creative Commons license.
The codes from my book were shoehorned into very tiny computers. Our initial target was a segment of a minicomputer offering a BASIC interpreter with 32-bit floating-point numbers; this arithmetic was initially binary but later became hexadecimal. In either case, the arithmetic and special functions were atrocious by modern standards.

Other possible platforms included the HP 9830A, a programmable calculator that resembled later personal computers, as well as other early programmable calculators and “pocket” computers. An exercise in frustration is to attempt the key-by-key entry of any codes from Compact Numerical Methods for Computers into such a device.

From 1970 to 1990, the dialects of programming languages were a frightful obstacle to the easy sharing of code, except perhaps for Fortran card decks or tape images. While the 1975 SVD method was coded in more than one dialect of Fortran and BASIC, I believe that the step-and-description form in my book [9] was easier to port.

Consequences and Serendipity

The SVD topped my list of programs to implement. Following a B.Sc. in chemistry and a doctorate in mathematics, I began a postdoctoral position in Edmonton, Canada, where I performed quantum mechanical calculations on polymers. Tom Kerr of Agriculture Canada happened to be a technical interviewer for the Canadian federal public service in late 1972. Because jobs were scarce, I put my name down for whatever interviews were available. Luckily for me, Kerr was trying to build an ambitious econometric and mathematical programming model and was searching for a numerical analyst. Within three weeks, I received an offer to move to Ottawa.

The models that I was estimating nearly all had highly correlated variables, so the SVD—which was not well known to statisticians at the time—was a valuable tool. Computations usually required budgeted service-bureau accounts, which used an onsite remote card reader and printer. Without a budget, however, I could use the Data General NOVA via a 10-character-per-second teletype terminal (see Figure 1) and sometimes obtain out-of-hours access to the HP 9830A (see Figure 2).

Several other coincidental meetings were important for the evolution of compact codes. I had maintained contact with a number of University of Oxford workers, including Brian Ford of the Numerical Algorithms Group (NAG). My wife, Mary, won Blackwell’s Nancy Stirling Lambert scholarship in librarianship and was preparing a thesis on micropublishing. When interviewing Neville Goodman of Adam Hilger, she mentioned my work on computing; this resulted in a publishing contract. Agriculture Canada also generously allowed me two short leaves to work with NAG on “minicomputer” software in 1975 and 1976. NAG never did produce a minicomputer library, but Compact Numerical Methods for Computers [9] resulted from the collaboration.

Size Matters

Figure 2. This is essentially the machine that John Nash utilized at Agriculture Canada. Note the cassette drive, which he used to save programs. This image is under a Creative Commons license and a Gnu Free Documentation license.
Storage was a great concern in 1975 and drove many coding decisions. The machines on which the BASIC versions ran had between 3,000 and 8,000 bytes of storage for both program and data. The one-sided Jacobi transforms the original matrix \(A\) into a product \(BV'\), where the working matrix \(B\) is ultimately the \(US\) of the SVD. To order the columns of \(B\), one could apply plane rotations with “large” rotation angles or simply swap the relevant columns of \(B\) and \(V\); i.e., use an implicit rotation angle of \(\pi/2\), as sketched below. Either approach sorts the column magnitudes, and hence the singular values, thereby saving the program code that would otherwise sort the singular values and vectors. Such code would be considered trivial nowadays. But at the time, it comprised a significant portion of the total code length. The memory really was small!
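
In code, the swap alternative costs almost nothing. The hypothetical Python fragment below (the helper name is mine, for illustration) shows the bookkeeping; a simultaneous column permutation \(P\) of both matrices satisfies \((BP)(VP)' = BV'\), so the decomposition is untouched:

```python
import numpy as np

def order_columns_by_swap(B, V, p, q):
    """If column q of the working matrix B has a larger norm than
    column p (with p < q), swap the pair in both B and V. A
    simultaneous column permutation is orthogonal, so B V' is
    unchanged. Hypothetical helper for illustration."""
    if B[:, q] @ B[:, q] > B[:, p] @ B[:, p]:
        B[:, [p, q]] = B[:, [q, p]]
        V[:, [p, q]] = V[:, [q, p]]
```

A plain swap differs from a true \(\pi/2\) rotation only by the sign of one column, which is immaterial because singular vectors are determined only up to sign.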

In 1960, George Forsythe and Peter Henrici demonstrated convergence for Jacobi methods under a restriction on the size of the rotation angles [4]. There was concern that my use of larger angles might fail to converge. However, Hoy Booker showed this concern to be unfounded for computations in finite arithmetic in his 1985 doctoral thesis.

Further Developments

In another direction, Lefkovitch and I [7] combined the Givens decomposition

\[A=QR,\]

where \(Q\) is orthonormal and \(R\) is upper-triangular, with a row-wise SVD of \(R\) by the one-sided Jacobi. This let us handle \(m \gg n\) linear least squares problems on a small computer, because bringing in the rows of data (including the dependent variable in linear models) one at a time yields a considerable memory saving. In 1982, Tony F. Chan presented a similar approach using Householder transformations and the Golub-Kahan-Reinsch SVD [1]. The triangularization idea has since been called “Chan’s method.” Internet preprints did not exist back then, and Chan submitted his work around the time that Lefkovitch’s and my 1976 paper appeared.
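
The storage argument is easy to see in code. The sketch below is my reconstruction of the row-at-a-time idea in Python (the 1976 programs were in BASIC and Fortran, and also carried the dependent variable along): each incoming row is rotated into an \(n \times n\) triangular array, so memory use is \(O(n^2)\) no matter how large \(m\) grows.

```python
import numpy as np

def rowwise_givens_R(rows, n):
    """Accumulate the upper-triangular R of A = QR one row at a time
    with Givens rotations; only the n x n array R is stored, never A.
    Illustrative reconstruction of the idea in [7], not the 1976 code.
    """
    R = np.zeros((n, n))
    for row in rows:
        w = np.array(row, dtype=float)      # the incoming data row
        for j in range(n):
            if w[j] == 0.0:
                continue
            r = np.hypot(R[j, j], w[j])     # rotation zeroing w[j] into R
            c, s = R[j, j] / r, w[j] / r
            Rj = R[j, j:].copy()
            R[j, j:] = c * Rj + s * w[j:]
            w[j:] = -s * Rj + c * w[j:]
    return R

# Since A'A = R'R, the singular values of R equal those of A, so the
# one-sided Jacobi can then be applied to the small matrix R.
rng = np.random.default_rng(1)
A = rng.normal(size=(10_000, 4))            # m >> n
R = rowwise_givens_R(A, 4)
assert np.allclose(R.T @ R, A.T @ A)
assert np.allclose(np.linalg.svd(R, compute_uv=False),
                   np.linalg.svd(A, compute_uv=False))
```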

In addition to the 1979 step-and-description codes, a Fortran 77 collection was informally distributed as NASHLIB. After the second edition of Compact Numerical Methods for Computers was published in 1990 with Turbo Pascal codes, the codes became available online. The Free Pascal compiler can run these with minor adjustments, thanks to the work of Florian Klämpfl, Michael van Canneyt, and others. Peter Olsen recently asked if he could port the Turbo Pascal codes to Python; this effort is growing to include reworked Fortran, BASIC, and other versions on GitHub.

In the 1980s, Seymour Shlien and I considered a partial SVD for the purposes of image compression or analysis [10]. The principal adjustments to the one-sided Jacobi—given in step-and-description form—were the following:

  • Computing the rotation even for larger angles, rather than simply swapping columns
  • Always applying the rotation in such cases
  • Reducing the estimated rank when the right-hand column is very small

For matrices of less than full rank (as is common in data fitting or image compression), the rank reduction leads to a large performance gain, as the sketch below illustrates. This is because a cycle over a matrix of rank \(r\) involves \(r(r-1)/2\) rotations, each of which requires three inner products of length \(m\) plus two applications of the plane rotation: one to a pair of columns of length \(m\) in \(B\) and the other to a pair of columns of length \(n\) in \(V\). The trade-off is that the resulting SVD yields a reconstruction that matches the original matrix less perfectly, though it is more than adequate for most approximations. Moreover, \(V\) may not be fully orthogonal. This may or may not matter for the problem at hand, underlining one of the many twists of SVD calculations.
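
The following Python sketch is my illustrative reconstruction of these adjustments (it is not the published step-and-description code, and the tolerances are arbitrary): the ordering rotation always leaves the larger column on the left, and the sweep range shrinks with the rank estimate \(r\).

```python
import numpy as np

def partial_one_sided_jacobi(A, rank_tol=1e-8, tol=1e-12, max_sweeps=30):
    """Partial SVD in the spirit of [10] (illustrative reconstruction).

    The rotation both orthogonalizes a column pair and leaves the
    larger column on the left, so columns stay ordered by norm; sweeps
    run only over the current rank estimate r, which is reduced when
    the trailing column of B becomes negligible.
    """
    B = np.array(A, dtype=float)
    m, n = B.shape
    V = np.eye(n)
    r = n                                       # current rank estimate
    scale = np.linalg.norm(B, axis=0).max()
    for _ in range(max_sweeps):
        acted = False
        for p in range(r - 1):
            for q in range(p + 1, r):
                alpha = B[:, p] @ B[:, p]
                beta = B[:, q] @ B[:, q]
                gamma = B[:, p] @ B[:, q]
                d = alpha - beta
                if gamma * gamma <= tol * tol * alpha * beta and d >= 0.0:
                    continue                    # orthogonal and ordered
                v = np.hypot(2.0 * gamma, d)
                # Ordering rotation, always applied even at large angles
                # (a column exchange is the special case c = 0, s = 1).
                if d >= 0.0:
                    c = np.sqrt((v + d) / (2.0 * v))
                    s = gamma / (v * c)
                else:
                    s = np.sqrt((v - d) / (2.0 * v))
                    if gamma < 0.0:
                        s = -s
                    c = gamma / (v * s)
                for M in (B, V):                # same rotation on B and V
                    Mp = M[:, p].copy()
                    M[:, p] = c * Mp + s * M[:, q]
                    M[:, q] = -s * Mp + c * M[:, q]
                acted = True
        # Rank reduction: drop negligible trailing columns so later
        # sweeps need only r*(r-1)/2 rotations.
        while r > 1 and np.linalg.norm(B[:, r - 1]) <= rank_tol * scale:
            r -= 1
        if not acted:
            break
    sing = np.linalg.norm(B[:, :r], axis=0)
    U = B[:, :r] / np.where(sing > 0, sing, 1.0)
    return U, sing, V[:, :r], r
```

For a matrix of numerical rank \(r\) much smaller than \(n\), the sweeps quickly shrink to \(r(r-1)/2\) rotations per cycle, which is the source of the performance gain.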

More Recently

The motivating document [3] provides a recent review of a number of approaches, with an emphasis on performance issues in large-scale computation of the SVD. Given that nearly a half century has elapsed since I first coded the compact one-sided SVD, researchers have made many attempts at improvement, some of which have been important. However, I believe that the most notable developments concern the standardization of computational infrastructure rather than improvements in the SVD codes themselves. Though a very minor contributor, I am among 30 names on the 1985 version of the IEEE 754 floating-point standard. The Basic Linear Algebra Subprograms (BLAS), which appeared in 1979, and the later Linear Algebra Package (LAPACK) amplified such standardization. Looking back at my own efforts, I estimate that the greater proportion of my work went to overcoming the vagaries of arithmetic and programming languages. That has, thankfully, become a much less tiresome aspect of computing. As Field Van Zee and his collaborators described in a recent SIAM News article about BLAS-like Library Instantiation Software [11], the effort continues.


References
[1] Chan, T.F. (1982). An improved algorithm for computing the singular value decomposition. ACM Trans. Math. Softw., 8(1), 72-83.
[2] Chartres, B.A. (1962). Adaptation of the Jacobi method for a computer with magnetic-tape backing store. Comput. J., 5(1), 51-60.
[3] Dongarra, J., Gates, M., Haidar, A., Kurzak, J., Luszczek, P., Tomov, S., & Yamazaki, I. (2018). The singular value decomposition: Anatomy of optimizing an algorithm for extreme scale. SIAM Rev., 60(4), 808-865.
[4] Forsythe, G.E., & Henrici, P. (1960). The cyclic Jacobi method for computing the principal values of a complex matrix. Trans. Am. Math. Soc., 94(1), 1-23. 
[5] Golub, G.H., & Kahan, W. (1965). Calculating the singular values and pseudo-inverse of a matrix. J. SIAM Ser. B: Numer. Anal., 2(2), 205-224.
[6] Hestenes, M.R. (1958). Inversion of matrices by biorthogonalization and related results. J. SIAM, 6(1), 51-90.
[7] Lefkovitch, L.P., & Nash, J.C. (1976). Principal components and regression by singular value decomposition on a small computer. Appl. Stat., 25(3), 210-216.
[8] Nash, J.C. (1975). A one-sided transformation method for the singular value decomposition and algebraic eigenproblem. Comput. J., 18(1), 74-76.
[9] Nash, J.C. (1979). Compact numerical methods for computers: Linear algebra and function minimisation. Bristol, England: Adam Hilger.
[10] Nash, J.C., & Shlien, S. (1987). Simple algorithms for the partial singular value decomposition. Comput. J., 30(3), 268-275.
[11] Van Zee, F., van de Geijn, R., Myers, M., Parikh, D., & Matthews, D. (2021, April 1). BLIS: BLAS and so much more. SIAM News, 54(3), p. 8.

John C. Nash studied at the University of Calgary and the University of Oxford. He ran a statistical analysis unit for Agriculture Canada until 1980 and is now a retired professor in the Telfer School of Management at the University of Ottawa. Nash’s software is most frequently used under the umbrella of the R-project. His technical books cover topics like computation, statistics, risk management, and forecasting, while his general novels and short works address the social history of the last century. Several of his novels mention computation and are freely available on obooko.com.