In the pre-Google era, tools for searching the web, such as those provided by AltaVista and Yahoo, placed great emphasis on content. This played into the hands of unscrupulous individuals and companies, who could insert spurious but popular text into their web pages in order to attract hits. Google addressed this issue by focusing on topology, using a clever graph theory algorithm called PageRank that sees only the directed hyperlinks. In this way, Google works behind the scenes to quantify the relative importance of every web page. When we search for “\ttSIAM Review instructions for authors” Google can perform a real-time content-based match and combine the result with that global importance ranking in order to return relevant and high-quality suggestions.
The underlying graph theory and linear algebra for ranking pages was published openly (Page, Brin, Motwani, and Winograd, 1999). Although much of what is now under Google’s hood is shrouded in mystery, it is widely accepted that PageRank lies at the heart of the company’s success.
Last year, Dan Russell, Google’s Über Tech Lead for Search Quality and User Happiness, gave the Annual Science Faculty Lecture at the University of Strathclyde. He put forward the argument that literacy, in the sense of “knowing stuff,” is now less important than search skill, in the sense of “knowing how to find out stuff.” And to find out stuff we rely on algorithms such as PageRank that filter and rank information. These principles apply in many data-rich fields, which leads us to the theme of the current Survey and Review article. “PageRank Beyond the Web,” by David F. Gleich, surveys the development and use of PageRank technology in applications other than web search. After framing the mathematics and discussing convergence issues, the author shows how the algorithm has been applied in a vast range of disciplines, including bioinformatics, bibliometrics, business, chemistry, neuroscience, social media, and sports. We see that the original PageRank philosophy has been put to creative use in many instances where pairwise “interaction” can be defined. Perhaps less creative is the ***Rank naming convention (see the start of Section 4), and here, as an early offender, I must take my share of the blame. This accessible and informative article shows how Google’s billion dollar math idea has been re-interpreted, exploited, and extended, covering concepts in graph theory, network science, matrix computation, dynamical systems, stochastic processes, matrix functions, and regularization. It will therefore be of interest to those whose teaching or research overlap with any of these topics.
Read the paper! (Requires subscription or SIAM membership)
PageRank Beyond the Web
SIAM Review, 57(3), 321-363.
||Des Higham is a professor in the department of mathematics and statistics at the University of Strathclyde, Glasgow, and serves as section editor for the Survey & Review section of SIAM Review.