# Topology and Big Data at the Philadelphia Science Festival

By Lina Sorg

Much of society’s continued progress depends on data and our ability to use it. Self-driving vehicles will need accurate data for road maps and for the positional mapping of obstacles. The price of Amazon products (delivered by flying drones) will rely on the data behind supply chain management. 3D printers and other types of additive manufacturing will require large quantities of data to function efficiently. Even the nanoscale assembly of metamaterials demands a great deal of data.

The future of big data was one of eight topics at the interactive presentation “2066, A Science Odyssey – What Will Our World Look Like?”, part of the annual Philadelphia Science Festival, a nine-day community celebration featuring lectures, hands-on events, and other exhibits throughout the city. Additional subjects at the session, which took place at Philadelphia’s Franklin Institute on April 26, included sustainability, dark matter, machine learning, and the future of medicine.

Robert Ghrist, mathematician and professor at the University of Pennsylvania (UPenn), spoke about the role of topology in managing the massive amounts of data necessary to sustain society’s rapid development. Moore’s law indicates that the speed of computers has been growing at a fairly fixed exponential rate over the last several decades. Yet given the immense size of many networks, even computers that are 100 times faster than present machines will be unable to sustain this growth. “We need more than just fast computers,” Ghrist said. “We need new stuff, new ideas. A lot of these are going to come from mathematics, but not the mathematics that you’ve necessarily learned in school, not even necessarily the mathematics that we currently know about, but novel mathematics that needs to be invented in order to solve fresh challenges.”

Ghrist is particularly interested in the use of new concepts in engineering systems, and acknowledges that budding mathematical notions may at first seem dauntingly abstract. Topology, which he feels is currently the least used but most useful branch of mathematics, will play a major role in developing the ideas necessary to sustain large data networks.

Put simply, topology studies the qualitative features of abstract spaces, the kind of spaces generated from data. Ghrist illustrated the conceptual nature of topological data analysis by describing a collection (or cloud) of data points, which often display specific topological and geometric structures. Knowing the rough distance of these data points from the original data allows one to look at the collection’s aggregate structure and examine the features that develop in the so-called neighborhoods around the initial points. In colloquial terms, these features are called holes. Homology, which quantifies the qualitative features of a data set, lets mathematicians investigate the location of these holes inside the data. “That’s what leads us to some of the new ideas in mathematics that are just starting to have an impact on this field,” Ghrist said. “And 50 years from now, they may have quite a bit more impact.”
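The neighborhood idea can be made concrete with a small sketch. The code below is an illustration rather than anything presented in the talk: it assumes a toy two-dimensional point cloud, links two points whenever they lie within a chosen distance `eps` of each other, and counts how many separate pieces remain. The names `count_components`, `cloud`, and `eps` are hypothetical.

```python
from math import dist


def count_components(points, eps):
    """Count the connected pieces when points within distance eps are linked."""
    parent = list(range(len(points)))  # union-find forest, one tree per piece

    def find(i):
        # Follow parent links to the representative, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) <= eps:
                parent[find(i)] = find(j)  # merge the two pieces

    return len({find(i) for i in range(len(points))})


# Toy data (an assumption for illustration): a tight group of three points
# and a separate pair far to the right.
cloud = [(0, 0), (1, 0), (0, 1), (10, 0), (11, 0)]
counts = {eps: count_components(cloud, eps) for eps in (0.5, 1.5, 10.0)}
print(counts)  # {0.5: 5, 1.5: 2, 10.0: 1}
```

At a small scale every point stands alone, at an intermediate scale the two groups appear, and at a large scale everything merges into one piece. Watching which features persist as the scale grows is the aggregate view described above.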

Ghrist talked about many different topological features that appear in data. Zero-dimensional homology, or H0, counts the number of connected components, one topological feature of the data. This helps identify the process by which data points cluster into groups, much as machine-learning algorithms identify patterns. Moving up a dimension to a data collection with many links results in H1, which counts the “loops” (or cycles) present in the data, another topological feature. And H2 counts the number of “voids,” yet another topological trait. “We can keep going, from one-dimensional homology to two-dimensional homology that measures the types of holes that you’d see, say, in the middle of a basketball,” Ghrist said. “And, remarkably it does not stop there. It keeps going and going, beyond the third dimension and the fourth dimension and even higher.”
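For readers who want to see these counts computed, here is a minimal sketch under stated assumptions; it is not Ghrist's own method. It assumes a Vietoris-Rips style construction (an edge for every pair of points within distance `eps`, a filled triangle whenever all three edges are present) and works with coefficients in Z/2, a common simplification. The sizes of H0 and H1 then follow from the ranks of the boundary matrices. The names `rank_gf2`, `betti_numbers`, and `circle` are hypothetical.

```python
from itertools import combinations
from math import cos, dist, pi, sin


def rank_gf2(rows):
    """Rank over Z/2 of a matrix whose rows are stored as integer bitmasks."""
    pivots = {}  # leading bit position -> reduced row with that leading bit
    for row in rows:
        while row:
            high = row.bit_length() - 1
            if high not in pivots:
                pivots[high] = row
                break
            row ^= pivots[high]  # cancel the leading bit and keep reducing
    return len(pivots)


def betti_numbers(points, eps):
    """Return (b0, b1): components and loops of the complex built at scale eps."""
    n = len(points)
    edges = [(i, j) for i, j in combinations(range(n), 2)
             if dist(points[i], points[j]) <= eps]
    edge_index = {e: idx for idx, e in enumerate(edges)}
    triangles = [(i, j, k) for i, j, k in combinations(range(n), 3)
                 if (i, j) in edge_index and (i, k) in edge_index
                 and (j, k) in edge_index]

    # Boundary of an edge is its two endpoints (bitmask over vertices).
    d1 = [(1 << i) | (1 << j) for i, j in edges]
    # Boundary of a triangle is its three edges (bitmask over edges).
    d2 = [(1 << edge_index[(i, j)]) | (1 << edge_index[(i, k)])
          | (1 << edge_index[(j, k)]) for i, j, k in triangles]

    r1, r2 = rank_gf2(d1), rank_gf2(d2)
    b0 = n - r1                  # vertices not merged by any edge
    b1 = (len(edges) - r1) - r2  # cycles that are not filled in by triangles
    return b0, b1


# Toy data (an assumption for illustration): 12 evenly spaced points on a unit circle.
circle = [(cos(2 * pi * k / 12), sin(2 * pi * k / 12)) for k in range(12)]
print(betti_numbers(circle, 0.1))  # (12, 0): isolated points, no loops
print(betti_numbers(circle, 0.6))  # (1, 1): one component enclosing one loop
print(betti_numbers(circle, 2.1))  # (1, 0): the loop has been filled in
```

The same recipe extends upward: counting voids (H2) would require adding tetrahedra and one more boundary matrix, which this sketch omits.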

For instance, some of Ghrist’s current work at UPenn involves an examination of neural systems. In this case, the dimension is closely related to the number of involved neurons, which is clearly quite high. Thus, studying the topology of high-dimensional data allows one to see objects and features that are not otherwise easily visible.

Ghrist ended his discussion with the hope that many advanced mathematical techniques will eventually become more generic and make their way into school curricula. “What we are going to see is a period of very, very rapid evolution,” Ghrist said. “Not just in terms of the applications that we’re able to do and the types of data sets we look at, but in the very tools being used to understand this.”

Lina Sorg is the associate editor of SIAM News.