SIAM News Blog

IMSI Long Program Explores Algebraic Statistics and Our Changing World

By Sam Hansen

As the name suggests, algebraic statistics is a cross-disciplinary area of mathematical science. Researchers in this space address statistical problems with algebra-based results and methods, such as the treatment of polynomials from algebraic geometry, structures like rings and ideals from commutative algebra, and networks from combinatorics. Although the field of algebraic statistics is relatively new, it has already proven useful in application areas like biology, network analysis, and algorithm design, and its reach and influence continue to grow.

To accelerate this steady growth, the Institute for Mathematical and Statistical Innovation (IMSI) recently hosted a long program on “Algebraic Statistics and Our Changing World: New Methods for New Challenges,” which took place at the University of Chicago from September to December 2023. IMSI—which is the newest of six math research institutes that are funded by the National Science Foundation—is characterized by its distinct focus on applications of math and statistics in the context of major societal challenges. Applied mathematicians and statisticians always play a prominent role in its scientific activities, which are cross-disciplinary in nature. For “Algebraic Statistics and Our Changing World,” these individuals convened with life scientists—phylogeneticists, plant pathologists, epidemiologists, and the like—as well as econometricians, economists, social scientists, and experts in a variety of other disciplines.

Over the course of three months, the program brought together more than 180 researchers from around the world for five workshops—including an apprenticeship week—and various seminars and reading groups. The organizers intentionally provided opportunities for early-career participants to network and connect with more seasoned attendees. The content focused on the application of algebraic statistics to multiple critical issues, such as the connection between patterns of genetic change and evolution and the relationship between economic networks and social inequities.

Throughout “Algebraic Statistics and Our Changing World,” five different working groups advanced research in their respective areas: (i) Causality, (ii) Machine Learning (ML) Thresholds of Colored Gaussian Graphical Models, (iii) Game Theory, (iv) Incomplete U-statistics for Phylogenetic Models, and (v) Neurovarieties of Rational Neural Networks. Participants have already deposited two dozen research preprints on arXiv that reflect their work during the long program. In addition, more than 100 talks (many of which are now publicly available as part of IMSI’s video collection) helped to expand participants’ knowledge of algebraic statistics over the three months. Also worth noting is a weeklong workshop that spotlighted the emerging field of algebraic economics and convened experts in game theory, econometric modeling, random networks, and causal analysis. The corresponding sessions and dialogues established algebraic game theory and causal inference as two pillars of this burgeoning area.

Four specific presentations particularly illustrated the diverse applications of algebraic statistics. During the opening workshop—an “Invitation to Algebraic Statistics and Applications”—Kathlén Kohn of the KTH Royal Institute of Technology gave a talk titled “What Will Happen in Neural Networks???” Kohn explored the potential use of ML for the solution of polynomial systems, which is a complicated topic. Although neural networks are powerful tools that can solve a multitude of problems with sufficient computational time and resources, the computer vision community has not seen a breakthrough in this area for at least 10 years — despite its employment of ML for relevant problems. Kohn’s titular question was so provocative that the Neurovarieties working group decided to focus on the related topic of neural net expressibility, which impacts the computational structures that underlie many ML applications. While the working group was not able to answer the original question, participants did identify a key part of the puzzle: We don’t know what type of distance needs to be minimized between a set of real-world data and the points that are modeled by a neural net.

Figure 1. Example of the “tree of blobs” simplification method for phylogenetic network inference. John Rhodes of the University of Alaska Fairbanks discussed this method in his talk, “Inferring the Tree-like Parts of a Species Network Under the Coalescent,” at the Institute for Mathematical and Statistical Innovation’s long program on “Algebraic Statistics and Our Changing World,” which took place at the University of Chicago from September to December 2023. Figure courtesy of John Rhodes.

Another highlight of the program involved the multispecies coalescent (MSC) model, which “provides a powerful framework for a number of inference problems using genomic sequence data from multiple species, including estimation of species divergence times and population sizes, estimation of species trees accommodating discordant gene trees, inference of cross-species gene flow, and species delimitation” [1]. During the “Algebraic Statistics for Ecological and Biological Systems” workshop, John Rhodes of the University of Alaska Fairbanks spoke about “Inferring the Tree-like Parts of a Species Network Under the Coalescent.” Rhodes discussed recent work with Elizabeth Allman of the University of Alaska Fairbanks; Hector Baños of California State University, San Bernardino; and Jonathan Mitchell of the University of Tasmania that uses the MSC model to identify gene flow and evolutionary histories from genetic data (see Figure 1). In a similar vein, the working group on Incomplete U-statistics applied the network version of the MSC model to phylogenetic data and subsequently employed incomplete U-statistics. The continued development of this work could be very influential, as selecting appropriate statistical models and tests for phylogenetic data has historically been difficult.

Finally, a major highlight of “Algebraic Statistics and Our Changing World” was the field’s connection to social science. For example, Weslynne Ashton of the Illinois Institute of Technology presented on “Justice, Equity and the Circular Economy” in the context of food distribution during the “Algebraic Economics” workshop. Ashton’s talk challenged participants to consider appropriate types of data, suitable models, and—most importantly—the integration of justice and equity into algebraic statistics applications. Eric Auerbach of Northwestern University later introduced a new yet related topic during the talk “Identifying Socially Disruptive Policies,” which took place at the same workshop. Specifically, Auerbach used network data to identify social disruptions and discussed cases wherein a network-based method identified policies for which the disruption effect was much larger than previously thought.

“Algebraic Statistics and Our Changing World: New Methods for New Challenges” is a great example of the type of scientific exploration that IMSI makes possible. Participants explored both the theoretical and practical aspects of recent developments in this rapidly growing field, forged new connections, and recast problems from across disciplines in the language of algebraic statistics to accelerate the field’s growth. We are excited to see where both the attendees and the discipline will go next.

Acknowledgments: IMSI is supported by a grant from the Division of Mathematical Sciences at the U.S. National Science Foundation.

[1] Jiao, X., Flouri, T., & Yang, Z. (2021). Multispecies coalescent and its applications to infer species phylogenies and cross-species gene flow. Natl. Sci. Rev., 8(12), nwab127.

Sam Hansen is the Director of Communications and Engagement at the Institute for Mathematical and Statistical Innovation, which is hosted at the University of Chicago.
