March 01, 2016

Math within March Madness

Davidson College basketball team planning their game strategy. Photo credit: Tim Cowie (DavidsonPhotos.com).

Every year, a fever spreads through much of the United States. It peaks in mid-March and continues through early April. It’s called March Madness, the National Collegiate Athletic Association (NCAA) Division I basketball tournament. The event garners much attention before any time has ticked off a game clock, prior to any ball being shot or dribbled. Millions of people create brackets to predict the outcome of every game in the tournament and to compete against their friends and colleagues for pride, office pool winnings, and sometimes thousands of dollars.

The process of creating a bracket is limited to a few days since matchups are announced less than a week before teams play the first round of 32 games. A completed bracket contains your predicted winner for each of those 32 first-round games. Then you select winners for your 16 predicted second-round matchups. You continue this process until you reach your predicted teams vying for the national championship.

In 2014, Warren Buffett insured a billion-dollar prize for anyone who could complete a perfect bracket for the tournament. Correctly predicting the outcome of all 63 bracket matchups is not an easy task. In 2014, every one of the more than 11 million brackets submitted to ESPN’s online tournament had missed a prediction after the first round.
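
The count of 63 matchups follows from the round structure described above. The short sketch below assumes a 64-team single-elimination field (32 first-round games, as in the text) and also counts how many distinct completed brackets are possible; the variable names are illustrative only.

```python
# Count games per round for a 64-team single-elimination bracket
# (assumption: the 64-team field implied by 32 first-round games).
teams_remaining = 64
games_per_round = []
while teams_remaining > 1:
    games_per_round.append(teams_remaining // 2)  # each game eliminates one team
    teams_remaining //= 2

total_games = sum(games_per_round)

# Every game has two possible winners, so a bracket is one of 2**63 choices.
possible_brackets = 2 ** total_games

print(games_per_round)
print(total_games)
print(f"{possible_brackets:,}")
```

With more than nine quintillion possible brackets, it is less surprising that millions of entries can all contain a miss after a single round.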

For me, this is more of a time of March Mathness than Madness. In 2009, along with my collaborator Amy Langville of the College of Charleston and our student researchers, I applied our new ranking research to March Madness. Our research adapted the Colley and Massey methods, used to help rank college football teams and determine which teams play in New Year’s Day bowl games. Both methods form a matrix system from the results of a season.

When forming the linear system for the Massey method, let’s define the point differential for a single game between a winning team \(W\) and a losing team \(L\) as \(d_{WL} = points_W - points_L\). The Massey method derives its linear system from the assumption that \(r_W - r_L = d_{WL},\) where \(r_W\) and \(r_L\) are the Massey ratings for the winning and losing teams, respectively. Applying least-squares to this over-determined system results in the normal equations \(M\textbf{r} = \textbf{p}\), where \(\textbf{r}\) is the ratings vector. The Massey matrix can also be formed directly from game data. Each diagonal entry of \(M\) is a team’s total number of games, \(t_i\), so \(m_{ii}=t_i.\) If \(g_{ij}\) represents the number of times teams \(i\) and \(j\) have played each other, the off-diagonal entries in the Massey matrix are \(m_{ij}=-g_{ij}.\)
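
As a sketch of where the normal equations come from, stack one equation per game into a system \(X\textbf{r} = \textbf{y}\). The symbols \(X\) and \(\textbf{y}\) are introduced here only for illustration: the row of \(X\) for a game has \(+1\) in the winner’s column, \(-1\) in the loser’s column, and zeros elsewhere, and the matching entry of \(\textbf{y}\) is that game’s point differential. Least squares then gives

\[
X^{T}X\,\textbf{r} = X^{T}\textbf{y}, \qquad M = X^{T}X, \qquad \textbf{p} = X^{T}\textbf{y}.
\]

For a hypothetical three-team season in which team 1 beats team 2 by 10, team 2 beats team 3 by 5, and team 1 beats team 3 by 15,

\[
X = \begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \\ 1 & 0 & -1 \end{bmatrix}, \quad
\textbf{y} = \begin{bmatrix} 10 \\ 5 \\ 15 \end{bmatrix}, \quad
M = \begin{bmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{bmatrix}, \quad
\textbf{p} = \begin{bmatrix} 25 \\ -5 \\ -20 \end{bmatrix}.
\]

Each team played two games, which gives the 2s on the diagonal, and each pair met once, which gives the \(-1\)s off the diagonal.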

For the Massey method, \(\textbf{p}\) contains information regarding a team’s point differential for each game: \(p_i = \sum_j d_{ij} - \sum_k d_{ki}.\) The sums over \(j\) and \(k\) represent the point differential from games that team \(i\) has won and lost, respectively. Since the sum over \(k\) is subtracted from the sum over \(j\), the sign of each entry in \(\textbf{p}\) gives an indication of a team’s scoring performance over the course of a season. Unfortunately, the linear system \(M\textbf{r} = \textbf{p}\) has infinitely many solutions for any season. To reduce the number of possible solutions, we add the restriction that all of the ratings sum to zero by replacing the last row of \([M \enspace| \enspace \textbf{p}]\) with \([ 1 \enspace 1 \enspace \cdots \enspace 1 \enspace| \enspace 0]\). While infinitely many solutions are still possible (for example, when the schedule splits the teams into groups that never play one another), many systems will now have a unique solution.
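
The construction above can be sketched in a few lines of Python. This is a minimal illustration rather than the author’s implementation: the three-team season is invented, the function names `massey_system` and `solve` are hypothetical, and exact rational arithmetic with a small Gauss-Jordan solver stands in for whatever linear algebra library one would use on a real season.

```python
from fractions import Fraction

def massey_system(n_teams, games):
    """Build [M | p] from (winner, loser, margin) tuples, with the last
    row replaced by the sum-to-zero constraint described in the text."""
    M = [[0] * n_teams for _ in range(n_teams)]
    p = [0] * n_teams
    for winner, loser, margin in games:
        M[winner][winner] += 1   # m_ii = total games played by team i
        M[loser][loser] += 1
        M[winner][loser] -= 1    # m_ij = -(games between teams i and j)
        M[loser][winner] -= 1
        p[winner] += margin      # differentials from wins are added
        p[loser] -= margin       # differentials from losses are subtracted
    M[-1] = [1] * n_teams        # ratings must sum to zero
    p[-1] = 0
    return M, p

def solve(A, b):
    """Gauss-Jordan elimination in exact rational arithmetic."""
    n = len(A)
    aug = [[Fraction(x) for x in row] + [Fraction(rhs)]
           for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("singular system; is the schedule connected?")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [x / pivot_value for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n] for row in aug]

# Hypothetical season: team 0 beats team 1 by 10, team 1 beats team 2 by 5,
# and team 0 beats team 2 by 15.
games = [(0, 1, 10), (1, 2, 5), (0, 2, 15)]
M, p = massey_system(3, games)
ratings = solve(M, p)
print([float(r) for r in ratings])
```

Because these three results are perfectly consistent with one another, the ratings \(25/3\), \(-5/3\), and \(-20/3\) reproduce every point differential exactly; real seasons are noisier, and least squares only fits the differentials as closely as possible.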

Another option is to form the linear system for the Colley method, \(C\textbf{r} = \textbf{b}\), where \(C\) is the Colley matrix and \(\textbf{r}\) contains the ratings as a vector. First, \(C=M +2I\), where \(M\) is the original Massey matrix and \(I\) is the identity matrix. The vector \(\textbf{b}\) contains information regarding each team’s number of wins and losses: \(b_i = 1 + \frac{w_i-l_i}{2}\), where \(w_i\) and \(l_i\) are team \(i\)’s total wins and losses. The Colley matrix \(C\) is strictly diagonally dominant, so the Colley method produces a unique rating vector \(\textbf{r}\).
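
A similar sketch works for the Colley method. Strict diagonal dominance guarantees a unique solution, as noted above, and it also guarantees that simple iterative solvers converge, so the example below uses Jacobi iteration. That solver is an illustrative choice, not necessarily what the author’s group used, and the season data and function names are again hypothetical.

```python
def colley_system(n_teams, games):
    """Build the Colley matrix C = M + 2I and right-hand side b
    from a list of (winner, loser) pairs."""
    C = [[2.0 if i == j else 0.0 for j in range(n_teams)]
         for i in range(n_teams)]
    wins = [0] * n_teams
    losses = [0] * n_teams
    for winner, loser in games:
        C[winner][winner] += 1   # diagonal: 2 + total games played
        C[loser][loser] += 1
        C[winner][loser] -= 1    # off-diagonal: -(games between the pair)
        C[loser][winner] -= 1
        wins[winner] += 1
        losses[loser] += 1
    b = [1 + (wins[i] - losses[i]) / 2 for i in range(n_teams)]
    return C, b

def jacobi(C, b, tol=1e-12, max_iter=10_000):
    """Jacobi iteration; converges because C is strictly diagonally dominant."""
    n = len(b)
    r = [0.5] * n  # start every team at the neutral Colley rating
    for _ in range(max_iter):
        new_r = [
            (b[i] - sum(C[i][j] * r[j] for j in range(n) if j != i)) / C[i][i]
            for i in range(n)
        ]
        if max(abs(x - y) for x, y in zip(new_r, r)) < tol:
            return new_r
        r = new_r
    raise RuntimeError("Jacobi iteration did not converge")

# Hypothetical season: team 0 beats teams 1 and 2; team 1 beats team 2.
games = [(0, 1), (1, 2), (0, 2)]
C, b = colley_system(3, games)
colley_ratings = jacobi(C, b)
print([round(r, 4) for r in colley_ratings])
```

For this season the ratings work out to 0.7, 0.5, and 0.3. Only wins and losses enter the system; margins of victory are ignored.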

Our math-based brackets solve these linear systems and assume a higher-ranked team wins any matchup. In 2010, I was teaching a portion of the methods to undergraduate students at Davidson College, one of whom created a bracket that beat over 99.9% of more than 5 million brackets on ESPN that year.
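
Turning ratings into a bracket is then mechanical. The sketch below assumes an eight-team bracket with made-up ratings and a hypothetical helper, `fill_bracket`, that advances the higher-rated team in every matchup.

```python
def fill_bracket(first_round_order, ratings):
    """Advance the higher-rated team in each matchup until one team remains.

    first_round_order lists teams so that neighbors (0 and 1, 2 and 3, ...)
    meet in the first round; its length is assumed to be a power of two.
    Returns a list of rounds, each a list of predicted winners.
    """
    current = list(first_round_order)
    rounds = []
    while len(current) > 1:
        winners = [
            max(current[i], current[i + 1], key=lambda team: ratings[team])
            for i in range(0, len(current), 2)
        ]
        rounds.append(winners)
        current = winners
    return rounds

# Hypothetical ratings, e.g. the output of a Colley or Massey solve.
ratings = {"A": 0.9, "B": 0.4, "C": 0.6, "D": 0.7,
           "E": 0.3, "F": 0.5, "G": 0.8, "H": 0.2}
bracket = fill_bracket(["A", "B", "C", "D", "E", "F", "G", "H"], ratings)
for number, winners in enumerate(bracket, start=1):
    print(f"Round {number} winners: {winners}")
```

A bracket built this way never predicts an upset relative to its own ratings; any surprises come from the ratings disagreeing with the tournament seeding.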

There are two important pieces to these methods. First, linear systems allow for interdependence of teams’ ratings. For example, if you lose to a weaker team, that game hurts you in the standings more than if you lose to a stronger team. This approach integrates strength of schedule into the rating method. Second, the new research allows predictive elements to be weighted. We weighted recency (among other factors), so teams that were playing well going into the tournament were rewarded. The linear systems themselves are created for approximately 350 teams in over 5,000 games, which enables us to find teams that might otherwise be overlooked.
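
The article does not spell out the weighting scheme, so the following is only one possible sketch of recency weighting: each game counts as a fraction of a game in the Colley system, with a weight that rises linearly over the season. The weight function, its `w_min` parameter, and the two-team season are all assumptions made for illustration.

```python
def linear_weight(day, season_length, w_min=0.5):
    """Hypothetical weight: w_min on day 0, rising linearly to 1.0 on the last day."""
    return w_min + (1.0 - w_min) * day / season_length

def weighted_colley(n_teams, games, season_length):
    """Colley system in which each (winner, loser, day) game counts as w games."""
    C = [[2.0 if i == j else 0.0 for j in range(n_teams)]
         for i in range(n_teams)]
    b = [1.0] * n_teams
    for winner, loser, day in games:
        w = linear_weight(day, season_length)
        C[winner][winner] += w
        C[loser][loser] += w
        C[winner][loser] -= w
        C[loser][winner] -= w
        b[winner] += w / 2   # a weighted win
        b[loser] -= w / 2    # a weighted loss
    return C, b

def gauss_seidel(C, b, sweeps=200):
    """Gauss-Seidel sweeps; C stays strictly diagonally dominant after weighting."""
    n = len(b)
    r = [0.5] * n
    for _ in range(sweeps):
        for i in range(n):
            off_diagonal = sum(C[i][j] * r[j] for j in range(n) if j != i)
            r[i] = (b[i] - off_diagonal) / C[i][i]
    return r

# Two teams split their games: team 0 wins on day 0, team 1 wins on day 10.
games = [(0, 1, 0), (1, 0, 10)]
C, b = weighted_colley(2, games, season_length=10)
weighted_ratings = gauss_seidel(C, b)
print([round(r, 4) for r in weighted_ratings])
```

With uniform weights the two teams would tie at 0.5. Under this weighting the team with the more recent win comes out ahead, 0.55 to 0.45, which is the sense in which teams playing well going into the tournament are rewarded.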

The success of this work got the attention of the media. In 2014 alone, I spoke with The New York Times, USA Today, and CBS Evening News about this topic. There is a madness to March for me now, with much of the month being spent helping the public and my students create brackets.

Surprisingly, I rarely make a bracket myself due to lack of time. My first bracket was with my son, in the year Warren Buffett insured the billion-dollar prize. After having seen and heard me talk about it on TV and radio so many times, my son convinced me to make a bracket with him. We didn’t win the billion! Then again, in the billion-dollar online pool, out of over 8 million brackets submitted, none was perfect after the second round of games.

My work in March Madness launched my research into the field of sports analytics, a subfield of data analytics. Data analytics ideally affects decision-making by studying data, and sports analytics applies this to athletics. By translating scouting notes into matrices that can be analyzed and studied, one can better inform a team about whom to draft, sign, or acquire in a trade. An athlete’s performance is studied via hits, rebounds, points, times, or an aggregation of such statistics; often the goal is to find a trend that can increase the odds of winning.

The 2003 book Moneyball, by Michael Lewis, chronicled the Oakland A’s use of sabermetrics in Major League Baseball. This was an early, widely publicized use of analytics to make personnel decisions in professional sports. Today, an analytics department or an analytics expert is part of the staff of every major professional sports team. The growing use of analytics has also trickled down to fans, with interest increasing each year.

Using my research, I work with over a dozen students to help the men’s basketball team at Davidson College. Throughout the season, we sift through a wealth of statistics on basketball databases to scout our opponents. We also record game-to-game statistics not offered on the web for our analysis and study the effect of every lineup coaches use in a game.

My research group also aids professional sports organizations. We work with the NBA using SportVU data. To create these datasets, cameras located in the rafters of every NBA arena record \((x,y)\) coordinates of every player on the court and \((x,y,z)\) coordinates for the ball every 25th of a second throughout the game. While the specifics of our research fall under a nondisclosure agreement, I can share that our studies focus on officiating in one specific instance in a game. Our research supports the NBA league office as it studies the game to ensure officiating is consistent and fair. We have also worked with NASCAR teams, which is natural given that many teams are located near Davidson. Among our various projects, we created an algorithm that would detect loose wheels on a racecar. Our method read time-varying pressures recorded from instrumentation connected to the pneumatic torque gun (impact wrench), which is used during a pit stop to install and tighten nuts on all five bolts in less than 1.5 seconds.

If you’re interested in sports analytics, you’ll likely find the trajectory of my career compelling. When I entered the job market I had a great postdoc, had done exciting research at national labs, and had many pieces to my professional puzzle in place. Still, I likely would not have been someone’s first choice for sports analytics research. At that time, my work was in numerical partial differential equations. Do I model the NBA as a PDE? No. I use linear algebra, the mathematical tool for my work in PDEs. My shift into sports analytics came via ranking methods, and what followed was far from my original plan.

Nevertheless, the shift was very intentional. Upon receiving an Alfred P. Sloan research fellowship, I used the grant in part to move my work from PDEs to ranking. The shift took significant energy but fit my professional goals. For example, ranking, which merged into the larger discipline of sports analytics, was an area in which I could involve more students at earlier stages of their undergraduate studies; over 20 students have worked with me during the last year. We offer analytics to our college teams, professional sports organizations, and businesses with national markets.

I thoroughly enjoy my research field. Still, I can’t tell you the stats of any current baseball player. I forget the officials’ signals and only know someone fouled or scored. I’ll miss a game to play with my kids or walk with my wife, although she isn’t always willing to miss a game. What led me into this field is my mathematical ability and leveraging what I do well. I enjoy what I do because I get to see students become independent researchers, I like helping coaches and sports organizations gain insight from data, and I investigate research questions using mathematical techniques that deeply interest me.

I anticipate that I will work in sports analytics for a while, possibly a long while. Yet I don’t know for certain, and can enjoy that unknown. Life, in a way, is its own research question. Dive in. Study. Get confused. Discover and keep exploring. I think this allows for a career that isn’t a game in itself but can still be defined as winning, even with the inherent madness of life.

Sue Minkoff of the University of Texas at Dallas is the editor of the Careers in Mathematical Sciences column.

Tim Chartier is an associate professor of mathematics and computer science at Davidson College. He received a national teaching award from the Mathematical Association of America and has worked with Google and Pixar on their educational initiatives.