April 01, 2016

Pushing the Boundaries of Predictions with Data Analytics

In Recognition of Math Awareness Month on the “Future of Prediction”

From time immemorial, humans have been making predictions. We attempt to predict the weather, elections, disease transmission, market movements, and even the movement of heavenly bodies. A crucial aspect of many, if not all, such endeavors is the observation of the world around us. For example, Danish astronomer Tycho Brahe’s encyclopedic astronomical and planetary observations formed a cornerstone on which later progress in predicting planetary motion was built.

Image designed by Sara Davidson, Graphic Designer, American Statistical Association.

We have come a long way since Brahe’s time. The amount of data that we gather and store every hour is growing exponentially. Accordingly, the term “big data” has infiltrated many conversations on prediction, and while the phrase is perhaps diluted by its very popularity, the amount of data to which we now have access is truly astounding. In 2010, Google CEO Eric Schmidt observed that every two days we generate as much data as we did from the start of human history through 2003.¹

In many ways, the availability of data has outstripped our capacity to make effective predictions based on it. However, in the past decade, much progress in the field of data science has pushed the boundaries of what was once thought possible. For instance, as I am writing this text, a machine learning-based program called AlphaGo is competing with humans at the highest levels in the strategic board game Go, a feat long thought beyond the reach of computers.²

Rather than making predictions based on a particular set of rules, statistical and machine learning techniques allow desired predictions to arise as emergent phenomena in response to data provided to the algorithm [3]. As a result, the exact reasons for an algorithm’s prediction may be unknown, and perhaps even unknowable, to the scientists who created it. AlphaGo is almost certainly a much better Go player than any of its creators, making moves that they themselves would not have anticipated. Similarly, many of the data-based predictions made by modern algorithms may be equally opaque to the scientists who designed them. In fact, one criterion by which algorithms are judged is the interpretability of their decision-making process. It is interesting to consider whether we can or should trust predictions when the underlying reasons for those predictions are difficult, if not impossible, to understand.

Interior of a data center. The amount of data that we gather and store is growing exponentially.

Fortunately, for as long as people have been making predictions from data, scientists have worked to understand how far algorithmic predictions can be trusted, even without fully comprehending the manner in which they are made. In particular, considerations of “overfitting” (fitting the quirks of the data at hand rather than the underlying pattern) and “generalization” (performing well on data not yet seen) play key roles in modern algorithmic predictions [1]. Using such ideas, scientists strive to understand how and why algorithms trained on data we already possess can make credible predictions for observations of the world that haven’t yet been made.
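For readers who like to see overfitting and generalization in action, here is a minimal Python sketch. It is an illustration only and is not taken from [1]. Everything in it is an assumption made for the example: the "true" rule y = 2x + 1, the noise level, and the two hypothetical models, a straight-line fit and a "memorizer" that simply repeats the answer of the closest training point.

```python
import random

def make_data(n, rng, noise=1.0):
    """Noisy samples of an assumed true rule, y = 2x + 1."""
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0.0, noise) for x in xs]
    return xs, ys

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    return lambda x: a * x + b

def fit_memorizer(xs, ys):
    """1-nearest-neighbor: repeat the label of the closest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def mse(model, xs, ys):
    """Mean squared error of a model on a data set."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(0)                      # fixed seed so runs are repeatable
train_x, train_y = make_data(200, rng)      # data we already possess
test_x, test_y = make_data(1000, rng)       # observations not yet made

line = fit_line(train_x, train_y)
memo = fit_memorizer(train_x, train_y)

# (training error, test error) for each model
results = {
    "line": (mse(line, train_x, train_y), mse(line, test_x, test_y)),
    "memorizer": (mse(memo, train_x, train_y), mse(memo, test_x, test_y)),
}
for name, (train_err, test_err) in results.items():
    print(f"{name:10s} train MSE = {train_err:.3f}   test MSE = {test_err:.3f}")
```

The memorizer reproduces its training data perfectly, yet it should do noticeably worse than the plain straight line on the fresh data, because it has memorized the noise along with the pattern. The line's training and test errors should stay close to one another, which is what good generalization looks like.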

One particularly exciting realm of modern data analysis is that of feature extraction. The key idea is that data in its raw form can be quite noisy. Perhaps even more importantly, the features that make for effective predictions are entangled with vast quantities of data that are uninformative or even counterproductive. Consider detecting faces in pictures. The pixel values represent the sum total of all of the information the picture has to offer, but the prediction of interest (is it a face?) would be greatly facilitated if high-level features (is there a nose?) were available. We can even think of such features in a hierarchical fashion, where low-level features, which are more easily inferred from the raw data, give rise to higher-level features. For example, two dark patches might be nostrils that make up a nose, and a nose can be combined with two ears and two eyes to represent a face. The idea of hierarchical feature extraction from raw data is one of the key elements of deep learning [4].
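The layering described above can be sketched in a few lines of Python. This is a toy with hand-written rules, so it shows only the hierarchy (pixels, then patches, then pairs of patches, then a face) and not the learning; in deep learning the features at each level are inferred from data rather than coded by hand. The tiny images, the gap sizes, and the function names are all hypothetical, and the "face" is simplified to two eyes above two nostrils.

```python
def dark_patches(image):
    """Level 1: raw pixels -> coordinates of dark pixels ('#')."""
    return [(r, c) for r, row in enumerate(image)
            for c, px in enumerate(row) if px == "#"]

def horizontal_pairs(patches, min_gap, max_gap):
    """Level 2: two patches on the same row, a given distance apart.

    Returns (row, centre column) for each pair found.
    """
    pairs = []
    for r1, c1 in patches:
        for r2, c2 in patches:
            if r1 == r2 and min_gap <= c2 - c1 <= max_gap:
                pairs.append((r1, (c1 + c2) / 2))
    return pairs

def looks_like_face(image):
    """Level 3: a wide pair ("eyes") above a centred narrow pair ("nostrils")."""
    patches = dark_patches(image)
    eyes = horizontal_pairs(patches, 4, 6)       # assumed eye spacing
    nostrils = horizontal_pairs(patches, 2, 2)   # assumed nostril spacing
    return any(eye_row < nose_row and eye_col == nose_col
               for eye_row, eye_col in eyes
               for nose_row, nose_col in nostrils)

FACE = [
    ".........",
    "..#...#..",   # two eyes
    ".........",
    "...#.#...",   # two nostrils
    ".........",
]
UPSIDE_DOWN = FACE[::-1]   # same pixels, nostrils now above the eyes

print(looks_like_face(FACE))         # True
print(looks_like_face(UPSIDE_DOWN))  # False
```

Both images contain exactly the same dark pixels. Only the higher-level features, and how they are arranged, distinguish the face from the non-face.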

Deep learning is an exciting and active area of research that revolves around the extraction of high-level and, quite often, hierarchical features. “Feature engineering” by human experts has a long history in data analytics, but deep learning and similar approaches have demonstrated considerable success in automating the unearthing of effective high-level features given vast quantities of raw data. Voice recognition in smartphones, image recognition in web searches, and self-driving cars are all technologies that affect our daily lives now, or may in the near future, and that are supported by advanced predictive techniques that leverage vast quantities of data.

Of course, many open questions regarding the future of prediction from big data remain. Even beyond the many technical questions about the accuracy of predictions and computational efficiency, there are other important points to address [2]. Who owns the vast quantity of data produced every day? Do we really want predictions (accurate or not) of personal information? How will the ability to make predictions based upon new data sources and new algorithms affect society as a whole? These, and many other questions, must be addressed by scientists, policy makers, and the public as we enter a new era of prediction. It is an interesting time to be making predictions from data!

For more on the Future of Prediction, visit the Math Awareness Month website.

¹ http://techcrunch.com/2010/08/04/schmidt-data/

² http://www.nytimes.com/2016/03/10/world/asia/google-alphago-lee-se-dol.html?_r=0

References

[1] Abu-Mostafa, Y.S., Magdon-Ismail, M., & Lin, H.-T. (2012). Learning from Data: A Short Course. Berlin, Germany: AMLBook.

[2] Boyd, D., & Crawford, K. (2012). Critical questions for big data: Provocations for a cultural, technological, and scholarly phenomenon. Information, Communication & Society, 15(5), 662-679.

[3] Friedman, J., Hastie, T., & Tibshirani, R. (2001). The Elements of Statistical Learning (Vol. 1). Springer Series in Statistics. Berlin, Germany: Springer Science+Business Media.

[4] LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444.

Randy Paffenroth is an associate professor of mathematical sciences and associate professor of computer science at Worcester Polytechnic Institute, where he researches large-scale dimension reduction problems and compressed sensing. Paffenroth is core faculty in the WPI Data Science Program, and his main area of application is the detection of anomalies in computer networks.