By Nicholas Higham
It is therefore both timely and appropriate that in 2018, SIAM introduced the SIAM Journal on Mathematics of Data Science (with editor-in-chief Tammy Kolda of Sandia National Laboratories), launched its Data Science book series (with editor-in-chief Ilse Ipsen of North Carolina State University), and is planning a data science conference for 2020. Moreover, data science boasts its own research area page on the new SIAM website.
The most hyped aspect of data science is undoubtedly machine learning, especially deep learning. Michael Elad examined the role of deep learning in imaging science in the May 2017 issue of SIAM News. His analysis inspired discussion. He noted that “In most cases, deep learning-based solutions lack mathematical elegance and offer very little interpretability of the found solution or understanding of the underlying phenomena.” Peter Warden, data science book author and member of Google’s TensorFlow team, has written that “Most of machine learning is the software equivalent of banging on the side of the TV set until it works, so don’t be discouraged if you have trouble seeing an underlying theory behind all your tweaking!” Advances in the theoretical underpinning of machine learning and deep learning are clearly needed. The SIAM community can contribute novel understanding and breakthroughs; in fact, it is already doing so. The new journal, book series, and upcoming conference provide the perfect platforms to report such contributions.
The SIAM community is well versed in the aspect of data science that concerns the accuracy of results. Mathematicians are accustomed to handling uncertainties introduced by data and rounding errors. Here, the trend towards processors that support low-precision arithmetic is impinging on data science. The NVIDIA V100 graphics processing unit (GPU) supports half-precision floating-point arithmetic and—using its tensor cores—can execute it at a rate of up to 112 teraflops, compared with seven teraflops for double precision. It is therefore tempting to run data-intensive computations in half precision. But a half-precision number has only the equivalent of around four-decimal-digit precision; will the computed results have any correct digits?
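The "around four decimal digits" figure follows from half precision's 11-bit significand, which gives a unit roundoff of 2^-11 ≈ 4.9 × 10^-4. A short NumPy sketch (illustrative only; the precision figures are standard IEEE 754 values, not specific to any GPU) makes the gap to double precision concrete:

```python
import numpy as np

# IEEE half precision (fp16) has an 11-bit significand, so its unit
# roundoff is 2^-11 ~= 4.9e-4: roughly three to four significant
# decimal digits. Double precision has a 53-bit significand.
u_half = np.finfo(np.float16).eps / 2
u_double = np.finfo(np.float64).eps / 2
print(f"fp16 unit roundoff: {u_half:.3e}")    # ~4.883e-04
print(f"fp64 unit roundoff: {u_double:.3e}")  # ~1.110e-16

# Rounding a value to fp16 discards everything beyond the fourth
# significant digit or so.
x = 3.14159265
err = abs(float(np.float16(x)) - x)
print(f"fp16 rounding error in pi: {err:.2e}")
```

Any computation whose data or intermediate results pass through fp16 therefore starts from a relative error of order 10^-4, before any accumulation of rounding errors.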
Researchers have reported success with low precision in machine learning. For example, at the 2018 SIAM Conference on Parallel Processing for Scientific Computing, held this March in Tokyo, Takuya Akiba (Preferred Networks, Inc., Tokyo) explained how he and his colleagues trained ResNet-50 on ImageNet for 90 epochs (a standard benchmark) on 1024 NVIDIA P100 GPUs in 15 minutes, thus halving the previous record time. Among the key ideas was the selective use of half precision.
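Why "selective" matters can be seen even in a toy example (this is an illustration of the general mixed-precision idea, not the actual training scheme of Akiba et al.): storing data in fp16 is cheap, but a pure fp16 running sum stagnates once the partial sum is so large that each new term is smaller than half a unit in the last place. Accumulating in single precision avoids this.

```python
import numpy as np

# 10,000 values of 0.01 stored in half precision; the exact sum is ~100.
x = np.full(10000, 0.01, dtype=np.float16)

naive = np.float16(0)
for xi in x:                       # pure fp16 accumulation
    naive = np.float16(naive + xi)

mixed = np.float32(0)
for xi in x:                       # fp16 data, fp32 accumulator
    mixed += np.float32(xi)

# The fp16 sum stagnates far below 100: once it reaches 32, the fp16
# spacing (0.03125) exceeds twice the increment, so additions round away.
print(f"fp16 accumulation:  {float(naive):.2f}")
print(f"mixed accumulation: {float(mixed):.2f}")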
Of course, data science presents challenges not only in research but also in teaching — another area that benefits from SIAM community engagement. At the 2018 SIAM Conference on Applied Mathematics Education, held in Portland, Ore., this July, Gil Strang (Massachusetts Institute of Technology) organized a minisymposium on “Deep Learning and Deep Teaching.” In his talk, Gil presented some of the ideas contained in his forthcoming book on the subject and posed the tantalizing question of whether deep learning can learn calculus; much discussion ensued.
SIAM is already a great source of expertise and information on many aspects of data science. I look forward to our community playing a growing role in the area and attracting new members with an interest in the subject.
Akiba, T., Suzuki, S., & Fukuda, K. (2017). Extremely Large Minibatch SGD: Training ResNet-50 on ImageNet in 15 Minutes. Preprint, arXiv:1711.04325.
Bottou, L., Curtis, F.E., & Nocedal, J. (2018). Optimization Methods for Large-Scale Machine Learning. SIAM Rev., 60, 223–311.
 Elad, M. (2017, May). Deep, Deep Trouble: Deep Learning’s Impact on Image Processing, Mathematics, and Humanity. SIAM News, 50(4), p. 12.
 Strang, G. (2019). Linear Algebra and Learning from Data. Wellesley, MA: Wellesley-Cambridge Press. In press.