A quick online search for “data scientist” reveals a wealth of sites that convey the growth of this profession, rank it as one of the most desirable careers, and accentuate the field’s value. As data science continues to flourish, it is increasingly important that mathematicians understand both the field itself and the way in which college and university faculty train and prepare students to enter it.
Participants at the PIC Math Workshop on Data Analytics, held this May at Brigham Young University, listen to Randy Paffenroth introduce the Python programming language and its machine learning packages. Photo courtesy of Mikayla Sweet of the Mathematical Association of America.
To that end, 70 mathematics and statistics faculty gathered at Brigham Young University (BYU) in late May for a four-day workshop that introduced data analytics, machine learning, statistics, and programming to faculty with little to no expertise in these areas. Michael Dorff of BYU and Suzanne Weekes of Worcester Polytechnic Institute (WPI) organized the workshop as part of Preparation for Industrial Careers in Mathematical Sciences (PIC Math). PIC Math is a program of SIAM and the Mathematical Association of America, with support provided by the National Science Foundation (NSF).
The PIC Math Workshop on Data Analytics exposed attendees to the techniques and software used in data analytics problems (particularly classification problems), which strengthened their understanding of data analytics and machine learning. Participants also learned how to identify data analysis projects and mentor undergraduate students on such projects.
Randy Paffenroth of WPI conducted tutorials that used Python and sample code to implement various supervised classification algorithms, particularly k-nearest neighbors, decision trees, support vector machines, and linear discriminant analysis. He emphasized the trade-off between bias and variance; stressed the importance of cross-validation in machine learning; and surveyed techniques to prepare data for machine learning, including construction of a validation set, principal component analysis, and bootstrapping. Participants worked in groups to implement a classification algorithm on a data set from the University of California, Irvine Machine Learning Repository. They then presented their results and discussed the pitfalls and issues that arose when they applied the algorithms to the given data. “The main focus of the PIC Math Workshop was to give everyone the opportunity to get their hands dirty working with real data, no matter their background,” Paffenroth said. “I thought that the team exercises were the most important part of the workshop, and I hope all of the attendees enjoyed them!”
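For readers who have not seen these ideas in code, the following is a minimal sketch of one of the algorithms named above, k-nearest neighbors, evaluated with k-fold cross-validation. It is not the workshop's sample code, which is not reproduced here. It uses only the Python standard library rather than a machine learning package, and the synthetic two-class data, the function names, and the parameter choices are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, point, k):
    """Predict a label by majority vote among the k closest training points."""
    # Rank training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], point))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def cross_val_accuracy(X, y, k, n_folds=5, seed=0):
    """Estimate k-NN accuracy by averaging over n_folds held-out folds."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into n_folds roughly equal folds.
    folds = [indices[f::n_folds] for f in range(n_folds)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in indices if i not in held]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        # Score only on points the classifier did not see during "training".
        correct = sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)


def make_toy_data(n_per_class=50, seed=1):
    """Hypothetical data: two Gaussian clouds in the plane, one per class."""
    rng = random.Random(seed)
    X, y = [], []
    for label, center in enumerate([(0.0, 0.0), (3.0, 3.0)]):
        for _ in range(n_per_class):
            X.append((rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0)))
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    # Compare several neighborhood sizes using the same folds.
    for k in (1, 5, 15):
        print(f"k={k:2d}  mean CV accuracy = {cross_val_accuracy(X, y, k):.3f}")
```

Comparing the cross-validated accuracy across values of k is one simple way to see the bias-variance trade-off in practice: a very small k follows the training data closely (low bias, high variance), while a large k smooths the decision boundary (higher bias, lower variance).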
Jonathan Nolis, director and lead of Insights and Analytics at Lenati, offered valuable advice for faculty interested in preparing and engaging students pursuing data analytics. “It’s great that PIC Math exists and math professors are getting more involved in data science,” he said. “These professors are receiving valuable training in the sorts of jobs available in industry and how to prepare students for them. As someone who frequently hires for analytics jobs, I value students who understand the connection between mathematics and industry.”
The workshop was but one activity of the PIC Math grant, which increases students’ awareness and pursuit of career options outside academia. The grant exposes students to real problems from business, industry, and government, and provides faculty with the support necessary to offer students these opportunities. More information about PIC Math can be found on the website.
Acknowledgments: The NSF supports the PIC Math program through grant DMS-1345499.