By Thomas Wakefield
A quick online search for “data scientist” reveals a wealth of sites that document the growth of this profession, rank it as one of the most desirable careers, and emphasize the field’s value. As data science continues to flourish, it is increasingly important that mathematicians understand both the field itself and the way in which college and university faculty train and prepare students to enter it.
The PIC Math Workshop on Data Analytics exposed attendees to the techniques and software used in data analytics problems (particularly classification problems), which strengthened their understanding of data analytics and machine learning. Participants also learned how to identify data analysis projects and mentor undergraduate students on such projects.
Randy Paffenroth of WPI led tutorials on using Python, with sample code, to implement various supervised classification algorithms, particularly k-nearest neighbors, decision trees, support vector machines, and linear discriminant analysis. He emphasized the trade-off between bias and variance; stressed the importance of cross-validation in machine learning; and reviewed techniques to prepare data for machine learning, including construction of a validation set, principal component analysis, and bootstrapping. Participants worked in groups to implement a classification algorithm on a data set from the University of California, Irvine Machine Learning Repository. They then presented their results and discussed the pitfalls and issues that arose when they applied the algorithms to the given data. “The main focus of the PIC Math Workshop was to give everyone the opportunity to get their hands dirty working with real data, no matter their background,” Paffenroth said. “I thought that the team exercises were the most important part of the workshop, and I hope all of the attendees enjoyed them!”
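For readers who want a concrete picture of cross-validation and the bias-variance trade-off, here is a minimal sketch. It is not the workshop's sample code. It assumes only the Python standard library, uses a synthetic two-cluster data set in place of a repository data set, and relies on hypothetical helper names (`make_data`, `knn_predict`, `cross_val_accuracy`) introduced purely for illustration. It estimates the held-out accuracy of a k-nearest neighbors classifier for several values of k.

```python
import random
from collections import Counter


def make_data(n_per_class=60, seed=0):
    """Synthetic two-class data: two Gaussian clusters in the plane (illustrative only)."""
    rng = random.Random(seed)
    data = []
    for label, (cx, cy) in enumerate([(0.0, 0.0), (2.5, 2.5)]):
        for _ in range(n_per_class):
            data.append(((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)), label))
    rng.shuffle(data)  # mix the classes so every fold contains both
    return data


def knn_predict(train, point, k):
    """Return the majority label among the k training points closest to `point`."""
    def sq_dist(item):
        (x, y), _ = item
        return (x - point[0]) ** 2 + (y - point[1]) ** 2

    nearest = sorted(train, key=sq_dist)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cross_val_accuracy(data, k, folds=5):
    """Mean accuracy over `folds` disjoint held-out folds."""
    scores = []
    for i in range(folds):
        # Every `folds`-th point (starting at i) is held out; the rest train the model.
        test = data[i::folds]
        train = [item for j, item in enumerate(data) if j % folds != i]
        correct = sum(knn_predict(train, x, k) == y for x, y in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


data = make_data()
# Odd values of k avoid tied votes in a two-class problem.
results = {k: cross_val_accuracy(data, k) for k in (1, 5, 15, 45)}
for k, acc in results.items():
    print(f"k={k:2d}  mean cross-validated accuracy={acc:.3f}")
```

Small values of k follow the training data closely (low bias, high variance), while large values smooth over it (higher bias, lower variance). Comparing held-out accuracy across values of k, rather than accuracy on the training data, is one way to choose between them.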
Jonathan Nolis, director and lead of Insights and Analytics at Lenati, offered valuable advice for faculty interested in preparing and engaging students pursuing data analytics. “It’s great that PIC Math exists and math professors are getting more involved in data science,” he said. “These professors are receiving valuable training in the sorts of jobs available in industry and how to prepare students for them. As someone who frequently hires for analytics jobs, I value students who understand the connection between mathematics and industry.”
The workshop was just one activity of the PIC Math grant, which increases students’ awareness and pursuit of career options outside academia. The grant exposes students to real problems from business, industry, and government, and provides faculty with the support necessary to offer students these opportunities. More information about PIC Math can be found on the program’s website.
Acknowledgments: The National Science Foundation supports the PIC Math program through grant DMS-1345499.
Thomas Wakefield is a professor of mathematics and statistics at Youngstown State University and a fellow of the Society of Actuaries. With support from the PIC Math program and industrial sponsors, Wakefield runs undergraduate mathematics research courses that give students the opportunity to work on problems originating from industry.