By Catherine Micek
As a freshman in college, I decided that I wanted to be a math professor. I’d always valued learning and had been interested in teaching from a young age. My desire to teach mathematics was twofold. I enjoyed the challenge of communicating a dense subject by breaking down complex ideas and explaining them step by step, and I loved learning about the applications of mathematics across diverse fields like physics, computer science, and music. As a professor, I knew I could continue learning through research while sharing my love of math with students. With this goal in mind, I pursued a Ph.D. in applied mathematics.
Yet by the time I obtained my degree, I was no longer certain that I wanted to continue in academia. Furthermore, the then-recent 2008 financial crisis meant that the academic job market was poor. It was also very important for me to stay near my family in Minnesota, and academic career paths would likely not have allowed that. So I decided to accept a job as a visiting assistant professor at Augsburg College in Minneapolis, Minn., and start exploring industry career options. The buzz around data science began shortly thereafter.
Management magazines like the Harvard Business Review were touting data science as a vehicle that could enable companies to transform their operations, decision-making, and product development. Because the underlying business problems were complex, data scientists would need a strong quantitative and computational skill set to solve them. Although my Ph.D. was not in statistics or machine learning, I had developed extensive computational skills during my thesis work and was excited about learning new areas of mathematics. My transition to data science thus began early in my career with my first industry job as a predictive modeler at the Travelers Companies, Inc. in 2012.
The data scientist’s algorithm or analysis is often embedded in a data science product, which automates data processing through the algorithm via a software delivery mechanism. The level and technical sophistication of automation depend on the team’s collective skill set. For example, when my team included software engineers, the implementation was a custom-built software package fully automated from end to end. But with no developers to assist with automation, the data scientists had to devise simple, semi-automated implementations, such as an R script run on an ad hoc basis to generate Excel workbook output.
An end user consumes the output of the data science product. In my experience, this person is typically a business analyst, so I have often had to translate mathematical output into actionable business information. For instance, when explaining a prediction interval for a sales forecast, my description would be something like “a prediction interval quantifies the uncertainty in our forecast by providing a best- and worst-case estimate of sales.” The goal is to present the analyst with a high-level description of the math that explains how one can use the information for decision-making. This explanation is especially important if the business analyst is not relying solely on a model for the forecast, but rather combining the model output with other pieces of business intelligence.
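As a rough illustration of the numbers behind such a description, the sketch below computes a simple prediction interval for the next period's sales. It is not any particular production forecasting model. It assumes that past sales are independent draws with a common mean and variance, and it uses a normal approximation for the interval width. The function name `prediction_interval` and the monthly sales figures are hypothetical and invented for this example.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def prediction_interval(history, coverage=0.95):
    """Return (low, point, high) for the next observation.

    Assumption: observations are independent with a common mean and
    variance. A normal quantile is used instead of a t quantile, so the
    interval is somewhat too narrow for very short histories.
    """
    n = len(history)
    if n < 2:
        raise ValueError("need at least two observations")
    point = mean(history)
    # Spread of a new observation around the estimated mean: the noise in
    # a single period plus the uncertainty in the estimated mean itself.
    spread = stdev(history) * sqrt(1 + 1 / n)
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    return point - z * spread, point, point + z * spread


# Hypothetical monthly sales figures, invented for illustration only.
monthly_sales = [102.0, 98.0, 110.0, 105.0, 95.0, 108.0]
low, point, high = prediction_interval(monthly_sales)
print(f"Forecast: {point:.1f} (worst case {low:.1f}, best case {high:.1f})")
```

The printed line mirrors the plain-language description: one central forecast, bracketed by a worst-case and a best-case estimate. Raising `coverage` widens the bracket, which is one way to show an analyst the trade-off between confidence and precision.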
When I begin tackling a data science business problem, I find it helpful to break the problem into five primary components: the business question, data, algorithm or analysis, delivery mechanism, and communication of results. I think of each component as either technical or organizational. The data, algorithm or analysis, and delivery mechanism constitute the technical components, while the business question and communication of results make up the organizational elements. In this framework, it becomes clear that the required skill set for an effective data scientist is a blend of hard and soft skills: a range of technical abilities in mathematics, programming, and software engineering; an understanding of the business domain; and strong communication techniques.
My duties as a data scientist have differed from team to team, so data scientists should expect to accommodate the expertise of their business teams. On teams where I was the sole data science expert, I was simultaneously responsible for handling all of a problem’s technical components (including software engineering) and learning the domain well enough to effectively communicate with the domain experts. In contrast, as a data scientist on a team of technical experts (data scientists, back-end developers, front-end developers, etc.), I have focused more on a problem’s technical components and less on the organizational factors. I have learned a lot from each role, but the emphasis on what I was required to learn (e.g., domain expertise, software engineering, or mathematics) has differed widely across positions. I am currently a data scientist for the finance organization at 3M. My job is to develop prediction and classification algorithms for the finance department, assist in the operationalization of these algorithms, and educate finance colleagues about data science techniques.
In conclusion, I would advise aspiring data scientists to think about which components of a data science business problem interest them the most. Are you drawn solely to the mathematics of the algorithm, exclusively to the technical elements, or to a mix of technical and organizational factors? I personally enjoy employing a broad array of mathematical and computational skills to solve problems, identifying the use of mathematics in specific domain applications, and teaching mathematics as I communicate results. This comprehensive blend of hard and soft skills is a good fit for me. What is a good fit for you?
Catherine (Katy) Micek is a data scientist at 3M. She holds a Ph.D. in applied mathematics from the University of Minnesota.