July 01, 2019

Technical and Organizational Challenges for Data Scientists

As a freshman in college, I decided that I wanted to be a math professor. I’d always valued learning and had been interested in teaching from a young age. My desire to teach mathematics was twofold. I enjoyed the challenge of communicating a dense subject by breaking down complex ideas and explaining them step by step, and I loved learning about the applications of mathematics across diverse fields like physics, computer science, and music. As a professor, I knew I could continue learning through research while sharing my love of math with students. With this goal in mind, I pursued a Ph.D. in applied mathematics.

Yet by the time I obtained my degree, I was no longer certain that I wanted to continue in academia. Furthermore, the then-recent 2008 financial crisis meant that the academic job market was poor. It was also very important for me to stay near my family in Minnesota, and academic career paths would likely not have allowed that. So I decided to accept a job as a visiting assistant professor at Augsburg College in Minneapolis, Minn., and start exploring industry career options. The buzz around data science began shortly thereafter.

Management magazines like the Harvard Business Review were touting data science as a vehicle that could enable companies to transform their operations, decision-making, and product development. Because the underlying business problems were complex, data scientists would need a strong quantitative and computational skill set to solve them. Although my Ph.D. was not in statistics or machine learning, I had developed extensive computational skills during my thesis work and was excited about learning new areas of mathematics. My transition to data science thus began early in my career with my first industry job as a predictive modeler at the Travelers Companies, Inc. in 2012.

Catherine Micek (third from left) and her team, K-Means Business, hard at work during 3M’s Data Intelligence Global Hackathon, which took place in April 2019. The team placed third. Image courtesy of Ghulam Jafferi.

Since then, my data science positions have encompassed a variety of technical roles—such as data scientist, software developer, and predictive modeler—across diverse industries including insurance, energy, and finance. My job responsibilities have included the technical work of solving data science problems and automating the solutions, and the organizational work of educating and training my business colleagues about the field itself. Each data science job that I have held falls within the broader field of decision sciences. To me, the role of data scientist involves using sophisticated mathematical and computational techniques to develop algorithms or analyses that extract meaningful information from data. As a data scientist, I have built statistical models to price insurance policies, created custom machine learning algorithms to detect anomalies on the electrical grid, and developed forecasting models to predict sales.

The data scientist’s algorithm or analysis is often embedded in a data science product, which automates data processing through the algorithm via a software delivery mechanism. The level and technical sophistication of automation depend on the team’s collective skill set. For example, when my team included software engineers, the implementation was a custom-built software package fully automated from end to end. But with no developers to assist with automation, the data scientists had to devise simple, semi-automated implementations, such as an R script run on an ad-hoc basis to generate Excel workbook output.

An end user consumes the output of the data science product. In my experience, this person is typically a business analyst; I have thus often had to translate mathematical output into actionable business information. For instance, when explaining a prediction interval for a sales forecast, my description would be something like “a prediction interval quantifies the uncertainty in our forecast by providing a best- and worst-case estimate of sales.” The goal is to present the analyst with a high-level description of the math that explains how one can use the information for decision-making. This explanation is especially important if the business analyst is not relying solely on a model for the forecast, but rather combining the model output with other pieces of business intelligence.
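To make the prediction-interval example concrete, here is a minimal Python sketch. It is not a model from any of my projects: the sales figures are hypothetical, the point forecast is simply the historical mean, and the interval assumes independent, roughly normal observations and uses a normal quantile in place of a t quantile. The function name `prediction_interval` is likewise invented for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def prediction_interval(history, level=0.95):
    """Return (low, point, high) for the next observation.

    Assumes independent, roughly normal observations and uses the
    historical mean as the point forecast (a deliberately naive model).
    """
    n = len(history)
    if n < 2:
        raise ValueError("need at least two observations")
    point = mean(history)
    # Spread of a new observation around the estimated mean:
    # sample standard deviation inflated for uncertainty in the mean.
    spread = stdev(history) * sqrt(1 + 1 / n)
    # Two-sided normal quantile for the requested coverage level.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return point - z * spread, point, point + z * spread


# Hypothetical monthly sales figures (illustrative only).
monthly_sales = [102.0, 98.0, 110.0, 105.0, 95.0, 108.0]
low, point, high = prediction_interval(monthly_sales)
print(f"forecast {point:.1f}, 95% interval {low:.1f} to {high:.1f}")
```

For a business analyst, `low` and `high` play the role of the worst- and best-case sales estimates, and `point` is the single number that would otherwise be reported alone. A real forecasting model would replace the historical mean with something that accounts for trend and seasonality.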

When I begin tackling a data science business problem, I find it helpful to break the problem into five primary components: the business question, data, algorithm or analysis, delivery mechanism, and communication of results. I think of each component as either technical or organizational. The data, algorithm or analysis, and delivery mechanism constitute the technical components, while the business question and communication of results make up the organizational elements. In this framework, it becomes clear that the required skill set for an effective data scientist is a blend of hard and soft skills: a range of technical abilities in mathematics, programming, and software engineering; an understanding of the business domain; and strong communication techniques.

Catherine Micek (third from left) poses with her hackathon team, K-Means Business, during the final presentations of 3M’s Data Intelligence Global Hackathon, which took place in April 2019. The team won third prize. Image courtesy of Mani Upadhyaya.

The primary point of debate that I have observed is not whether data scientists need a blend of hard and soft skills, but which skills should make up that blend. I have heard domain experts argue that understanding the domain is more important than the sophistication of the associated data science tools. Conversely, I have witnessed technical experts argue that technical expertise is more important because the domain is learnable. Since it is unrealistic for data scientists to be experts in all five areas, business teams typically rank the importance of these skills when hiring.

My duties as a data scientist have differed from team to team, so data scientists should expect to accommodate the expertise of their business teams. On teams where I was the sole data science expert, I was simultaneously responsible for handling all of a problem’s technical components (including software engineering) and learning the domain well enough to effectively communicate with the domain experts. In contrast, as a data scientist on a team of technical experts (data scientists, back-end developers, front-end developers, etc.), I have focused more on a problem’s technical components and less on the organizational factors. I have learned a lot from each role, but the emphasis on what I was required to learn (e.g., domain expertise, software engineering, or mathematics) has differed widely across positions. I am currently a data scientist for the finance organization at 3M. My job is to develop prediction and classification algorithms for the finance department, assist in the operationalization of these algorithms, and educate finance colleagues about data science techniques.

In conclusion, I would advise aspiring data scientists to think about which components of a data science business problem interest them the most. Are you drawn solely to the mathematics of the algorithm, exclusively to the technical elements, or to a mix of technical and organizational factors? I personally enjoy employing a broad array of mathematical and computational skills to solve problems, identifying the use of mathematics in specific domain applications, and teaching mathematics as I communicate results. This comprehensive blend of hard and soft skills is a good fit for me. What is a good fit for you?

The author shared her experience during a minisymposium at the 2019 SIAM Conference on Computational Science and Engineering, which took place earlier this year in Spokane, Wash.

Catherine (Katy) Micek is a data scientist at 3M. She holds a Ph.D. in applied mathematics from the University of Minnesota.