In early June, the National Institutes of Health (NIH) Office of Science Policy released its new Strategic Plan for Data Science. The volume of data is growing rapidly, and it is spread across many researchers and stored in a variety of formats. To keep pace, the NIH seeks to harness advances in storage, communication, and processing, using tools such as artificial intelligence, machine learning, and deep learning that could revolutionize the way data is stored and maintained. The NIH also recognizes the importance of developing robust information security approaches to preserve public trust and protect patients. The strategic plan gives the external community further insight into the agency's future priorities and needs in data creation and maintenance.
Many members of the SIAM community responded to the NIH's initial draft with feedback on data management, analytics, tools, and workforce development. Thanks to SIAM involvement, the finalized plan now recognizes the importance of mathematics in advancing biomedical science. It also cites the joint Mathematical Biology Program of the National Science Foundation's (NSF) Division of Mathematical Sciences and the National Institute of General Medical Sciences as a model for promoting research at the intersection of the two fields.
The Strategic Plan for Data Science was created in response to specific challenges identified by the NIH:
- The growing cost of data management could diminish the NIH’s ability to enable scientists to generate data for understanding biology and improving health.
- The current data-resource ecosystem tends to be "siloed" and is not optimally integrated or interconnected.
- Important datasets exist in many different formats and are often not easily shareable, findable, or interoperable.
- Historically, the NIH has often supported data resources through funding approaches designed for research projects, resulting in misaligned objectives and review expectations.
- Funding for tool development and data resources has become entangled, making it difficult to assess the utility of each independently and to optimize value and efficiency.
- No general system currently exists to transform innovative algorithms and tools created by academic scientists into enterprise-ready resources that meet industry standards of ease of use and efficiency of operation.
Guided by the overarching principle that data should be Findable, Accessible, Interoperable, and Reusable (FAIR), the NIH has outlined five specific goals for its strategic plan, each with its own objectives and method for evaluating progress:
1. Support a Highly Efficient and Effective Biomedical Research Data Infrastructure
1-1. Optimize Data Storage and Security
1-2. Connect NIH Data Systems
2. Promote Modernization of the Data-Resources Ecosystem
2-1. Modernize the Data Repository Ecosystem
2-2. Support the Storage and Sharing of Individual Datasets
2-3. Leverage Ongoing Initiatives to Better Integrate Clinical and Observational Data into Biomedical Data Science
3. Support the Development and Dissemination of Advanced Data Management, Analytics, and Visualization Tools
3-1. Support Useful, Generalizable, and Accessible Tools and Workflows
3-2. Broaden Utility, Usability, and Accessibility of Specialized Tools
3-3. Improve Discovery and Cataloging Resources
4. Enhance Workforce Development for Biomedical Data Science
4-1. Enhance the NIH Data-Science Workforce
4-2. Expand the National Research Workforce
4-3. Engage a Broader Community
5. Enact Appropriate Policies to Promote Stewardship and Sustainability
5-1. Develop Policies for a FAIR Data Ecosystem
5-2. Enhance Stewardship
The NIH lists its implementation tactics under each objective in further detail. Several of the tactics under “Enhance Workforce Development for Biomedical Data Science” may be of interest to the research community. Relevant provisions include the following:
- The NIH states that the NSF is at the “forefront of supporting disciplines that contribute to data science,” and that it intends to work with the NSF on joint initiatives related to the training and education of researchers at different stages of their careers.
- To train its internal workforce, the NIH will recruit data scientists and other experts from industry and academia to serve one- to three-year sabbaticals as "NIH Data Fellows." These fellows will be embedded in a range of high-profile, transformative projects, such as the Cancer Moonshot, the All of Us Research Program, and the Brain Research through Advancing Innovative Neurotechnologies (BRAIN) Initiative, to provide expertise not available internally.
The Strategic Plan for Data Science is available on the NIH website.
— Lewis-Burke Associates LLC