By Min Ragan-Kelley, Carol Willing, and Jason Grout
Computation is increasingly becoming an integral part of science and education across disciplines. The life cycle of a computational idea typically involves interactive exploration and experimentation, as well as publication and communication of results. Reproducible computation demands open research tools, good software practices, and transparent documentation of research processes and results. Project Jupyter is an open community that builds open-source software tools and protocols for the life cycle of a computational idea. Two core pieces of the project are an open protocol for interactive computation and an open document format with which to record and share computational ideas. The Jupyter Notebook application builds on these to provide a powerful, interactive, computational environment.
What Is a Notebook? The Jupyter Notebook is a document composed of a sequence of code cells and markdown cells. A code cell contains a block of code (in any language, including but not limited to Julia, R, and Python) and the output from running the code. Output displayed below a code cell is a rich representation of results and can include text, images, and interactive visualizations. A markdown cell consists of prose text in the Markdown format (a lightweight, shorthand syntax for HTML) with added support for LaTeX mathematics. This structure allows authors to interleave formatted narrative and mathematics with blocks of code and their rich outputs, making the notebook document a powerful tool for communicating insights and results.
The notebook document format is free, transparent, and understandable, in keeping with its aim to facilitate open and accessible science. It is stored as a single JSON-formatted text file, making it easy to manipulate and understand using standard programming tools, without the need for Jupyter software. The notebook file format is public, and Jupyter software is open-source under the BSD license.
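To make the point about standard tools concrete, the following is a minimal sketch that builds and inspects a notebook using only Python's built-in json module. The notebook content is invented for illustration, and the helper names cell_source and summarize are hypothetical; the field layout follows the version 4 notebook format as commonly documented, so treat it as an approximation rather than a full specification.

```python
import json

# A minimal notebook in the version 4 layout: one markdown cell, one code cell.
# The content is invented for illustration.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "id": "intro",
            "metadata": {},
            "source": ["# Example\n", "Some narrative text."],
        },
        {
            "cell_type": "code",
            "id": "compute",
            "metadata": {},
            "execution_count": 1,
            "source": ["1 + 1"],
            "outputs": [
                {
                    "output_type": "execute_result",
                    "execution_count": 1,
                    "metadata": {},
                    "data": {"text/plain": ["2"]},
                }
            ],
        },
    ],
}


def cell_source(cell):
    """Return a cell's source as one string (it may be stored as a list of lines)."""
    source = cell["source"]
    return source if isinstance(source, str) else "".join(source)


def summarize(nb):
    """Return (cell_type, first line of source) for every cell."""
    rows = []
    for cell in nb["cells"]:
        lines = cell_source(cell).splitlines() or [""]
        rows.append((cell["cell_type"], lines[0]))
    return rows


# Round-trip through JSON text, as if reading an .ipynb file from disk.
text = json.dumps(notebook, indent=1)
loaded = json.loads(text)
summary = summarize(loaded)
print(summary)
```

The same approach works on a real file by replacing the round trip with json.load on an open .ipynb file.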
Many authors communicate using Jupyter Notebook. GitHub hosts 1.4 million notebooks, and some people have written entire books as collections of notebooks, such as Jake VanderPlas’s Python Data Science Handbook. Because notebook documents preserve their content structure and metadata, they are easily convertible to other formats, including plain scripts in the document’s language of choice. This also makes them easy to integrate into publication pipelines: Jupyter’s conversion tool, nbconvert, exports them to formats such as LaTeX, Markdown, and reStructuredText.
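As a rough illustration of why the preserved cell structure makes conversion straightforward, the sketch below turns a notebook dictionary into a plain script: code cells are kept as code and markdown cells become comments. This is not how nbconvert is implemented; the function name notebook_to_script and the example notebook are hypothetical.

```python
def notebook_to_script(nb):
    """Render code cells as code and markdown cells as comment lines."""
    chunks = []
    for cell in nb["cells"]:
        source = cell["source"]
        text = source if isinstance(source, str) else "".join(source)
        if cell["cell_type"] == "code":
            chunks.append(text)
        elif cell["cell_type"] == "markdown":
            commented = [("# " + line) if line else "#" for line in text.splitlines()]
            chunks.append("\n".join(commented))
    # Separate cells with a blank line and end the script with a newline.
    return "\n\n".join(chunks) + "\n"


# Invented example notebook with one markdown cell and one code cell.
example = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["Compute a sum."]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["total = sum(range(5))\n", "print(total)"],
        },
    ],
}

script = notebook_to_script(example)
print(script)
```

With Jupyter installed, the command jupyter nbconvert --to script notebook.ipynb performs the real conversion and handles details this sketch ignores, such as language-specific file extensions.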
Using Notebook Documents. The Jupyter Notebook server is a web-based application for interacting with notebook documents. The server renders a notebook document for the user and begins a persistent interactive “kernel” session in the desired language (such as Julia, R, or Python) for executing user code. A user enters code in a code cell, runs it in the kernel session, and then views the rich output returned by the kernel, which may include text, images, and interactive controls. The user continues creating, editing, and executing or re-executing code cells in the same persistent kernel session, thus building and exploring a computational idea. The user can also create and interleave markdown cells to explain an idea using formatted text and mathematics (see Figure 1).
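The persistent-session idea can be sketched with a toy example: every cell runs in one shared namespace, so later cells see names defined by earlier ones, and re-running a cell after a change gives a new result. The ToySession class below is hypothetical and is not how a real Jupyter kernel is built; it only mimics the shared state.

```python
import contextlib
import io


class ToySession:
    """Hypothetical stand-in for a kernel session: one namespace shared by all cells."""

    def __init__(self):
        self.namespace = {}
        self.execution_count = 0

    def run_cell(self, code):
        """Execute code in the shared namespace and return any printed text."""
        self.execution_count += 1
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(code, self.namespace)
        return buffer.getvalue()


session = ToySession()
session.run_cell("x = 21")                  # define a name in the session
first = session.run_cell("print(x * 2)")    # a later cell can use it
session.run_cell("x = 50")                  # edit and re-run the definition
second = session.run_cell("print(x * 2)")   # re-executing gives the new result
print(first, second)
```

A real kernel runs in a separate process and returns rich outputs rather than only printed text, but the state-carrying behavior is the part this sketch is meant to show.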
Users typically install the notebook application on their local computer, where it functions like a desktop application and works with local notebooks and data. Since it is a web application, the notebook server can also run on a remote computer and provide the notebook application via a browser, with no installation required on the user’s machine. Administrators of shared computational resources, instructors, or companies such as Microsoft can also host and manage these remote Jupyter Notebook servers. The JupyterHub project provides tools to host and administer remote servers for large numbers of users, without the need for software installation on every user machine. For example, over 1,000 students in the Data Science Education Program at the University of California, Berkeley successfully use Jupyter Notebooks in their classes through a university-wide JupyterHub deployment.
The Jupyter Notebook application interacts with a kernel session using the Jupyter kernel protocol — an open, documented message protocol for communication of the code to execute and the resulting rich output representations between front ends and language kernels. Many programming language communities provide interactive kernels that understand this message protocol and seamlessly work with applications like Jupyter Notebook.
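To give a feel for what such a message looks like, here is a minimal sketch of a request asking a kernel to execute code. The four top-level parts and the field names follow the published Jupyter messaging documentation as best understood here; the protocol version string, the default username, and the helper name make_execute_request are assumptions for illustration. Real messages are also signed and sent over a transport layer, which this sketch omits.

```python
import json
import uuid
from datetime import datetime, timezone


def make_execute_request(code, session_id, username="reader"):
    """Build a dictionary shaped like an execute_request message (illustrative only)."""
    header = {
        "msg_id": uuid.uuid4().hex,           # unique per message
        "session": session_id,                # identifies the client session
        "username": username,                 # assumed placeholder value
        "date": datetime.now(timezone.utc).isoformat(),
        "msg_type": "execute_request",
        "version": "5.3",                     # assumed protocol version
    }
    content = {
        "code": code,                         # the source to run in the kernel
        "silent": False,
        "store_history": True,
        "user_expressions": {},
        "allow_stdin": False,
        "stop_on_error": True,
    }
    return {
        "header": header,
        "parent_header": {},                  # empty for a message that starts an exchange
        "metadata": {},
        "content": content,
    }


message = make_execute_request("print('hello')", session_id=uuid.uuid4().hex)
print(json.dumps(message, indent=2))
```

Because the message is plain structured data, a front end in any language can construct it and a kernel in any language can answer it, which is what allows the two sides to be developed independently.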
Jupyter provides useful tools for communicating scientific results. At its base level of reproducibility, a notebook is a single, shareable document containing a prose explanation of an idea, code for its implementation, and the output and figures illustrating the results. To accurately reproduce another researcher’s findings, a reader needs access to a similar computational environment, including the applicable data and software libraries. Binder provides a service for sharing a reproducible computational environment. For example, the LIGO/Virgo collaboration (see Figure 3) uses Binder to let readers interact, with a single click, with its Nobel Prize-winning research on observing gravitational waves.
Due to its strength in exploratory research and scientific communication, Jupyter has found a welcome home in education. As computation becomes more prominent in ever-widening fields—including computational biology, digital humanities, and data literacy—almost every student will need to apply programming in some form. Several open-source tools for education have developed around the Jupyter Notebook.
RISE, a plugin for the Jupyter Notebook application, displays notebooks as presentations by stepping through interactive content cells. These presentations are live notebooks, so instructors can pause to answer questions and run code demonstrations.
The nbgrader project is another example of a tool that is useful to educators. It lets instructors distribute Jupyter Notebook assignments automatically and lets students submit completed notebooks. Because notebook cells can carry associated metadata, instructors can mark submissions efficiently by selecting certain notebook cells to be automatically graded and others to be graded by hand.
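The role of cell metadata in grading can be sketched as follows. The metadata keys used here (nbgrader, grade, solution, grade_id, points) are assumptions modeled loosely on the kind of metadata nbgrader attaches to cells, and the classification rule in grading_mode is a hypothetical simplification rather than nbgrader's actual logic.

```python
def grading_mode(cell):
    """Classify a cell as 'auto', 'manual', or 'ungraded' from its metadata.

    Assumed rule for illustration: a graded cell that also holds a student's
    answer needs a human reader; a graded cell that does not is a test cell
    that can be checked automatically.
    """
    meta = cell.get("metadata", {}).get("nbgrader")
    if not meta or not meta.get("grade"):
        return "ungraded"
    return "manual" if meta.get("solution") else "auto"


# Invented cells from a hypothetical submitted assignment.
cells = [
    {"cell_type": "markdown", "metadata": {}, "source": "Instructions"},
    {
        "cell_type": "code",
        "metadata": {"nbgrader": {"grade": True, "solution": False,
                                  "grade_id": "test_sum", "points": 2}},
        "source": "assert total == 10",
    },
    {
        "cell_type": "markdown",
        "metadata": {"nbgrader": {"grade": True, "solution": True,
                                  "grade_id": "essay", "points": 3}},
        "source": "Explain your result.",
    },
]

modes = [grading_mode(cell) for cell in cells]

# Total the points available under each grading mode.
points = {}
for cell, mode in zip(cells, modes):
    value = cell["metadata"].get("nbgrader", {}).get("points", 0)
    points[mode] = points.get(mode, 0) + value

print(modes, points)
```

The point of the sketch is only that the decision lives in the notebook file itself, next to the cell it applies to, so grading tools can act on it without any separate bookkeeping.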
Project Jupyter provides open, documented protocol and notebook file format standards, in addition to a wide variety of interoperable open-source tools for the life cycle of a computational idea. These open standards and tools cover the full span from early exploratory interactive research to communication of insights and results, and they enable a flourishing ecosystem that supports collaboration and reproducibility in science and education.
Acknowledgments: The authors wish to recognize the following people for their contributions to this article: Fernando Perez, Paul Ivanov, Brian Granger, Jessica Forde, Matthias Bussonnier, and Damian Avila, as well as the Project Jupyter team on whose combined work this report is based.
Min Ragan-Kelley has been working on interactive computing tools as part of the IPython and Jupyter teams since 2006. He now works primarily on JupyterHub as a postdoctoral researcher at Simula Research Laboratory in Oslo, Norway. Carol Willing is a Python Software Foundation Fellow and former director, a core developer for CPython and Project Jupyter, and a research software engineer at California Polytechnic State University. She is also geek-in-residence at Fab Lab San Diego, and co-organizes PyLadies San Diego and San Diego Python. Jason Grout is a Jupyter developer at Bloomberg. He received his Ph.D. in mathematics from Brigham Young University and has been helping develop open-source scientific software platforms, such as SageMath and Jupyter, since 2007.