SIAM News Blog

The CoCalc Computing Environment

By Hal Snyder

CoCalc is a cloud-based service that makes a large collection of state-of-the-art open-source mathematical software available to any user with a web browser and an internet connection. Its intent is to make the working scientist’s time as productive and fulfilling as possible. The CoCalc platform supports dozens of programming languages—such as Python, Sage, R, Julia, C, C++, Haskell, Scala, and Fortran—and thousands of libraries and packages, including statistical and machine learning software for data science, computer algebra systems for symbolic mathematics, and scientific packages for physical sciences and bioinformatics.

Users can perform mathematical computations without having to install and maintain operating systems, compilers and interpreters, libraries, and packages. Users can often install a package that is not already available in their own project themselves, and requests to install packages globally are typically processed in less than a day.

Work that extends across several programming languages and toolsets can be performed within a consistent setting and with minimal friction.

Built for Mathematics

William Stein, the lead developer of SageMath, introduced CoCalc as “SageMathCloud” in April 2013 as a hosted platform for the SageMath software system. Since its earliest versions, CoCalc has supported computer algebra and numerical computation systems, including Sage, R, SymPy, NumPy, SciPy, GNU Octave, PARI/GP, GAP, Singular, and Maxima.

Figure 1. Sage worksheet with side chat showing LaTeX.

The three most common file formats, Jupyter notebooks (.ipynb), Sage worksheets (.sagews), and Markdown files (.md), all support LaTeX (see Figure 1). A full-featured text editor for LaTeX documents and a customizable build system make it possible to create complex LaTeX documents with the pdflatex or xelatex engines, optionally driven by latexmk, with most packages preinstalled. Dynamic typeset content can be produced with SageTeX for Sage and with Rnw/knitr for R (see Figure 2).
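To give a sense of what SageTeX's dynamic content looks like, here is a minimal sketch of a SageTeX document. The function and text are illustrative assumptions, not taken from CoCalc. Sage computes the derivative when the document is built, and the result is typeset in place:

```latex
\documentclass{article}
\usepackage{sagetex}   % provides the sagesilent environment and the \sage command
\begin{document}

% Code in sagesilent runs in Sage but produces no output in the PDF.
\begin{sagesilent}
f = x^2 * sin(x)
\end{sagesilent}

% \sage{...} evaluates an expression in Sage and inserts its LaTeX form.
The derivative of $f(x) = \sage{f}$ is $\sage{diff(f, x)}$.

\end{document}
```

Outside CoCalc, a typical build runs pdflatex, then runs Sage on the generated .sagetex.sage file, then runs pdflatex again; CoCalc's LaTeX editor may automate some of these steps.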

CoCalc chat supports Markdown with embedded LaTeX, both in standalone “chat room” files and side chat panes viewable in terminal sessions, notebooks, and files open for editing.

Open Source

Most of the CoCalc code base, including over 240,000 lines of user interface and server-side code, is open source, and users can run it on their own computers with Docker. The hosted CoCalc service runs on Kubernetes, and that deployment relies on some code that is not open source to provide a high level of scalability and security. All of the software tools, packages, and libraries provided to users through CoCalc, however, are open source. This gives users the usual benefits of open source: they do not have to deal with “black box” secret computations or be locked into the CoCalc platform.

Figure 2. Compiling an R Markdown file.

The underlying environment for CoCalc projects is Ubuntu Linux. Users comfortable with shell programming may open multiple browser-based terminal sessions in their projects. Workflows can extend beyond the CoCalc platform to users’ remote systems using SSH integration techniques. It is also possible to install almost any package locally in a project.

Software packages are frequently updated. The CoCalc team tracks major releases and offers rolling updates of package sets.

Robust Data Storage

If a user wants to revert to a previous version of a file or accidentally deletes files, CoCalc provides data-recovery methods and automatic backups. The system stores several hundred easily browsable read-only snapshots of all the user’s files. These snapshots make it possible to recover older versions of files edited outside the web editor (for example, with Vim or Emacs in a terminal) or to obtain past versions of data files output from computations.

For those using the web editor, every change to a file is recorded and stored indefinitely on CoCalc’s servers at roughly two-second resolution. This high-resolution file history can be viewed using TimeTravel, which includes a mode showing precisely the changes between two points in time, along with information on who made them (see Figure 3). Since internet connections can be unreliable, CoCalc lets a user continue editing offline; changes are merged into the live document upon reconnecting.

Figure 3. Edit history showing the difference between two versions.

Collaboration and Teaching

The owner of a CoCalc project may invite any number of collaborators. Users can collaborate in real time while editing virtually any file type. Each collaborator has a separate cursor that is visible to everyone who has the file open at the same time, and there is no explicit limit on the number of simultaneous users. To support this kind of collaboration, both the client and server sides of the Jupyter notebook were completely rewritten for CoCalc. Video chat is also available through the side chat panel.

CoCalc has been used for teaching university-level courses for several years, in subjects including pure and applied mathematics, data science, physics, and bioinformatics. An integrated course management system supports working with students in courses and workshops: an instructor can push an assignment file to all students, let them work on it, then collect it, grade it, and return the graded files. For finished results, any file or folder in a project may be shared (published); shared content can then be viewed read-only without a CoCalc login.

Additional Resources

In summary, CoCalc offers an extensive open-source mathematical computing and authoring environment that supports several forms of collaboration and eliminates the work of installing software.

To learn more about CoCalc, visit the website and create a free account. Free accounts allow unlimited projects and collaborators, with 3 GB of disk space per project. Upgrades, including more disk space, CPU power, and outside network access, incur charges. More information on pricing is available on the website, and the CoCalc wiki offers a wealth of tutorial and reference information. Articles about new features and system internals can be found on the CoCalc blog. The source code is on GitHub.

Historical note: CoCalc was previously known as SageMathCloud. It was renamed in May 2017.

Hal Snyder earned his M.S. in mathematics from the University of Chicago and his M.D. from Northwestern University. He has worked in operating systems internals and distributed systems for over 30 years, and is currently a senior software developer for CoCalc at SageMath, Inc.
