Streamlining Research-Based Collaboration with Authorea

By Alberto Pepe

Inspiration from the Ancestral Home of the World Wide Web

The modern scientific process presents a puzzling paradox: months or years of cutting-edge research are written up with antiquated word processing software and packaged into a flat PDF for a journal to stick behind a paywall after peer review. Has the World Wide Web failed to deliver a better mechanism for researchers to widely spread their ideas? And where is the source data? What about the dynamic models? In the age of cloud-services-for-everything, why do we need to print out a figure and lay a ruler on top to approximate the slope of a curve?

This is what I was thinking while working with some of my colleagues at the European Organization for Nuclear Research (CERN) several years ago. We experienced firsthand the labor of writing a paper with more than one hundred other physicists; there was no reasonable way to collaborate on such a production, much less share the true depth and beauty of our data with a larger audience. We used Dropbox to sync versions of data throughout the writing process, and emailed our coauthors when we finished reviewing a section. We maintained a working version of the document that we updated every few days with a filename of the form “v23_final.txt”. We also verified that a data set was up to date by scrolling through repositories and browsing by “date last modified.” It was a mess – and for many researchers, especially those who work overwhelmingly with data sets, it still is.

The First Solution Presents Itself

Software developers have almost universally integrated version control into their collaborative development workflows. The most popular version control system is called Git, initially developed by Linus Torvalds, the original creator of the Linux operating system. We were using Git for much of our data management back at CERN, and it quickly became clear how useful version control would be for all kinds of documents, including research. We reasoned that whenever a researcher made a change to a document, the change could be logged and made visible as a diff through time, exactly how it’s done in software development. All of the biggest internet companies in the world assemble large and often-distributed teams of developers to work on the same code this way; why couldn’t we use this method for research content?
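
To make the idea of a diff concrete, here is a minimal sketch in Python. It is only an illustration using the standard library's difflib module, not how Git or Authorea computes changes; the function name diff_versions, the labels draft_v1 and draft_v2, and the sample sentences are all made up for this example.

```python
import difflib


def diff_versions(old_text, new_text):
    """Return a unified diff between two versions of a text, one string per line.

    Illustrative helper only: the labels "draft_v1" and "draft_v2" are
    hypothetical and do not correspond to anything in Git or Authorea.
    """
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile="draft_v1",
            tofile="draft_v2",
            lineterm="",
        )
    )


# Two made-up versions of the same short paragraph.
old = "We measured the slope of the curve.\nThe result is preliminary."
new = "We measured the slope of the curve.\nThe result is final."

# Lines starting with "-" were removed, lines starting with "+" were added,
# and lines starting with a space are unchanged context.
for line in diff_versions(old, new):
    print(line)
```

A version control system stores a sequence of such changes along with who made them and when, which is what lets collaborators see how a document evolved instead of comparing files by name.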

Authorea was born to let researchers collaborate on writing in real time. We built the initial framework for Authorea on top of Git because we wanted researchers to have a centralized system that would simultaneously reveal the latest authoritative version of a document to everyone involved. We also wanted to build a system that would incorporate, behind the scenes, all of the products that are lost upon publication: notebooks, data, analysis, and code. Git is a perfect system on which to build a rich media repository as a data layer underneath a document.

We built the rest of the document editor from that initial inspiration and structure. Our community (astrophysicists and physicists) works almost exclusively in LaTeX, so we added the ability to write in different markup languages (LaTeX, Markdown, and regular rich text) in a document at the same time. Today, researchers are sharing data sets, uploading IPython notebooks, embedding JavaScript visualizations, and syncing their documents to GitHub from within the document itself. It’s very exciting.

Screenshot of Authorea editor. Image courtesy of Alberto Pepe.

The Researcher Gains Control of Output and Dissemination

PDFs are great for printing but bad for nearly everything else. If research is ultimately about disseminating findings and advancing science, it’s critically important that researchers gain more control of the dissemination and publication process to take advantage of modern communication methods.

Writing a document on Authorea also creates a web-native version of the document that renders in beautiful HTML5 on any internet-connected device. What’s more, anything written on Authorea is issued a unique URL, which can be set to public or private, and documents can be assigned a digital object identifier (DOI) with the click of a button.

Researchers can submit documents directly to hundreds of journals and make links between organizations and storage repositories. It’s our hope that more researchers will begin promoting their preprints and collaborating more seamlessly online.

Where We Go from Here

More than 75,000 researchers in fields from astrophysics to zoology currently use Authorea. Researchers use it because it helps make collaborating faster and easier. As more researchers turn to disseminating their work in the post-PDF world, we have high hopes that they will use these tools to improve collaboration, publication time, and reproducibility of results.

Alberto Pepe is the co-founder of Authorea, an online platform to write research collaboratively. He is a “recovering academic” with previous Ph.D. and postdoc work in astrophysics and information science. He holds degrees and fellowships from Harvard University, the University of California, Los Angeles, the European Organization for Nuclear Research (CERN), and University College London. He was born and raised in the wine-making town of Manduria, in Puglia, Southern Italy.
