
Software for Reproducibility Part 2: The Tools

CDL UC3

Last week I wrote about a workshop I attended, the Workshop on Software Infrastructure for Reproducibility in Science, held in Brooklyn at NYU’s new Center for Urban Science and Progress. The workshop was made possible by the Alfred P. Sloan Foundation and brought together heavy-hitters from the reproducibility world who work on software for workflows. I provided some broad-strokes overviews last week; this week, I’ve created a list of some of the tools we saw during the workshop. Note: the level of detail for each tool is consistent with my level of fatigue during its presentation!

Sumatra

From the Sumatra website:

The solution we propose is to develop a core library, implemented as a Python package, sumatra, and then to develop a series of interfaces that build on top of this: a command-line interface, a web interface, a graphical interface. Each of these interfaces will enable: (1) launching simulations/analyses with automated recording of provenance information; and (2) managing a computational project: browsing, viewing, deleting simulations/analyses.
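To make "automated recording of provenance information" a little more concrete, here is a rough sketch of the general idea in plain Python. To be clear, this is not Sumatra's API: the function name `run_with_provenance`, the record fields, and the JSON-lines log format are all hypothetical choices of mine, using only the standard library.

```python
import hashlib
import json
import platform
import subprocess
import sys
import time


def run_with_provenance(command, parameters, log_path):
    """Run a command and append one provenance record to a JSON-lines log.

    Hypothetical helper for illustration only; not part of Sumatra.
    Parameter values are passed to the command as positional arguments.
    """
    started = time.time()
    result = subprocess.run(
        command + [str(value) for value in parameters.values()],
        capture_output=True,
        text=True,
    )
    record = {
        "command": command,
        "parameters": parameters,
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "started": started,
        "duration_seconds": time.time() - started,
        "return_code": result.returncode,
        # Hash of the output, so a later rerun can be compared against it.
        "stdout_sha256": hashlib.sha256(result.stdout.encode()).hexdigest(),
    }
    # Append rather than overwrite, so the log accumulates a project history.
    with open(log_path, "a") as log:
        log.write(json.dumps(record) + "\n")
    return record


if __name__ == "__main__":
    # Example run: a tiny "simulation" that just adds its two parameters.
    script = "import sys; print(sum(float(a) for a in sys.argv[1:]))"
    rec = run_with_provenance(
        [sys.executable, "-c", script], {"a": 1, "b": 2}, "provenance.jsonl"
    )
    print(rec["return_code"], rec["stdout_sha256"][:12])
```

Each run appends one line to the log, which is the kind of history a "browse, view, delete" interface could then sit on top of. A real tool would presumably also capture things like the code version and input files, which this sketch leaves out.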

Taverna

From the Taverna website:

Taverna is an open source and domain-independent Workflow Management System – a suite of tools used to design and execute scientific workflows and aid in silico experimentation.

IPython Notebook

Galaxy

From the Galaxy website:

Galaxy is an open, web-based platform for data intensive biomedical research. Whether on the free public server or your own instance, you can perform, reproduce, and share complete analyses.

Madagascar

From the Madagascar website:

Madagascar is an open-source software package for multidimensional data analysis and reproducible computational experiments. Its mission is to provide

  • a convenient and powerful environment
  • a convenient technology transfer tool

for researchers working with digital image and data processing in geophysics and related fields. Technology developed using the Madagascar project management system is transferred in the form of recorded processing histories, which become “computational recipes” to be verified, exchanged, and modified by users of the system.

VisTrails

RCloud

ReproZip 

Open Science Framework

From the OSF website:

The Open Science Framework (OSF) is part network of research materials, part version control system, and part collaboration software. The purpose of the software is to support the scientist’s workflow and help increase the alignment between scientific values and scientific practices.

RunMyCode

From the RunMyCode website:

RunMyCode is a novel cloud-based platform that enables scientists to openly share the code and data that underlie their research publications. This service is based on the innovative concept of a companion website associated with a scientific publication. The code is run on a computer cloud server and the results are immediately displayed to the user.

Dexy

From the Dexy website:

Dexy lets you continue to use your favorite documentation tools, while getting more out of them than ever, and being able to combine them in new and powerful ways. With Dexy you can bring your scripting and testing skills into play in your documentation, bringing project automation and integration to new levels.

DuraSpace 

Dataverse 

From the Dataverse website:

A repository for research data that takes care of long term preservation and good archival practices, while researchers can share, keep control of and get recognition for their data.