
Software Carpentry and Data Management

Posted in UC3

About a year ago, I started hearing about Software Carpentry. I wasn’t sure exactly what it was, but I envisioned tech-types showing up at your house with routers, hard drives, and wireless mice to repair whatever software was damaged by careless fumblings. Of course, this is completely wrong. I now know that it is actually an ambitious and awesome project that was recently adopted by Mozilla and got a boost from the Alfred P. Sloan Foundation (how is it that they always seem to be involved in the interesting stuff?).

From their website:

Software Carpentry helps researchers be more productive by teaching them basic computing skills. We run boot camps at dozens of sites around the world, and also provide open access material online for self-paced instruction.

Software Carpentry (SWC) got its start in the 1990s, when its founder, Greg Wilson, realized that many of the scientists who were trying to use supercomputers didn’t actually know how to build and troubleshoot their code, much less use tools like version control. More specifically, most had never been shown how to do four basic tasks that are fundamentally important to any science involving computation (which is increasingly all science):

  • growing a program from 10 to 100 to 1000 lines without creating a mess
  • automating repetitive tasks
  • basic quality assurance
  • managing and sharing data and code
Software Carpentry is too cool for a reference to the Carpenters. From marshallmatlock.com.

Greg started teaching these topics (and others) at Los Alamos National Laboratory in 1998. After a few stops and starts, he left a faculty position at the University of Toronto in April 2010 to devote himself to it full-time. Fast forward to January 2012, and Software Carpentry became the first project of what is now the Mozilla Science Lab, supported by funding from the Alfred P. Sloan Foundation.

This new incarnation of Software Carpentry has focused on offering intensive, two-day workshops aimed at grad students and postdocs. These workshops (which they call “boot camps”) are usually small, typically around 40 learners, with low student-teacher ratios to ensure that attendees get the attention and help they need.

Other than Greg himself, whose role is increasingly to train new trainers, Software Carpentry is a volunteer organization. More than 50 people are currently qualified to instruct, and the number is growing steadily. The basic framework for a boot camp is this:

  1. Someone decides to host a Software Carpentry workshop for a particular group (e.g., a flock of macroecologists, or a herd of new graduate students at a particular university). The host can be almost anyone: fellow researchers, department chairs, librarians, advisors, you name it.
  2. Organizers round up funds to pay for travel expenses for the instructors and any other anticipated workshop expenses.
  3. Software Carpentry matches them with instructors according to the needs of their group; together, they and the organizers choose dates and open up enrolment.
  4. The boot camp itself runs eight hours a day for two consecutive days (though there are occasionally variations). Learning is hands-on: people work on their own laptops, and see how to use the tools listed below to solve realistic problems.

That’s it! They have a great webpage on how to run a boot camp, which includes checklists and thorough instructions for making your boot camp a success. About 2,300 people have gone through a SWC boot camp, and the organization hopes to double that number by mid-2014.

The core curriculum for the two-day boot camp is usually:

Software Carpentry also offers over a hundred short video lessons online, all of which are licensed under CC-BY (go to the SWC webpage for a hyperlinked list):

  • Version Control
  • The Shell
  • Python
  • Testing
  • Sets and Dictionaries
  • Regular Expressions
  • Databases
  • Using Access
  • Data
  • Object-Oriented Programming
  • Program Design
  • Make
  • Systems Programming
  • Spreadsheets
  • Matrix Programming
  • MATLAB
  • Multimedia Programming
  • Software Engineering

Why focus on grad students and postdocs? Professors are often too busy with teaching, committees, and proposal writing to improve their software skills, while undergrads have less incentive to learn because they don’t yet have a longer-term project in mind. Software Carpentry is also playing a long game: today’s grad students are tomorrow’s professors, and the day after that, they will be the ones setting parameters for funding programs, editing journals, and shaping science in other ways. Teaching them these skills now is one way (maybe the only way) to make computational competence a “normal” part of scientific practice.

So why am I blogging about this? When Greg started thinking about training researchers in the basics of good computing practice and coding, he couldn’t have predicted the huge explosion in the availability of data, the growing number of software programs for analyzing those datasets, or how little training researchers would receive for dealing with this new era. I believe that part of the reason funders stepped up to support the mission of Software Carpentry is that now, more than ever, researchers need these skills to do science successfully. Reproducibility and accountability are in greater demand, and data sharing mandates will likely morph into workflow sharing mandates. Ensuring reproducibility in analysis is next to impossible without the skills Software Carpentry’s volunteers teach.

My secret motive for talking about SWC? I want UC librarians to start organizing bootcamps for groups of researchers on their campuses!
