Putting the Meta in Metadata

The DCXL project is in full swing now: developers are working closely with Microsoft Research to create the add-in that will revolutionize scientific data curation (in my humble opinion!). Part of this process was deciding how to handle metadata. For a refresher on metadata, i.e., data documentation, read this post about the metadata in DCXL.
Creating metadata was one of the major requirements for the project, and arguably the most challenging task. The challenges stem from the fact that there are many metadata standards out there, and of course, none are perfect for our particular task. So how do we incorporate good work done by others much smarter than me into DCXL, without compromising our need for user-friendly, simple data documentation?
It was tricky, but we came up with a solution that will work for many, if not most, potential DCXL users. Several factors shaped the metadata route we chose:
- DataONE: We are very interested in making sure that data posted to a repository via the DCXL add-in can be found using the DataONE Mercury metadata search system (called ONE-Mercury; to be released in May). That means we need to use metadata that the DataONE infrastructure supports. At this point in DataONE’s development, that limits us to the International Organization for Standardization Geospatial Metadata Standard (ISO 19115), the Federal Geographic Data Committee Geospatial Metadata Standard (FGDC), and Ecological Metadata Language (EML).
- We want metadata created by the DCXL software to be as flexible as possible for as many different types of data as possible. ISO19115 and FGDC are both geared towards spatial data specifically (e.g., GIS). EML is a bit more general and flexible, so we chose to go with it.
- EML is a very well-documented (and large) metadata schema. Rather than include every element of EML in DCXL, we cherry-picked the elements we thought would generate metadata that makes the data more discoverable and usable. Of course, just as you can never be too skinny or too rich, you can NEVER have too much metadata. But we chose to draw the line somewhere between “not useful at all” and “overwhelming”.
- We made sure that the metadata elements we included could be mapped to DataCite and Dublin Core minimal metadata. As a result, a data citation can be generated from the metadata collected for the dataset.
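To make the idea of mapping concrete, here is a minimal Python sketch of a crosswalk from a handful of EML fields to Dublin Core terms and to the DataCite properties a citation needs, followed by a citation built in the general “Creator (PublicationYear): Title. Publisher. Identifier” pattern that DataCite recommends. The field selection, the flattened dictionary representation of an EML record, and the function names are assumptions for illustration only; this is not the actual DCXL element list or code.

```python
# Hypothetical crosswalk: a few simplified EML dataset fields mapped to
# Dublin Core terms and to the DataCite properties a citation needs.
# Illustrative subset only, NOT the actual DCXL element list.
EML_TO_DUBLIN_CORE = {
    "title": "dc:title",
    "creator": "dc:creator",
    "pubDate": "dc:date",
    "publisher": "dc:publisher",
    "abstract": "dc:description",
    "keyword": "dc:subject",
    "alternateIdentifier": "dc:identifier",
}

EML_TO_DATACITE = {
    "title": "Title",
    "creator": "Creator",
    "pubDate": "PublicationYear",
    "publisher": "Publisher",
    "alternateIdentifier": "Identifier",
}

CITATION_FIELDS = ("Creator", "PublicationYear", "Title", "Publisher", "Identifier")


def to_scheme(eml: dict, crosswalk: dict) -> dict:
    """Rename EML fields via a crosswalk; fields with no mapping are dropped."""
    return {crosswalk[key]: value for key, value in eml.items() if key in crosswalk}


def datacite_citation(eml: dict) -> str:
    """Format 'Creator (PublicationYear): Title. Publisher. Identifier'."""
    props = to_scheme(eml, EML_TO_DATACITE)
    missing = [name for name in CITATION_FIELDS if not props.get(name)]
    if missing:
        raise ValueError("cannot build citation, missing: " + ", ".join(missing))
    year = str(props["PublicationYear"])[:4]  # assumes an ISO date like 2012-03-01
    return (f"{props['Creator']} ({year}): {props['Title']}. "
            f"{props['Publisher']}. {props['Identifier']}")


if __name__ == "__main__":
    # Made-up example record; 10.5072 is DataCite's test DOI prefix.
    record = {
        "title": "Intertidal temperature logger data",
        "creator": "Doe, J.",
        "pubDate": "2012-03-01",
        "publisher": "Example Repository",
        "alternateIdentifier": "doi:10.5072/EXAMPLE",
        "abstract": "Hourly temperatures from three sites.",
    }
    print(to_scheme(record, EML_TO_DUBLIN_CORE)["dc:title"])
    print(datacite_citation(record))
```

Dropping unmapped fields (such as the abstract on the DataCite side) is deliberate in this sketch: the citation needs only the few properties in `CITATION_FIELDS`, while the fuller record can still be exported as Dublin Core.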
Survey says…
A few weeks ago we reached out to the scientific community for help on the direction of the DCXL project. The major issue at hand was whether we should develop a web-based application or an add-in for Microsoft Excel. Last week, I reported that rather than choose, we will develop both. This might seem like a risky proposition: the DCXL project has a one-year timeline, meaning all of this needs to be developed before August (!). As someone in a DCXL meeting recently put it, aren’t we settling for “twice the product and half the features”? We discussed which features might need to be dropped from our list of desirables because of the change in trajectory; however, we are confident that both DCXL products will be feature-rich and meet the needs of the target scientific community. Of course, this is made easier by the fact that the features in the two products will be nearly identical.

How did we arrive at developing both an add-in and a web app? By talking to scientists. The feedback we collected made it obvious that each product has aspects that appeal to our user communities. Here’s a summary of what we heard:
Show of hands: I ran a workshop on Data Management for Scientists at the Ocean Sciences 2012 Meeting in February. At the close of the workshop, I described the DCXL project and went over the pros and cons of the add-in option and the web app option. By show of hands, about 80% of the audience (n ≈ 150) voted for the web app.
Conversations: here’s a sampling of some of the things folks told me about the two options:
- “I don’t want to go to the web. It’s much easier if it’s incorporated into Excel.” (add-in)
- “As long as I can create metadata offline, I don’t mind it being a web app. It seems like all of the other things it would do require you to be online anyway” (either)
- “If there’s a link in the spreadsheet, that seems sufficient.” (either) “It would be better to have something that stays on the menu bar no matter what file is open.” (add-in)
- “The updates are the biggest issue for me. If I have to update software a lot, I get frustrated. It seems like Microsoft is always making me update something. I would rather go to the web and know it’s the most recent version.” (web app)
- Workshop attendee: “Can it work like Zotero, where there’s ways to use it both offline and online?” (both)
Survey: I created a very brief survey using the website SurveyMonkey. I then sent the link to the survey out via social media and listservs. Within about a week, I received over 200 responses.
[Chart: Education level of respondents]
[Chart: Survey questions and answers]
So, with those results, the scientific community gave a resounding “both!” First, we will develop the add-in, since it best fits the needs of our target users (those who use Excel heavily and need assistance with good data management skills). We will then develop the web application, with the hope that the community at large will adopt and improve on it over time. The internet is a great place for building a community with shared needs and goals; we can only hope that DCXL will be adopted as wholeheartedly as other online resources offering help and information.
Hooray for Progress!
Great news on the DCXL front! We are moving forward with the Excel add-in and will have something to share with the community this summer. If you missed it, back in January the DCXL project had an existential crisis: add-in or web-based application? I posted on the subject here and here. We spent a lot of time talking to the community and collating feedback, weighing the pros and cons of each option, and carefully considering how best to proceed with the DCXL project.
And the conclusion we came to… let’s develop both!
Comparing web-based applications and add-ins (aka plug-ins) is really an apples-and-oranges comparison. How could we discount that a web-based application is yet another piece of software for scientists to learn? Or that an add-in is only useful for Excel running on the Windows operating system? Instead, we have chosen to first create an add-in (this was the original intent of the project), then move that functionality to a web-based application that will have more flexibility for the longer term.

The capabilities of the add-in and the web-based application will be similar: we are still aiming to create metadata, check the data file for .csv compatibility, generate a citation, and upload the data set to a data repository. For the full list of requirements (updated last week), check out the Requirements page on this site. The implementation of these requirements might differ slightly between the two, but the goals of the DCXL project will be met in both cases: we will facilitate good data management, data archiving, and data sharing.
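As an illustration of what a .csv compatibility check might involve, here is a small Python sketch that takes a sheet as a list of rows, reports problems, applies simple automated fixes, and writes the result with the standard `csv` module. The specific checks (a single header row, unique non-blank column names, equal-length rows, no empty rows, no line breaks inside cells) are assumptions about what matters for CSV export, and `check_csv_compat` and `to_csv_text` are hypothetical names, not DCXL code.

```python
import csv
import io


def check_csv_compat(rows):
    """Check a sheet (list of rows) for CSV problems and apply simple fixes.

    Assumes rows[0] is the one and only header row. Returns a list of
    human-readable problem reports and the fixed rows (all strings).
    """
    if not rows:
        return ["sheet is empty"], []
    problems = []
    width = max(len(row) for row in rows)

    # Header: pad to the widest row, then give blank names a placeholder.
    header = ["" if h is None else str(h).strip() for h in rows[0]]
    header += [""] * (width - len(header))
    for i, name in enumerate(header):
        if not name:
            header[i] = f"column_{i + 1}"
            problems.append(f"column {i + 1} has no header; named it {header[i]!r}")

    # Header: duplicate names get a numeric suffix until they are unique.
    seen = set()
    for i, name in enumerate(header):
        new_name, n = name, 1
        while new_name in seen:
            n += 1
            new_name = f"{name}_{n}"
        if new_name != name:
            problems.append(f"duplicate header {name!r} in column {i + 1}; renamed to {new_name!r}")
            header[i] = new_name
        seen.add(new_name)

    # Data rows: drop empty rows, pad short rows, flag embedded line breaks.
    fixed = [header]
    for row_no, row in enumerate(rows[1:], start=2):
        cells = ["" if c is None else str(c) for c in row]
        if not any(c.strip() for c in cells):
            problems.append(f"row {row_no} is empty; removed it")
            continue
        if len(cells) < width:
            problems.append(f"row {row_no} has {len(cells)} cells, expected {width}; padded it")
            cells += [""] * (width - len(cells))
        if any("\n" in c or "\r" in c for c in cells):
            problems.append(f"row {row_no} has a line break inside a cell")
        fixed.append(cells)
    return problems, fixed


def to_csv_text(rows):
    """Serialize rows as CSV text; the csv module quotes commas and quotes."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    sheet = [
        ["site", "temp", "temp", None],
        ["A", 12.1, 12.3, "ok"],
        ["B", 11.8],
        [None, None, None, None],
    ]
    report, clean = check_csv_compat(sheet)
    print("\n".join(report))
    print(to_csv_text(clean))
```

Reporting every fix rather than applying it silently keeps the user in control; some problems, like line breaks inside cells, are only reported because the right fix depends on the data.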
It’s true that the DCXL project is running a bit behind schedule, but we believe that it will be possible to create the two prototypes before the end of the summer. Check back here for updates on our progress.
Help Wanted: Add-in versus Web Application?
I recently updated this site with a page listing the DCXL Requirements. These five requirements are the basic feature set and capabilities we would like to have for the Excel add-in that is to be developed in the course of the project. The engineering team at Microsoft Research reviewed our requirements and had a (rather surprising) suggestion: instead of an add-in, they recommended a web-based application.
Add-ins are little pieces of software that you can download to extend the capabilities of a program – in our case, Microsoft Excel. Synonyms for add-in are plug-in and add-on. They are downloaded, installed, and then appear within a specific program. An add-in for Excel would appear in the Excel “ribbon” and would add new features to Excel.
A web-based application is something a bit different. It is software that runs on a remote server and that you use through a web browser over a network. Web applications require the web (shocking, I know) and do not require you to download a program; all you need is an internet connection and a browser. Basically, these are websites that do more than just display information: they do something with the information or files provided by the user, on the user’s behalf. Websites such as Facebook, YouTube, and SkyDrive are examples of web applications.
So I turn to you, community: what are your thoughts on this? Make your voice heard! You can email me directly, comment on the blog below, or come on down to CDL’s Downtown Oakland office and tell me in person. But please comment quickly – this decision needs to be made soon. You can also vote using the quick poll in the sidebar to the right of this post. We want to know what you think!
To help you formulate intelligent comments, here’s a rough comparison of the two options:
Add-in: The user would download and install the add-in on their computer. A new “ribbon” tab at the top of the Excel window would then let them perform the tasks listed below directly on their current spreadsheet.
Web application: The user would go to the website hosting the web application and upload (drag and drop) a spreadsheet to the site. They could then perform the same tasks on the spreadsheet and download it back onto their PC.
| Feature | Office Add-In | Web-Based Application |
|---|---|---|
| Platform compatibility | Windows only | Any |
| Spreadsheet compatibility | Different add-in for each Excel version | One application covers multiple versions; potential future expansion to SQL, CSV, XML, OpenOffice, Google Docs, etc. |
| Download necessary? | Yes | No |
| Software updates | Fixed bugs require download and re-install | No download or re-install necessary |
| Cloud-based? | No | Yes |
| Offline use? | Yes | No (HTML5 could enable offline use in the future) |
| Languages | C#/.NET, C/C++ | HTML/JavaScript, C#/ASP.NET |
| Has all the functionality of Excel? | Yes | No |
And here are the basic capabilities we want, regardless of which of the two options above becomes a reality:
- Must work for Excel users without the add-in
- No additional software (other than add-in and Excel) necessary
- Can be used offline
- Perform CSV compatibility checks, reporting, and automated fixes
- Add metadata to the data file
- Can use existing metadata as a template
- Add-in can automatically generate some of the metadata where the info is available from the file
- Generate a citation for the data file
- Deposit data and metadata in a repository
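The requirement that some metadata be generated automatically from the file could look something like the Python sketch below, which reads the column names, counts the records, and guesses a storage type for each column, then writes them into a skeleton EML-style `dataTable` element using the standard library’s `xml.etree.ElementTree`. Which fields get auto-filled is an assumption, `data_table_element` and `infer_type` are hypothetical names, and the output is deliberately incomplete: real EML requires more (for example, a `measurementScale` for every attribute), so the user would still supply definitions and units.

```python
import xml.etree.ElementTree as ET


def infer_type(values):
    """Guess a storage type from a column's non-empty cells (None if all empty)."""
    filled = [v for v in values if v not in ("", None)]
    if not filled:
        return None
    try:
        for value in filled:
            float(value)
    except (TypeError, ValueError):
        return "string"
    return "float"


def data_table_element(file_name, header, data_rows):
    """Build a skeleton EML-style <dataTable> from facts read off the file.

    Only the entity name, column names, guessed storage types and the
    record count are filled in; definitions are left as placeholders.
    """
    table = ET.Element("dataTable")
    ET.SubElement(table, "entityName").text = file_name
    attribute_list = ET.SubElement(table, "attributeList")
    for col, name in enumerate(header):
        # Short rows are treated as having empty cells at the end.
        column = [row[col] if col < len(row) else "" for row in data_rows]
        attribute = ET.SubElement(attribute_list, "attribute")
        ET.SubElement(attribute, "attributeName").text = name
        ET.SubElement(attribute, "attributeDefinition").text = "TODO: describe this column"
        storage_type = infer_type(column)
        if storage_type is not None:
            ET.SubElement(attribute, "storageType").text = storage_type
    ET.SubElement(table, "numberOfRecords").text = str(len(data_rows))
    return table


if __name__ == "__main__":
    header = ["site", "temp_c"]
    rows = [["A", "12.1"], ["B", "11.8"], ["C", ""]]
    print(ET.tostring(data_table_element("temps.csv", header, rows), encoding="unicode"))
```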
Download the complete requirements as a PDF: DCXL Requirements





