The DCXL project is in full swing now: developers are working closely with Microsoft Research to create the add-in that will revolutionize scientific data curation (in my humble opinion!). Part of this process was deciding how to handle metadata. For a refresher on metadata, i.e., data documentation, read this post about the metadata in DCXL.
Creating metadata was one of the major requirements for the project, and arguably the most challenging task. The challenges stem from the fact that there are many metadata standards out there, and of course, none are perfect for our particular task. So how do we incorporate good work done by others much smarter than me into DCXL, without compromising our need for user-friendly, simple data documentation?
It was tricky, but we came up with a solution that will work for many, if not most, potential DCXL users. Several considerations shaped the metadata route we chose:
- DataONE: We are very interested in making sure that data posted to a repository via the DCXL add-in can be found using the DataONE Mercury metadata search system (called ONE-Mercury; to be released in May). That means we need to make sure we are using metadata that the DataONE infrastructure likes. At this point in DataONE’s development, that limits us to the International Organization for Standardization Geospatial Metadata Standard (ISO19115), the Federal Geographic Data Committee Geospatial Metadata Standard (FGDC), and Ecological Metadata Language (EML).
- We want metadata created by the DCXL software to be as flexible as possible for as many different types of data as possible. ISO19115 and FGDC are both geared towards spatial data specifically (e.g., GIS). EML is a bit more general and flexible, so we chose to go with it.
- EML is a very well documented metadata schema; rather than include every element of EML in DCXL, we cherry-picked the elements we thought would generate metadata that makes the data more discoverable and usable. Of course, just like never being too skinny or too rich, you can NEVER have too much metadata. But we chose to draw the line somewhere between “not useful at all” and “overwhelming.”
- We ensured that the metadata elements we included could be mapped to DataCite and Dublin Core minimal metadata. This ensures that a data citation can be generated based on the metadata collected for the dataset.
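To make the last point concrete, here is a rough sketch of what such a mapping might look like. It is not the DCXL implementation: the post does not list which EML elements were selected, so the flattened field names on the left of each table, the helper functions, and the sample record are all assumptions made up for illustration. The target names follow the Dublin Core element set and the core DataCite properties, and the citation layout follows the commonly used "Creator (PublicationYear): Title. Publisher. Identifier" pattern.

```python
# Hypothetical, flattened subset of EML-style fields. The elements DCXL
# actually selected are not listed in the post; these are illustrative only.
EML_TO_DUBLIN_CORE = {
    "title": "title",
    "creator": "creator",
    "pubDate": "date",
    "abstract": "description",
    "keyword": "subject",
    "publisher": "publisher",
}

EML_TO_DATACITE = {
    "title": "Title",
    "creator": "Creator",
    "pubDate": "PublicationYear",
    "publisher": "Publisher",
}


def map_fields(record, mapping):
    """Rename the fields of `record` that appear in `mapping`; drop the rest."""
    return {
        target: record[source]
        for source, target in mapping.items()
        if source in record
    }


def to_datacite(record, identifier):
    """Build a minimal DataCite-style record (assumed helper, not a DCXL API)."""
    out = map_fields(record, EML_TO_DATACITE)
    if "PublicationYear" in out:
        # Keep only the year from an ISO-style date such as "2012-04-01".
        out["PublicationYear"] = str(out["PublicationYear"])[:4]
    out["Identifier"] = identifier
    return out


def format_citation(datacite):
    """Format a citation as: Creator (PublicationYear): Title. Publisher. Identifier"""
    creators = "; ".join(datacite["Creator"])
    return (
        f'{creators} ({datacite["PublicationYear"]}): '
        f'{datacite["Title"]}. {datacite["Publisher"]}. {datacite["Identifier"]}'
    )


# Made-up sample record, purely for demonstration.
record = {
    "title": "Example stream temperature dataset",
    "creator": ["Doe, Jane", "Roe, Richard"],
    "pubDate": "2012-04-01",
    "abstract": "Hourly stream temperatures from a fictional site.",
    "keyword": ["temperature", "streams"],
    "publisher": "Example Repository",
}

dublin_core = map_fields(record, EML_TO_DUBLIN_CORE)
datacite = to_datacite(record, "doi:10.0000/example")
print(format_citation(datacite))
```

The idea is simply that if every field a citation needs has a counterpart among the collected metadata elements, the citation can be assembled mechanically, with no extra input from the user.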