
My picks for #AGU13

Next week, the city of San Francisco will be overrun with nerds. More specifically, more than 22,000 geophysicists, oceanographers, geologists, seismologists, meteorologists, and volcanologists will be descending upon the Bay Area to attend the 2013 American Geophysical Union Fall Meeting.

If you are among the thousands of attendees, you are probably (like me) overwhelmed by the plethora of sessions, speakers, posters, and mixers. In an effort to force myself to look at the schedule well in advance of the actual meeting, I’m sharing my picks for must-sees at the AGU meeting below.

Note! I’m co-chairing “Managing Ecological Data for Effective Use and Reuse” along with Amber Budden of DataONE and Karthik Ram of rOpenSci. Prepare for a great set of talks about DMPTool, rOpenSci, DataONE, and others.

Session Title	Session ID	Type	Day	Time
Translating Science into Action: Innovative Services for the Geo- and Environmental- Sciences in the Era of Big Data I	GC11F	Oral	Mon	8:00 AM
Data Curation, Credibility, Preservation Implementation, and Data Rescue to Enable Multi-source Science I	IN11D	Oral	Mon	8:00 AM
Data Curation, Credibility, Preservation Implementation, and Data Rescue to Enable Multi-source Science II	IN12A	Oral	Mon	10:20 AM
Enabling Better Science Through Improving Science Software Development Culture I	IN22A	Oral	Tue	10:20 AM
Collaborative Frameworks and Experiences in Earth and Space Science Posters	IN23B	Poster	Tue	1:40 PM
Enabling Better Science Through Improving Science Software Development Culture II Posters	IN23C	Poster	Tue	1:40 PM
Managing Ecological Data for Effective Use and Reuse I	ED43E	Oral	Thu	1:40 PM
Open-Source Programming, Scripting, and Tools for the Hydrological Sciences II	H51R	Oral	Fri	8:00 AM
Data Stewardship in Theory and in Practice I	IN51D	Oral	Fri	8:00 AM
Managing Ecological Data for Effective Use and Reuse II Posters	ED53B	Poster	Fri	1:40 PM

Download the full program as a PDF

Previous Data Pub blog post about AGU: Scientific Data at AGU 2011

A forthcoming experiment in data publication

What we’re doing:

Like these dapper gentlemen, as small or as large as needed… From The Public Domain Review.

Some time next year, the CDL will start an experiment in data publication. Our version of data publication will take the form of lightweight, non-peer-reviewed dataset descriptions. These publications are designed to be flexible in structure and size. At a minimum, each document must have six elements:

Title
Creator(s)
Publisher
Publication year
Identifier (e.g., DOI or ARK)
Citation to the dataset

This bare-bones document can expand to be richly descriptive, with optional items like subject keywords, version number, spatial or temporal range, collection methods, and as much description as the author cares to supply.
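To make the "six required, everything else optional" structure concrete, here is a minimal sketch of such a record as a Python dataclass. The field names, types, and validation rules are hypothetical choices for illustration only; they are not EZID's or DataCite's actual schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DatasetDescription:
    """Hypothetical record for a lightweight dataset description.

    The six required fields mirror the minimum elements listed above.
    Field names and types are illustrative assumptions, not an EZID schema.
    """
    title: str
    creators: list[str]
    publisher: str
    publication_year: int
    identifier: str            # e.g. a DOI or ARK string
    dataset_citation: str
    # Optional descriptive elements: the document grows as these are filled in.
    keywords: list[str] = field(default_factory=list)
    version: Optional[str] = None
    spatial_range: Optional[str] = None
    temporal_range: Optional[str] = None
    methods: Optional[str] = None
    description: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject records that are missing any of the required text elements.
        required = {
            "title": self.title,
            "publisher": self.publisher,
            "identifier": self.identifier,
            "dataset_citation": self.dataset_citation,
        }
        for name, value in required.items():
            if not value or not value.strip():
                raise ValueError(f"missing required element: {name}")
        if not self.creators:
            raise ValueError("missing required element: creators")

    def optional_elements(self) -> list[str]:
        """Return the names of the optional elements that were supplied."""
        names = ("keywords", "version", "spatial_range",
                 "temporal_range", "methods", "description")
        return [name for name in names if getattr(self, name)]
```

A record with only the required fields is valid and reports no optional elements; adding, say, keywords and methods makes `optional_elements()` return `["keywords", "methods"]`. Keeping the optional fields as defaults is one simple way to let the same record type cover both the bare-bones and the richly described case.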

Why we’re doing it:

The recently released draft FORCE11 Declaration of Data Citation Principles expresses a general agreement that datasets should be treated like “first class” research objects in how they are discovered, cited, and recognized. That ideal is still far from reality: datasets are largely invisible to search engines, and authors rarely cite them formally.

A solution being implemented by a number of journals (e.g., Nature Scientific Data and Geoscience Data Journal) is to publish proxy objects for discovery and citation called “data descriptors” or, more commonly, “data papers”. Data papers are formal scholarly publications that describe a dataset’s rationale and collection methods, but don’t analyze the data or draw any conclusions. Peer reviewers ensure that the paper contains all the information needed to use, re-use, or replicate the dataset.

The strength of the data paper approach (creators must write up rich and useful metadata to pass peer review) leads directly to its weakness: a data paper often takes more time and energy to produce than dataset creators are willing to invest. In a 2011 survey, researchers said that the biggest impediment to publishing data is lack of time. For researchers who manage to publish datasets but lack the time to write and submit (and revise and resubmit) a data paper, we will provide some of the benefits of a data paper at none of the cost.

How we’re doing it:

We will publish these documents through EZID (easy-eye-dee), an identifier service that has supplied DataCite DOIs to over 167,000 datasets. All of the dataset metadata records have at least the five elements required by the DataCite metadata schema, more than 2,000 already have abstracts, and another 2,000 have other kinds of descriptive metadata. EZID will begin using dataset metadata to automatically generate publications that can be viewed as HTML in a web browser or as a dynamically generated PDF. The documents will be hosted by EZID in a format optimized for indexing by search engines like Google and Google Scholar.

Dataset creators won’t have to do anything to get a publication beyond what they already do to get a DOI. If the creator fills in only the required metadata, the document will function as a cover sheet or landing page. If they submit an abstract and methods, the document begins to look like a traditional journal article (while retaining the linking functionality of a landing page). It will capture as much effort as the researcher puts forth, whether that’s a lot or very little.
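The "grows with the metadata" behavior described above can be sketched as a small renderer that always emits a title and citation, and adds Abstract and Methods sections only when those fields are present. This is a hypothetical illustration, not EZID's implementation: the dictionary keys (`title`, `creators`, `publisher`, `year`, `identifier`, `abstract`, `methods`), the citation format, and the HTML layout are all assumptions.

```python
from html import escape


def render_description(meta: dict) -> str:
    """Render a dataset description as a minimal HTML fragment.

    Hypothetical sketch: the key names and citation format are assumptions.
    With only required metadata the result is a cover sheet; each optional
    section (abstract, methods) is appended only when it is supplied.
    """
    required = ("title", "creators", "publisher", "year", "identifier")
    missing = [key for key in required if not meta.get(key)]
    if missing:
        raise ValueError(f"missing required metadata: {', '.join(missing)}")

    # Assemble a simple citation string from the required elements.
    creators = "; ".join(meta["creators"])
    citation = (f"{creators} ({meta['year']}). {meta['title']}. "
                f"{meta['publisher']}. {meta['identifier']}")

    parts = [
        f"<h1>{escape(meta['title'])}</h1>",
        f'<p class="citation">{escape(citation)}</p>',
    ]
    # Optional sections expand the cover sheet toward an article-like page.
    for key, heading in (("abstract", "Abstract"), ("methods", "Methods")):
        if meta.get(key):
            parts.append(f"<h2>{heading}</h2>")
            parts.append(f"<p>{escape(meta[key])}</p>")
    return "\n".join(parts)
```

Called with only the five required keys, this returns just a heading and a citation line; passing an `abstract` and `methods` as well adds the corresponding sections. Escaping every metadata value is a deliberate choice, since the text comes from user-supplied records.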

Do you have thoughts or comments on our idea? We would love to hear from you! Comment on this blog post or email us at uc3@ucop.edu.
