Love at First Cite

CDL UC3, November 4, 2011

Data citation. This is a phrase you are likely to hear a lot in the next few years. The idea is simple enough: cite a data set, just like you would a journal article. Note: much of the content in this article was borrowed from Robert Cook of Oak Ridge National Laboratory and the DataCite website.

Why should you care about data citation? Here are a few reasons:

Researchers can easily find data products associated with a publication. If you’ve ever tried to re-use data from someone’s publication, you know how difficult it can be to find the raw data. Sometimes it involves using programs to extract data points from figures or tables in the PDF (I am not linking to any of these programs since I don’t think this is a good technique to employ). Other times you might have to contact the author directly to ask for their files. In general, it can be very time-consuming and frustrating, and it often ends in failure to obtain the data. When data are cited properly, finding the data products associated with a publication is much easier. As the data provider, you have the added advantage of not needing to respond to any requests for your data; interested researchers can find it easily because you cited it.
You get credit for your data AND your publications. Often the time it takes to write a paper for publication is only a small fraction of the time it took to collect the underlying data. If that’s the case, it would be great to get some credit for the actual data collection. You can put it on your CV in a section called “Data”. You can also include data in the set that wasn’t used for publication but might still be usable by others.
Your data are discoverable via Web of Science. If you archive your data in a repository and get a digital object identifier, or DOI, for it (see Step 3 below), you can get citation metrics for your data AND your publications.
You are allowing reproducibility of your results. Don’t be afraid of producing your data and analyses: if others are convinced that what you did is valid and reproducible, your clout as a researcher is sure to be high.

Data citation involves three steps on the part of the researcher:

Prepare your data so you can archive it (see the Best Practices tab for more information). This includes documenting your data, i.e. creating metadata, and preparing your data for long-term storage.
Put your data somewhere. Ideally this would be in a long-term stable archive or data center (there’s a list of repositories available on the DataCite website), but it can also be on your departmental website, your personal website, or as supplemental material on a journal’s website.
Tell people how to cite and use your data. You can provide an example reference that includes typical information like your name, the year of the data set, the name of the data set, and where it is located. If you put your data in a repository, you can get a digital object identifier (available through services such as CDL’s EZID project), which can provide a way for others to find your data well into the future.
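
To make step 3 concrete, here is a minimal sketch of assembling an example reference from the pieces of information listed above: your name, the year, the name of the data set, and where it is located. The class name, the field names, the ordering of the pieces, and the sample values (including the DOI, which is a placeholder and not a real identifier) are all assumptions made for illustration. Your repository or target journal may ask for a different layout.

```python
from dataclasses import dataclass


@dataclass
class DataSetRecord:
    """Hypothetical container for the citation details named in step 3."""
    creator: str    # your name
    year: int       # year of the data set
    title: str      # name of the data set
    location: str   # repository or website that holds the data
    doi: str = ""   # optional identifier, if a repository issued one


def format_data_citation(record: DataSetRecord) -> str:
    """Build a one-line example reference from the record's fields."""
    title = record.title.rstrip(".")  # avoid a doubled period after the title
    citation = f"{record.creator} ({record.year}). {title}. {record.location}."
    if record.doi:
        # Writing the DOI as a resolver link lets readers click through.
        citation += f" https://doi.org/{record.doi}"
    return citation


# All values below are made up for illustration.
example = DataSetRecord(
    creator="Doe, J.",
    year=2011,
    title="Example Tide Pool Survey Data",
    location="Example Data Repository",
    doi="10.1234/example",  # placeholder, not a real DOI
)
print(format_data_citation(example))
```

If the data live on a personal or departmental website and have no DOI, leaving `doi` empty produces a reference that ends with the location.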

The concept of data citation is right in line with the DCXL project’s goals. One of the potential features for the add-in is to enable links to CDL’s EZID for DOI generation. Another is to prompt the user for creating good metadata, which is critical for making data citable.

Love at First Cite Tat: nerdy or awesome?