Category: UC3

Posts written by UC3 staff.

Semantics and Data

There is a good reason that this post on semantics is directly preceded by a post on Ontologies and Data: I often get confused about the differences between the two. This is probably because my left brain, which likes to clearly define, categorize, calculate, and organize, struggles with such right-brain definitions as “the study of the nature of existence” for ontology and “the study of meaning” for semantics. Whaa? But I’m feeling a bit cocky because the Ontologies post was the most-read DCXL post so far, so I will tackle semantics while I’m on a roll.


It can be difficult to interpret the meanings of words, even when the language is familiar. Used with permission from fundulus77 on Flickr

Semantics: unlike “ontology”, it’s a word we hear in popular media and in conversations with friends. The colloquialism “It’s just semantics” implies that the difference between opinions is purely a verbal quibble, bearing no relationship to anything in the real world (D. Crystal, How Language Works).

Semantics has a different interpretation in the field of linguistics: it’s the study of meaning in language. In data and informatics, it has to do with making sure data and information are machine-readable, and therefore in a set, common format and structure. The data and information can then interact with one another in meaningful ways. The most common place you might hear “semantic” in reference to information science is the Semantic Web.

The Semantic Web is a “collaborative effort” cooked up by the World Wide Web Consortium (W3C) that promotes common data formats on the web.  The goal is to

…provide a framework that allows data to be shared and reused across application, enterprise, and community boundaries… It enables machines to “understand” and respond to complex human requests based on their meaning, which requires that the relevant information sources be semantically structured.

Think of it as a universal language for data on the web.

There’s quite a bit of overlap among papers with the keywords “ontologies” and “semantics”. If you need a concrete example based in ecology, the Ecological Complexity paper “An ontology for landscapes” by Lepczyk, Lortie, and Anderson is a good one (doi:10.1016/j.ecocom.2008.04.001):

…This ontology places the [concept of] ‘landscape’ within a broader logical relational ecological context in order to establish formal rules and consistent semantics so that individual researchers can continue to study organisms at select scales while others may potentially integrate results across scales.

Clear definitions of words, their meanings, and their relationships facilitate better science. This is even more true in the era of digital data and the World Wide Web. The more consistent scientists are with their descriptions, definitions, and data, the more likely it is that those data will be usable in the future. Semantics in science is all about being able to combine, compare, and relate disparate data. The goal is not to make all data uniform, but rather to make them uniformly understandable.
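
To make “uniformly understandable” a little more concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the two labs, their column names, and the shared vocabulary are all hypothetical, and a real project would lean on a published ontology or controlled vocabulary rather than a hand-written dictionary. The idea is simply that each lab keeps its own column names, and a shared mapping says what those names mean.

```python
# Hypothetical shared vocabulary: each local column name maps to one agreed term.
# These names are invented for this example, not taken from any real standard.
SHARED_TERMS = {
    "temp_c": "water_temperature_celsius",
    "WaterTemp": "water_temperature_celsius",
    "site": "site_id",
    "Station": "site_id",
}


def to_shared_terms(record):
    """Rename a record's keys to the shared vocabulary; unknown keys are kept as-is."""
    return {SHARED_TERMS.get(key, key): value for key, value in record.items()}


# Two made-up datasets that describe the same kind of measurement differently.
lab_a = [{"site": "A1", "temp_c": 14.2}]
lab_b = [{"Station": "B7", "WaterTemp": 15.0}]

# Once both use the same terms, the records can be combined and compared.
combined = [to_shared_terms(record) for record in lab_a + lab_b]
for row in combined:
    print(row)
```

Neither lab had to change how it records data; only the meaning of each column had to be agreed on.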

The DCXL project relates to semantics in much the same way it did to ontologies: ideally, we want to be consistent with efforts like the Semantic Web. The underlying structure of Excel documents (.xlsx) is based on XML, or Extensible Markup Language, which is designed for describing data in a machine-readable way. That’s a great first step.
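
If you are curious what “based on XML” looks like in practice, the sketch below may help. An .xlsx file is a ZIP archive containing XML parts, so Python’s standard library can read one without Excel. To keep the example self-contained, it builds a toy archive in memory holding a single worksheet part. This is an assumption-laden simplification: the toy package leaves out the other parts a real workbook needs, so Excel would not open it, and the part path xl/worksheets/sheet1.xml is just the usual location, not a guarantee.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by SpreadsheetML worksheet parts.
NS = {"s": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}

# A toy worksheet with two numeric cells (values invented for illustration).
SHEET_XML = (
    '<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
    "<sheetData>"
    '<row r="1"><c r="A1"><v>14.2</v></c><c r="B1"><v>15.0</v></c></row>'
    "</sheetData>"
    "</worksheet>"
)


def build_toy_package():
    """Return an in-memory ZIP holding only one worksheet part (not a full workbook)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("xl/worksheets/sheet1.xml", SHEET_XML)
    buffer.seek(0)
    return buffer


def read_cells(package):
    """Map each cell reference (such as 'A1') to the raw text stored in its <v> element."""
    with zipfile.ZipFile(package) as archive:
        root = ET.fromstring(archive.read("xl/worksheets/sheet1.xml"))
    return {
        cell.get("r"): cell.find("s:v", NS).text
        for cell in root.iterfind(".//s:c", NS)
    }


cells = read_cells(build_toy_package())
print(cells)
```

Note that the XML only says where each value sits in the grid. It says nothing about what the value means, which is why the XML structure is a first step and not the whole journey.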