“Ontologies” is one of those words I hear people toss about in conversations about computing, programming, and development. I usually nod and smile, pretending I know exactly what the word means and how it relates to scientific data. It took some vigorous Google searching and a great discussion with M. Schildhauer of NCEAS before I could say, with confidence, that I kind of understand the concept of ontologies.
In case you are in the same situation I was in a few months ago, allow me to enlighten you. First, let’s start with the pre-computer-era definition: ontology is the study of the nature of existence, the categories of being, and the relationships between these categories. Still not clear? Let’s let Wikipedia explain what the study of ontology entails:
Questions concerning what entities exist or can be said to exist, and how such entities can be grouped, related within a hierarchy, and subdivided according to similarities and differences.
I haven’t thought about the nature of existence since university-level philosophy courses, so this explanation makes my brain ache mildly. Remarkably, the computer science definition of ontology is slightly more tangible (and also sheds light on the descriptions above). In this field, an ontology is a set of concepts that represents the knowledge of a particular field of study (i.e., a domain). It also includes the relationships between the concepts. Here are some important consequences of a field having an ontology:
- shared vocabulary and taxonomy
- explicitly defined concepts
- the relationships between different concepts
And Wikipedia provides an example that may help clarify things:
Particular meanings of terms applied to that domain are provided by domain ontology. For example the word card has many different meanings. An ontology about the domain of poker would model the “playing card” meaning of the word, while an ontology about the domain of computer hardware would model the “punched card” and “video card” meanings.
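To make the idea a little more concrete, here is a minimal Python sketch of two toy domain ontologies. Each one is just a small set of concepts plus “is a” relationships between them. The concept names, the `DOMAIN_ONTOLOGIES` dictionary, and the `meanings` and `ancestors` functions are all made up for illustration; real ontologies are usually written in dedicated languages such as OWL rather than as Python dictionaries.

```python
# Toy domain ontologies: each maps a concept to its parent concept ("is a").
# All concept names here are illustrative assumptions, not a real ontology.
DOMAIN_ONTOLOGIES = {
    "poker": {
        "playing card": "game equipment",
        "poker chip": "game equipment",
    },
    "computer hardware": {
        "punched card": "storage medium",
        "video card": "hardware component",
        "storage medium": "hardware component",
    },
}


def meanings(term, domain):
    """Return the concepts in a domain whose name ends with the given term."""
    concepts = DOMAIN_ONTOLOGIES[domain]
    return sorted(name for name in concepts if name.split()[-1] == term)


def ancestors(concept, domain):
    """Follow the parent links upward until no further parent is defined."""
    concepts = DOMAIN_ONTOLOGIES[domain]
    chain = []
    while concept in concepts:
        concept = concepts[concept]
        chain.append(concept)
    return chain


print(meanings("card", "poker"))
print(meanings("card", "computer hardware"))
print(ancestors("punched card", "computer hardware"))
```

The same word, “card”, resolves to different concepts depending on which domain you ask, and each concept carries its place in a hierarchy along with it.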
Ontologies are vital in this era of international collaboration, digital data, and data deluge. Take the field of genetics. What if every geneticist decided on their own way to describe genes, proteins, and sequences? Furthermore, what if they used words other than “genes”, “proteins”, and “sequences” to describe these things? It would be incredibly difficult for the field to progress, since no one would be quite sure what anyone else was talking about in their research. The Gene Ontology was established within the community to prevent this scenario.
There is much more to ontologies than standard vocabularies, but a shared vocabulary is certainly the easiest ontology concept to grasp. In terms of the DCXL add-in, ontologies could be used to structure how Excel spreadsheets are formatted and coded, making them easier to discover and use. It’s not likely that the first version of the add-in will be able to accommodate a wide range of ontologies (i.e., domain-specific vocabularies), but we hope that future versions might find ways to direct users to standards used in their field of interest.
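As a rough illustration of what directing users to standards might look like, the Python sketch below checks spreadsheet column headers against a small controlled vocabulary and suggests the standard term for known synonyms. This is not how the DCXL add-in works; the vocabulary, the synonyms, and the `suggest_standard_headers` function are all hypothetical.

```python
# Hypothetical controlled vocabulary: standard term -> accepted synonyms.
VOCABULARY = {
    "latitude": {"lat", "lat."},
    "longitude": {"lon", "long", "lng"},
    "temperature": {"temp", "temp."},
}


def suggest_standard_headers(headers):
    """Map each column header to a standard term, or None if unrecognized."""
    suggestions = {}
    for header in headers:
        key = header.strip().lower()
        match = None
        for standard, synonyms in VOCABULARY.items():
            if key == standard or key in synonyms:
                match = standard
                break
        suggestions[header] = match
    return suggestions


print(suggest_standard_headers(["Lat", "LONG", "Temp", "Site"]))
```

Headers that map to `None` are the interesting ones: they are the columns where a user would need to pick, or be pointed to, a term from their field’s vocabulary.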