Event-driven digital object architecture for natural science collection

Why Event?

What got me thinking about “events” are the following questions: How do we deal with heterogeneous and distributed data sources and create services around them that are sustainable, scalable, and interoperable? More importantly, can we think about data atomicity, so that we can focus on the smallest unit that provides the building blocks of the system? Can this smallest unit be algorithm-, language-, and schema-agnostic? Can we re-imagine our vast data landscape in a granular form and think about it as a sequence of events within a particular workflow? I don’t yet have answers to these questions, but hopefully the ideas and examples here will help me and others think about solutions (or apply them to existing solutions).

Imagine building a system to understand events related to global changes. Biodiversity data is just one aspect. Image source: https://cleanet.org/clean/literacy/tools/UGC/infographic.html

What is an Event?

First things first: an event is “a significant change in state.” Put simply, it is something that happened in the past: a user viewed a page, a user clicked a button, etc. From a system and data perspective, an event occurs when some actor acts on an entity in a domain-specific context. An excellent example of this is git, where all commits are stored, which lets you figure out what happened. In this model, entities and events are intertwined concepts. To build the data model, we need to understand the actors that were involved in the event.

What is an Entity?

An entity can be a person or an object (digital or physical) that is involved in an event. For example, our entities are scientists, museums, and organisms (entities could be digital content as well, such as PDFs, images, etc.). Events are actions such as “a scientist identifies a species, collects a specimen, or deposits the specimen in a museum.” An event could also be an update to a record, such as a change to the name and address of the museum. We can describe these actions in a form very similar to an RDF triple: “Scientist A collected specimen X” and “Scientist A deposited specimen X in Museum Z.”

Entity -> verb -> targetEntity, with “some extra information/properties.” Examples: a scientist collects or identifies a species; a user views digital specimen information.
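To make the Entity -> verb -> targetEntity pattern concrete, here is a minimal Python sketch. The `Event` class, its field names, and the `as_sentence` helper are hypothetical names chosen for illustration; they are not part of Cordra or the Digital Object Architecture.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass(frozen=True)
class Event:
    """One statement of the form entity -> verb -> target entity (hypothetical class)."""

    entity: str          # who acted, e.g. "Scientist A"
    verb: str            # what happened, e.g. "collected"
    target_entity: str   # what was acted on, e.g. "specimen X"
    # "Some extra information/properties" attached to the event.
    properties: Dict[str, Any] = field(default_factory=dict)

    def as_sentence(self) -> str:
        """Render the event as a plain triple-like sentence."""
        return f"{self.entity} {self.verb} {self.target_entity}"


collected = Event("Scientist A", "collected", "specimen X")
deposited = Event("Scientist A", "deposited", "specimen X", {"museum": "Museum Z"})

print(collected.as_sentence())
print(deposited.as_sentence(), "with properties", deposited.properties)
```

The class is frozen because an event describes something that already happened, so the statement itself should not be edited afterwards; a correction would be recorded as a new event.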

Let’s create some entities

For this example, I create “digital objects” as defined by the Digital Object Architecture. I use Cordra to create these objects in JSON, but the same could be demonstrated with other tools. A digital object (DO) is a “sequence of bits” and “having as an essential element an associated unique persistent identifier.”

{
"id": "test/b0ac8fc9596372bc3c97",
"name": "Thirteenth Doctor"
}
{
"id": "test/cae42177c14a8fcdeb14",
"scientificName": "Homunculus Loxodontus"
}
{
"id": "test/a49a51ac540d68915229",
"InstName": "Museum of Broken Relationships",
"website": "https://en.wikipedia.org/wiki/Museum_of_Broken_Relationships"
}

Let’s create some events

Now, I want to create a few event schemas that can help me capture various actions of The Doctor. These schemas could be as generic as create and update records but could also be as specific as collection and deposit. Here is an example of an event that captures the action “The Thirteenth Doctor collected a specimen of an organism.” The “id” here is the PID of the event. We know this is a collection event and the actor was a scientist. Also, the target here was the specimen (or sample) of the organism. Various other properties could be recorded during this collection event.

{
"id": "test/98499997ff0a30ab444d",
"timestamp": "2019-11-05T11:22:03.524Z",
"event": "collection",
"entityType": "scientist",
"entityID": "test/b0ac8fc9596372bc3c97",
"targetEntityType": "specimen",
"targetEntityId": "test/b0ac8fc9596372bc3c97"
}
{
"id": "test/c4942d87a9f89d8929c1",
"timestamp": "2019-11-05T13:22:03.524Z",
"event": "deposit",
"entityType": "scientist",
"entityID": "test/b0ac8fc9596372bc3c97",
"objectType": "specimen",
"objectID": "ABC-123-445",
"targetEntityType": "museum",
"targetEntityId": "test/a49a51ac540d68915229",
"depositDate": "11/29/2010"
}
{
"id": "test/bexo841bce6ef0116d5",
"timestamp": "2019-11-05T14:22:03.524Z",
"event": "createDS",
"entityType": "scientist",
"entityID": "test/b0ac8fc9596372bc3c97",
"targetEntityType": "digitalspecimen",
"targetEntityID": "test/db44501292f3e4c35f8e"
}
{
"id": "test/db44501292f3e4c35f8e",
"scientificName": "Homunculus Loxodontus",
"physicalSpecimenId": "ABC-123-445",
"depositedIn": "test/a49a51ac540d68915229",
"collectedBy": "test/b0ac8fc9596372bc3c97",
"depositedBy": "test/b0ac8fc9596372bc3c97",
"dscreatedBy": "test/b0ac8fc9596372bc3c97"
}

How did we get here?

What’s the sequence of events that led us to create this digital specimen? We can now create an aggregated log based on all the events and answer that question. We now have an audit trail, and the event trail can be viewed in a tabular format.
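As a rough sketch of such an aggregated log, the Python below sorts the three example events by timestamp and prints them as a plain-text table. The helper names (`parse_timestamp`, `event_trail`, `format_trail`) are hypothetical, the events are trimmed to the fields the table needs, and the third event's name is assumed to be `createDS`.

```python
from datetime import datetime

# The three example events, trimmed to the fields used in the table.
# Deliberately listed out of order to show that sorting restores the trail.
events = [
    {"id": "test/bexo841bce6ef0116d5", "timestamp": "2019-11-05T14:22:03.524Z",
     "event": "createDS", "entityID": "test/b0ac8fc9596372bc3c97",
     "targetEntityType": "digitalspecimen"},
    {"id": "test/98499997ff0a30ab444d", "timestamp": "2019-11-05T11:22:03.524Z",
     "event": "collection", "entityID": "test/b0ac8fc9596372bc3c97",
     "targetEntityType": "specimen"},
    {"id": "test/c4942d87a9f89d8929c1", "timestamp": "2019-11-05T13:22:03.524Z",
     "event": "deposit", "entityID": "test/b0ac8fc9596372bc3c97",
     "targetEntityType": "museum"},
]


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp; the trailing 'Z' is rewritten for older Pythons."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def event_trail(items):
    """Return the events ordered from oldest to newest."""
    return sorted(items, key=lambda e: parse_timestamp(e["timestamp"]))


def format_trail(trail) -> str:
    """Render the ordered events as a fixed-width text table."""
    header = f"{'timestamp':<26}{'event':<12}{'actor':<28}{'target type'}"
    rows = [
        f"{e['timestamp']:<26}{e['event']:<12}{e['entityID']:<28}{e['targetEntityType']}"
        for e in trail
    ]
    return "\n".join([header, *rows])


trail = event_trail(events)
print(format_trail(trail))
```

Because each event carries its own timestamp and actor, the trail can be rebuilt at any time from the stored events alone, without a separately maintained history table.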

Conclusion

So you may ask, couldn’t you create a relational database and log all these? Yes. And that is one of the outputs that can come out of this model. But can the relational CRUD model provide the same flexibility?
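To illustrate the point that a relational log is one possible output of this model, here is a minimal sketch that loads the example events into an in-memory SQLite table and queries them. The table name `event_log` and its column names are assumptions made for this example, and the third event's name is assumed to be `createDS`.

```python
import sqlite3

# (id, timestamp, event, actor id, target entity type) for the three example events.
events = [
    ("test/98499997ff0a30ab444d", "2019-11-05T11:22:03.524Z", "collection",
     "test/b0ac8fc9596372bc3c97", "specimen"),
    ("test/c4942d87a9f89d8929c1", "2019-11-05T13:22:03.524Z", "deposit",
     "test/b0ac8fc9596372bc3c97", "museum"),
    ("test/bexo841bce6ef0116d5", "2019-11-05T14:22:03.524Z", "createDS",
     "test/b0ac8fc9596372bc3c97", "digitalspecimen"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE event_log (
        id                 TEXT PRIMARY KEY,
        timestamp          TEXT NOT NULL,
        event              TEXT NOT NULL,
        entity_id          TEXT NOT NULL,
        target_entity_type TEXT NOT NULL
    )
    """
)
# Events are only ever inserted, never updated: the table is a projection of the log.
conn.executemany("INSERT INTO event_log VALUES (?, ?, ?, ?, ?)", events)

# What did this scientist do, in order? Timestamps in this fixed ISO 8601 form sort as text.
rows = conn.execute(
    "SELECT event FROM event_log WHERE entity_id = ? ORDER BY timestamp",
    ("test/b0ac8fc9596372bc3c97",),
).fetchall()
actions = [row[0] for row in rows]
print(actions)
conn.close()
```

In this sketch the relational table is derived from the events rather than being the primary record, so it could be dropped and rebuilt, or replaced by a different projection, without losing any history.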

Sharif Islam

Data Architect@Distributed System of Scientific Collections (https://dissco.eu). PhD in Sociology. Bachelor's in Math and CS from the University of Illinois.