The primary objective of this schema is to provide a data model for representing chemical entities and their groupings as database instances, and to use this model to align entities across different chemical databases.
A secondary objective is to generate an ontology from the data model, and to use this ontology to help advance the development of CHEBI.
See Makefile.etl for specific details.
The basic idea is to transform the Turtle instance data (where, for example,
carbon is an instance of the ChemicalElement class, and
carbon-12 is an instance of the sibling Isotope class) into classes,
and to use reasoning to classify them.
Currently this is done via SPARQL CONSTRUCT queries (see the owlgen folder), but later this will be done by generating equivalence axioms from compound keys in the database.
For example, Mn(+4) is represented in the database as an individual of type MonoatomicIon:
chem:MonoatomicIon/Mn/+4 rdf:type chem:MonoatomicIon ;
    rdfs:label "manganese(4+)" ;
    ns1:chebi_iri CHEBI:25158 ;
    ns1:charge 4 ;
    ns1:has_element chem:Mn ;
    ns1:inchi_string "InChI=1S/Mn/q+4"^^xsd:string .
This is translated to the class level (via a SPARQL CONSTRUCT query):
chem:MonoatomicIon/Mn/+4 a owl:Class ;
    rdfs:label "manganese(4+)" ;
    owl:equivalentClass [
        owl:intersectionOf (
            chem:ChemicalElement/Mn
            [ a owl:Restriction ;
              owl:hasValue 4 ;
              owl:onProperty chem:charge ]
        )
    ] .
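The instance-to-class translation above can be sketched in plain Python. This is only an illustration of the shape of the mapping, not the project's SPARQL CONSTRUCT query or its planned axiom generator. The function name `ion_to_class_turtle` and the dictionary keys are hypothetical, and building the element class as `chem:ChemicalElement/<symbol>` is an assumption based on the example output.

```python
def ion_to_class_turtle(ion: dict) -> str:
    """Render a class-level Turtle definition for one monoatomic ion record.

    Hypothetical helper: the keys (id, label, element, charge) are
    illustrative and simply mirror the fields in the example above.
    """
    return (
        f'{ion["id"]} a owl:Class ;\n'
        f'    rdfs:label "{ion["label"]}" ;\n'
        f'    owl:equivalentClass [\n'
        f'        owl:intersectionOf (\n'
        # Assumption: the element class IRI follows chem:ChemicalElement/<symbol>
        f'            chem:ChemicalElement/{ion["element"]}\n'
        f'            [ a owl:Restriction ;\n'
        f'              owl:hasValue {ion["charge"]} ;\n'
        f'              owl:onProperty chem:charge ]\n'
        f'        )\n'
        f'    ] .\n'
    )


# The Mn(+4) record from the example, reduced to the fields the class needs.
mn4 = {
    "id": "chem:MonoatomicIon/Mn/+4",
    "label": "manganese(4+)",
    "element": "Mn",
    "charge": 4,
}
print(ion_to_class_turtle(mn4))
```

The element and the charge together act as the compound key: they are the only values that appear in the equivalence axiom, so two records with the same pair would yield logically equivalent classes.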
A reasoner will then automatically classify manganese(4+) under "manganese ion", etc.
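The intuition behind this automatic classification can be shown with a toy subsumption check. This is a deliberately simplified sketch, not how an OWL reasoner is implemented: each class is reduced to the set of conjuncts in its definition, and one class is placed under another when it satisfies all of the other's conjuncts. The definitions below are hypothetical; in particular, defining "manganese ion" by its element alone is an assumption made for the example.

```python
# Each defined class is modelled as the set of conjuncts in its
# owl:intersectionOf. All definitions here are illustrative assumptions.
definitions = {
    "manganese(4+)": frozenset({"element=Mn", "charge=4"}),
    "manganese(2+)": frozenset({"element=Mn", "charge=2"}),
    "manganese ion": frozenset({"element=Mn"}),  # assumed grouping definition
    "iron(3+)": frozenset({"element=Fe", "charge=3"}),
}


def is_subclass(child: str, parent: str) -> bool:
    """True if child satisfies every conjunct in parent's definition."""
    return definitions[parent] <= definitions[child]


def inferred_parents(name: str) -> list:
    """All other classes that subsume the named class, sorted by name."""
    return sorted(p for p in definitions if p != name and is_subclass(name, p))


print(inferred_parents("manganese(4+)"))  # ['manganese ion']
print(inferred_parents("iron(3+)"))       # []
```

Nobody asserts that manganese(4+) is a manganese ion. The placement follows from the definitions, which is what lets the hierarchy be maintained from the instance data rather than by hand.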
Here is an example of the atom hierarchy in Protégé, showing automatic classification:
One thing that may seem unintuitive is that instances at the LinkML level are classes at the OBO level. This is illustrated here:
Relationship to templating systems
One way to view this project is:
- the schema is a hierarchical collection of DOSDP templates or ROBOT templates
- the database is the set of TSVs/spreadsheets that serve as inputs to the templates to generate OWL
Using LinkML as the modeling system provides some advantages. Rather than a collection of denormalized tables, the inputs to the OWL generation are objects/instances/rows conforming to a full object model/schema, allowing for both rigorous modeling and powerful programmatic transformations.
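As a rough illustration of the difference, the sketch below loads a TSV row into a typed object that enforces a constraint at load time. It uses a plain Python dataclass as a stand-in for classes generated from a LinkML schema; it is not the project's generated code. The class, the `load_ions` helper, the column names, and the non-zero charge rule are all assumptions chosen for the example.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class MonoatomicIon:
    """Stand-in for a schema-derived class (hypothetical field set)."""
    has_element: str
    charge: int
    label: str

    def __post_init__(self):
        # Illustrative constraint: an ion carries a non-zero charge.
        if self.charge == 0:
            raise ValueError("a monoatomic ion must carry a non-zero charge")


# A one-row TSV with hypothetical column names matching the fields above.
TSV = "has_element\tcharge\tlabel\nMn\t4\tmanganese(4+)\n"


def load_ions(text: str) -> list:
    """Parse TSV text into validated MonoatomicIon objects."""
    rows = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [
        MonoatomicIon(r["has_element"], int(r["charge"]), r["label"])
        for r in rows
    ]


ions = load_ions(TSV)
print(ions[0])
```

A bare template row would pass a charge of 0 or a non-numeric charge straight through to OWL generation, whereas the typed object rejects it before any axioms are produced.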