
Generating RDF from Phenopackets

One advantage of LinkML is automated conversion of Phenopackets between different serialization forms:

  • YAML/JSON
  • RDF
  • SQL Databases

In this notebook we demonstrate conversion to RDF. We use the LinkML conversion libraries from Python, but the same conversions can also be run from the command line.

Import libraries

We will use the (autogenerated) Phenopacket Python classes, plus the LinkML loaders and dumpers.

See Data Conversion in the LinkML docs for more information.

from phenopackets.datamodel.phenopackets import Phenopacket
from linkml_runtime.loaders import json_loader

We'll use the generic LinkML json_loader:

pkt = json_loader.load("../../../examples/Phenopacket-retinoblastoma.json", Phenopacket)

Let's print out the phenotypic features as a basic sanity check:

for pf in pkt.phenotypicFeatures:
    print(f" {pf.type.id} {pf.type.label}")
 HP:0030084 Clinodactyly
 HP:0000555 Leukocoria
 HP:0000486 Strabismus
 HP:0000541 Retinal detachment
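Because the example file is plain JSON, the same check can be sketched with only the standard library. This is a minimal sketch over a small hypothetical excerpt (two of the four features above, not the full example file); it assumes each entry under phenotypicFeatures has a type object with id and label fields, mirroring the attribute access in the loop above.

```python
import json

# Hypothetical excerpt of a Phenopacket JSON document (not the full example file)
EXCERPT = """
{
  "id": "arbitrary.id",
  "phenotypicFeatures": [
    {"type": {"id": "HP:0030084", "label": "Clinodactyly"}},
    {"type": {"id": "HP:0000555", "label": "Leukocoria"}}
  ]
}
"""


def feature_terms(doc: dict) -> list:
    """Return (id, label) pairs for each phenotypic feature in a parsed document."""
    return [(pf["type"]["id"], pf["type"]["label"])
            for pf in doc.get("phenotypicFeatures", [])]


doc = json.loads(EXCERPT)
for term_id, label in feature_terms(doc):
    print(f" {term_id} {label}")
```

Unlike json_loader, this builds plain dicts rather than typed Phenopacket objects, so nothing checks the data against the schema.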

Dumping to RDF

We will use the RDF dumper from LinkML. This is a generic dumper; there is no Phenopackets-specific dumper.

We also need SchemaView to load the Phenopackets schema, because the RDF dumper requires the schema metadata.

from phenopackets.datamodel import MAIN_SCHEMA_PATH
from linkml_runtime import SchemaView

sv = SchemaView(str(MAIN_SCHEMA_PATH))
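One piece of metadata the dumper takes from the schema is the IRI to use for each slot. As a simplified illustration (an assumption about LinkML's default behaviour, not the rdflib_dumper code): when a slot has no explicit slot_uri, its IRI is built from the schema's default prefix plus the slot name. This appears consistent with predicates such as phenopackets:phenotypicFeatures in the output further down, and with nested slots landing in prefixes like <phenotypic_feature:> that have no full namespace. The SCHEMA_PREFIXES dict and slot_iri function are hypothetical.

```python
from typing import Optional

# Hypothetical prefix declarations; only "phenopackets" is taken from the notebook output.
SCHEMA_PREFIXES = {
    "phenopackets": "https://w3id.org/linkml/phenopackets/phenopackets/",
}


def slot_iri(slot_name: str, default_prefix: str, slot_uri: Optional[str] = None) -> str:
    """Resolve a slot to a predicate IRI (simplified model, not linkml_runtime code).

    Uses the explicit slot_uri if given, otherwise default_prefix:slot_name,
    then expands the prefix if it is declared. Undeclared prefixes are left
    as-is, giving a CURIE-like string with no full namespace.
    """
    curie = slot_uri or f"{default_prefix}:{slot_name}"
    prefix, sep, local = curie.partition(":")
    namespace = SCHEMA_PREFIXES.get(prefix) if sep else None
    return namespace + local if namespace else curie


print(slot_iri("phenotypicFeatures", "phenopackets"))
print(slot_iri("type", "phenotypic_feature"))
```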

We also provide a prefix map, which maps CURIE prefixes (such as HP) to their IRI namespaces:

prefix_map = {
    "HP": "http://purl.obolibrary.org/obo/HP_",
    "ex": "https://example.org/",
    "MONDO": "http://purl.obolibrary.org/obo/MONDO_",
    "NCBITaxon": "http://purl.obolibrary.org/obo/NCBITaxon_",
    "NCIT": "http://purl.obolibrary.org/obo/NCIT_",
    "EFO": "http://www.ebi.ac.uk/efo/EFO_",
    "PMID": "https://www.ncbi.nlm.nih.gov/pubmed/",
    "OMIM": "http://omim.org/entry/",
    "GENO": "http://purl.obolibrary.org/obo/GENO_",
    "ECO": "http://purl.obolibrary.org/obo/ECO_",
    "DOI": "https://doi.org/",
    "DrugCentral": "http://identifiers.org/drugcentral/",
    "LOINC": "https://loinc.org/",
    "UCUM": "http://unitsofmeasure.org/",
    "PATO": "http://purl.obolibrary.org/obo/PATO_",
    "CHEBI": "http://purl.obolibrary.org/obo/CHEBI_",
    "SCTID": "http://snomed.info/id/",
    "SNOMEDCT": "http://snomed.info/id/",
    "UO": "http://purl.obolibrary.org/obo/UO_",
    "__base": "https://example.org/base/",
}
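To make the role of the prefix map concrete, here is a minimal sketch of CURIE expansion (HP:0000555 to a full IRI) and the reverse contraction that produces prefixed names like HP:0000486 in Turtle. It uses a subset of the map above; expand and contract are hypothetical helpers, not functions from linkml_runtime or rdflib.

```python
# Subset of the notebook's prefix map
PREFIX_MAP = {
    "HP": "http://purl.obolibrary.org/obo/HP_",
    "NCIT": "http://purl.obolibrary.org/obo/NCIT_",
    "SCTID": "http://snomed.info/id/",
    "SNOMEDCT": "http://snomed.info/id/",
}


def expand(curie: str, prefix_map: dict) -> str:
    """Expand PREFIX:local into namespace + local; fail on unknown prefixes."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefix_map:
        raise KeyError(f"no expansion for {curie!r}")
    return prefix_map[prefix] + local


def contract(iri: str, prefix_map: dict) -> str:
    """Contract an IRI using the longest matching namespace, else return it unchanged.

    On ties (two prefixes with the same namespace) the first prefix in the
    map wins, because only a strictly longer match replaces the current best.
    """
    best = None
    for prefix, namespace in prefix_map.items():
        if iri.startswith(namespace) and (best is None or len(namespace) > len(prefix_map[best])):
            best = prefix
    return f"{best}:{iri[len(prefix_map[best]):]}" if best else iri


print(expand("HP:0000555", PREFIX_MAP))
print(contract("http://purl.obolibrary.org/obo/NCIT_C7541", PREFIX_MAP))
```

The map above binds both SCTID and SNOMEDCT to the same namespace, so contraction is ambiguous; this sketch simply keeps the first prefix, and a real serializer may choose differently.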

from linkml_runtime.dumpers import rdflib_dumper

print(rdflib_dumper.dumps(pkt, schemaview=sv, prefix_map=prefix_map))
@prefix DrugCentral: <http://identifiers.org/drugcentral/> .
@prefix GENO: <http://purl.obolibrary.org/obo/GENO_> .
@prefix HP: <http://purl.obolibrary.org/obo/HP_> .
@prefix LOINC: <https://loinc.org/> .
@prefix NCIT: <http://purl.obolibrary.org/obo/NCIT_> .
@prefix UBERON: <http://purl.obolibrary.org/obo/UBERON_> .
@prefix UCUM: <http://unitsofmeasure.org/> .
@prefix ns1: <measurement:> .
@prefix ns10: <vrs:> .
@prefix ns11: <individual:> .
@prefix ns2: <meta_data:> .
@prefix ns3: <phenotypic_feature:> .
@prefix ns4: <base:> .
@prefix ns5: <interpretation:> .
@prefix ns6: <biosample:> .
@prefix ns7: <vrsatile:> .
@prefix ns8: <disease:> .
@prefix ns9: <medical_action:> .
@prefix phenopackets: <https://w3id.org/linkml/phenopackets/phenopackets/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

DrugCentral:1678 a ns4:OntologyClass ;
    ns4:label "melphalan" .

GENO:0000135 a ns4:OntologyClass ;
    ns4:label "heterozygous" .

HP:0000486 a ns4:OntologyClass ;
    ns4:label "Strabismus" .

HP:0000541 a ns4:OntologyClass ;
    ns4:label "Retinal detachment" .

HP:0000555 a ns4:OntologyClass ;
    ns4:label "Leukocoria" .

HP:0012834 a ns4:OntologyClass ;
    ns4:label "Right" .

HP:0025637 a ns4:OntologyClass ;
    ns4:label "Vasospasm" .

HP:0030084 a ns4:OntologyClass ;
    ns4:label "Clinodactyly" .

NCIT:C10894 a ns4:OntologyClass ;
    ns4:label "Carboplatin/Etoposide/Vincristine" .

NCIT:C132485 a ns4:OntologyClass ;
    ns4:label "Apoptosis and Necrosis" .

NCIT:C140678 a ns4:OntologyClass ;
    ns4:label "Retinoblastoma cM0 TNM Finding v8" .

NCIT:C140711 a ns4:OntologyClass ;
    ns4:label "Retinoblastoma pN0 TNM Finding v8" .

NCIT:C140720 a ns4:OntologyClass ;
    ns4:label "Retinoblastoma pT3 TNM Finding v8" .

NCIT:C35941 a ns4:OntologyClass ;
    ns4:label "Flexner-Wintersteiner Rosette Formation" .

NCIT:C38222 a ns4:OntologyClass ;
    ns4:label "Intraarterial Route of Administration" .

NCIT:C41331 a ns4:OntologyClass ;
    ns4:label "Adverse Event" .

NCIT:C64576 a ns4:OntologyClass ;
    ns4:label "Once" .

NCIT:C8509 a ns4:OntologyClass ;
    ns4:label "Primary Neoplasm" .

UBERON:0000970 a ns4:OntologyClass ;
    ns4:label "eye" .

UCUM:mg.kg-1 a ns4:OntologyClass ;
    ns4:label "milligram per kilogram" .

UCUM:mm a ns4:OntologyClass ;
    ns4:label "millimeter" .

LOINC:33728-7 a ns4:OntologyClass ;
    ns4:label "Size.maximum dimension in Tumor" .

LOINC:79892-6 a ns4:OntologyClass ;
    ns4:label "Right eye Intraocular pressure" .

LOINC:79893-4 a ns4:OntologyClass ;
    ns4:label "Left eye Intraocular pressure" .

LOINC:LA24739-7 a ns4:OntologyClass ;
    ns4:label "Group E" .

NCIT:C48601 a ns4:OntologyClass ;
    ns4:label "Enucleation" .

<http://unitsofmeasure.org/mm[Hg]> a ns4:OntologyClass ;
    ns4:label "millimetres of mercury" .

LOINC:56844-4 a ns4:OntologyClass ;
    ns4:label "Intraocular pressure of Eye" .

HP:0012835 a ns4:OntologyClass ;
    ns4:label "Left" .

NCIT:C62220 a ns4:OntologyClass ;
    ns4:label "Cure" .

UBERON:0004548 a ns4:OntologyClass ;
    ns4:label "left eye" .

NCIT:C7541 a ns4:OntologyClass ;
    ns4:label "Retinoblastoma" .

[] a phenopackets:Phenopacket ;
    phenopackets:biosamples [ a ns6:Biosample ;
            ns6:files [ a ns4:File ;
                    ns4:uri "file://data/fileSomaticWgs.vcf.gz" ] ;
            ns6:id "biosample.1" ;
            ns6:measurements [ a ns1:Measurement ;
                    ns1:assay LOINC:33728-7 ;
                    ns1:timeObserved [ a ns4:TimeElement ;
                            ns4:age [ a ns4:Age ;
                                    ns4:iso8601duration "P8M2W" ] ] ;
                    ns1:value [ a ns1:Value ;
                            ns1:quantity [ a ns1:Quantity ;
                                    ns1:unit UCUM:mm ;
                                    ns1:value 1.5e+01 ] ] ] ;
            ns6:pathologicalTnmFinding NCIT:C140711,
                NCIT:C140720 ;
            ns6:phenotypicFeatures [ a ns3:PhenotypicFeature ;
                    ns3:excluded false ;
                    ns3:type NCIT:C132485 ],
                [ a ns3:PhenotypicFeature ;
                    ns3:excluded false ;
                    ns3:type NCIT:C35941 ] ;
            ns6:procedure [ a ns4:Procedure ;
                    ns4:bodySite UBERON:0004548 ;
                    ns4:code NCIT:C48601 ;
                    ns4:performed [ a ns4:TimeElement ;
                            ns4:age [ a ns4:Age ;
                                    ns4:iso8601duration "P8M2W" ] ] ] ;
            ns6:sampledTissue UBERON:0000970 ;
            ns6:tumorProgression NCIT:C8509 ] ;
    phenopackets:diseases [ a ns8:Disease ;
            ns8:clinicalTnmFinding NCIT:C140678 ;
            ns8:diseaseStage LOINC:LA24739-7 ;
            ns8:excluded false ;
            ns8:onset [ a ns4:TimeElement ;
                    ns4:age [ a ns4:Age ;
                            ns4:iso8601duration "P4M" ] ] ;
            ns8:primarySite UBERON:0004548 ;
            ns8:term NCIT:C7541 ] ;
    phenopackets:files [ a ns4:File ;
            ns4:uri "file://data/germlineWgs.vcf.gz" ] ;
    phenopackets:id "arbitrary.id" ;
    phenopackets:interpretations [ a ns5:Interpretation ;
            ns5:diagnosis [ a ns5:Diagnosis ;
                    ns5:disease NCIT:C7541 ;
                    ns5:genomicInterpretations [ a ns5:GenomicInterpretation ;
                            ns5:interpretationStatus "CAUSATIVE" ;
                            ns5:subjectOrBiosampleId "proband A" ;
                            ns5:variantInterpretation [ a ns5:VariantInterpretation ;
                                    ns5:acmgPathogenicityClassification "PATHOGENIC" ;
                                    ns5:therapeuticActionability "ACTIONABLE" ;
                                    ns5:variationDescriptor [ a ns7:VariationDescriptor ;
                                            ns7:extensions [ a ns7:Extension ;
                                                    ns7:name "mosaicism" ;
                                                    ns7:value "40.0%" ] ;
                                            ns7:id "variant-id" ;
                                            ns7:moleculeContext "unspecified_molecule_context" ;
                                            ns7:variation [ a ns10:Variation ;
                                                    ns10:copyNumber [ a ns10:CopyNumber ;
                                                            ns10:derivedSequenceExpression [ a ns10:DerivedSequenceExpression ;
                                                                    ns10:location [ a ns10:SequenceLocation ;
                                                                            ns10:sequenceId "refseq:NC_000013.14" ;
                                                                            ns10:sequenceInterval [ a ns10:SequenceInterval ;
                                                                                    ns10:endNumber [ a ns10:Number ;
                                                                                            ns10:value 61706822 ] ;
                                                                                    ns10:startNumber [ a ns10:Number ;
                                                                                            ns10:value 25981249 ] ] ] ;
                                                                    ns10:reverseComplement false ] ;
                                                            ns10:number [ a ns10:Number ;
                                                                    ns10:value 1 ] ] ] ] ] ],
                        [ a ns5:GenomicInterpretation ;
                            ns5:interpretationStatus "CAUSATIVE" ;
                            ns5:subjectOrBiosampleId "biosample.1" ;
                            ns5:variantInterpretation [ a ns5:VariantInterpretation ;
                                    ns5:acmgPathogenicityClassification "PATHOGENIC" ;
                                    ns5:therapeuticActionability "ACTIONABLE" ;
                                    ns5:variationDescriptor [ a ns7:VariationDescriptor ;
                                            ns7:allelicState GENO:0000135 ;
                                            ns7:expressions [ a ns7:Expression ;
                                                    ns7:syntax "hgvs.c" ;
                                                    ns7:value "NM_000321.2:c.958C>T" ],
                                                [ a ns7:Expression ;
                                                    ns7:syntax "transcript_reference" ;
                                                    ns7:value "NM_000321.2" ] ;
                                            ns7:extensions [ a ns7:Extension ;
                                                    ns7:name "allele-frequency" ;
                                                    ns7:value "25.0%" ] ;
                                            ns7:geneContext [ a ns7:GeneDescriptor ;
                                                    ns7:symbol "RB1" ;
                                                    ns7:valueId "HGNC:9884" ] ;
                                            ns7:id "rs121913300" ;
                                            ns7:label "RB1 c.958C>T (p.Arg320Ter)" ;
                                            ns7:moleculeContext "genomic" ;
                                            ns7:variation [ a ns10:Variation ;
                                                    ns10:allele [ a ns10:Allele ;
                                                            ns10:literalSequenceExpression [ a ns10:LiteralSequenceExpression ;
                                                                    ns10:sequence "T" ] ;
                                                            ns10:sequenceLocation [ a ns10:SequenceLocation ;
                                                                    ns10:sequenceId "refseq:NC_000013.11" ;
                                                                    ns10:sequenceInterval [ a ns10:SequenceInterval ;
                                                                            ns10:endNumber [ a ns10:Number ;
                                                                                    ns10:value 48367512 ] ;
                                                                            ns10:startNumber [ a ns10:Number ;
                                                                                    ns10:value 48367511 ] ] ] ] ] ;
                                            ns7:vcfRecord [ a ns7:VcfRecord ;
                                                    ns7:alt "T" ;
                                                    ns7:chrom "NC_000013.11" ;
                                                    ns7:genomeAssembly "GRCh38" ;
                                                    ns7:pos 48367512 ;
                                                    ns7:ref "C" ] ] ] ] ] ;
            ns5:id "interpretation.id" ;
            ns5:progressStatus "SOLVED" ] ;
    phenopackets:measurements [ a ns1:Measurement ;
            ns1:assay LOINC:79892-6 ;
            ns1:timeObserved [ a ns4:TimeElement ;
                    ns4:age [ a ns4:Age ;
                            ns4:iso8601duration "P6M" ] ] ;
            ns1:value [ a ns1:Value ;
                    ns1:quantity [ a ns1:Quantity ;
                            ns1:referenceRange [ a ns1:ReferenceRange ;
                                    ns1:high 2.1e+01 ;
                                    ns1:low 1e+01 ;
                                    ns1:unit LOINC:56844-4 ] ;
                            ns1:unit <http://unitsofmeasure.org/mm[Hg]> ;
                            ns1:value 1.5e+01 ] ] ],
        [ a ns1:Measurement ;
            ns1:assay LOINC:79893-4 ;
            ns1:timeObserved [ a ns4:TimeElement ;
                    ns4:age [ a ns4:Age ;
                            ns4:iso8601duration "P6M" ] ] ;
            ns1:value [ a ns1:Value ;
                    ns1:quantity [ a ns1:Quantity ;
                            ns1:referenceRange [ a ns1:ReferenceRange ;
                                    ns1:high 2.1e+01 ;
                                    ns1:low 1e+01 ;
                                    ns1:unit LOINC:56844-4 ] ;
                            ns1:unit <http://unitsofmeasure.org/mm[Hg]> ;
                            ns1:value 2.5e+01 ] ] ] ;
    phenopackets:medicalActions [ a ns9:MedicalAction ;
            ns9:adverseEvents HP:0025637 ;
            ns9:treatment [ a ns9:Treatment ;
                    ns9:agent DrugCentral:1678 ;
                    ns9:doseIntervals [ a ns9:DoseInterval ;
                            ns9:interval [ a ns4:TimeInterval ;
                                    ns4:end "2020-09-02T00:00:00Z" ;
                                    ns4:start "2020-09-02T00:00:00Z" ] ;
                            ns9:quantity [ a ns1:Quantity ;
                                    ns1:unit UCUM:mg.kg-1 ;
                                    ns1:value 4e-01 ] ;
                            ns9:scheduleFrequency NCIT:C64576 ] ;
                    ns9:drugType "UNKNOWN_DRUG_TYPE" ;
                    ns9:routeOfAdministration NCIT:C38222 ] ;
            ns9:treatmentIntent NCIT:C62220 ;
            ns9:treatmentTarget NCIT:C7541 ;
            ns9:treatmentTerminationReason NCIT:C41331 ],
        [ a ns9:MedicalAction ;
            ns9:procedure [ a ns4:Procedure ;
                    ns4:bodySite UBERON:0004548 ;
                    ns4:code NCIT:C48601 ;
                    ns4:performed [ a ns4:TimeElement ;
                            ns4:age [ a ns4:Age ;
                                    ns4:iso8601duration "P8M2W" ] ] ] ;
            ns9:treatmentIntent NCIT:C62220 ;
            ns9:treatmentTarget NCIT:C7541 ],
        [ a ns9:MedicalAction ;
            ns9:therapeuticRegimen [ a ns9:TherapeuticRegimen ;
                    ns9:endTime [ a ns4:TimeElement ;
                            ns4:age [ a ns4:Age ;
                                    ns4:iso8601duration "P8M" ] ] ;
                    ns9:ontologyClass NCIT:C10894 ;
                    ns9:regimenStatus "COMPLETED" ;
                    ns9:startTime [ a ns4:TimeElement ;
                            ns4:age [ a ns4:Age ;
                                    ns4:iso8601duration "P7M" ] ] ] ;
            ns9:treatmentIntent NCIT:C62220 ;
            ns9:treatmentTarget NCIT:C7541 ] ;
    phenopackets:metaData [ a ns2:MetaData ;
            ns2:created "2021-05-14T10:35:00Z" ;
            ns2:createdBy "anonymous biocurator" ;
            ns2:phenopacketSchemaVersion "2.0.0" ;
            ns2:resources [ a ns2:Resource ;
                    ns2:id "drugcentral" ;
                    ns2:iriPrefix "https://drugcentral.org/drugcard/" ;
                    ns2:name "Drug Central" ;
                    ns2:namespacePrefix "DrugCentral" ;
                    ns2:url "https://drugcentral.org/" ;
                    ns2:version "2022-08-22" ],
                [ a ns2:Resource ;
                    ns2:id "ncit" ;
                    ns2:iriPrefix "http://purl.obolibrary.org/obo/NCIT_" ;
                    ns2:name "NCI Thesaurus" ;
                    ns2:namespacePrefix "NCIT" ;
                    ns2:url "http://purl.obolibrary.org/obo/ncit.owl" ;
                    ns2:version "21.05d" ],
                [ a ns2:Resource ;
                    ns2:id "loinc" ;
                    ns2:iriPrefix "https://loinc.org/" ;
                    ns2:name "Logical Observation Identifiers Names and Codes" ;
                    ns2:namespacePrefix "LOINC" ;
                    ns2:url "https://loinc.org" ;
                    ns2:version "2.73" ],
                [ a ns2:Resource ;
                    ns2:id "ucum" ;
                    ns2:iriPrefix "https://units-of-measurement.org/" ;
                    ns2:name "Unified Code for Units of Measure" ;
                    ns2:namespacePrefix "UCUM" ;
                    ns2:url "https://ucum.org" ;
                    ns2:version "2.1" ],
                [ a ns2:Resource ;
                    ns2:id "geno" ;
                    ns2:iriPrefix "http://purl.obolibrary.org/obo/GENO_" ;
                    ns2:name "Genotype Ontology" ;
                    ns2:namespacePrefix "GENO" ;
                    ns2:url "http://purl.obolibrary.org/obo/geno.owl" ;
                    ns2:version "2022-03-05" ],
                [ a ns2:Resource ;
                    ns2:id "ncbitaxon" ;
                    ns2:iriPrefix "http://purl.obolibrary.org/obo/NCBITaxon_" ;
                    ns2:name "NCBI organismal classification" ;
                    ns2:namespacePrefix "NCBITaxon" ;
                    ns2:url "http://purl.obolibrary.org/obo/ncbitaxon.owl" ;
                    ns2:version "2021-06-10" ],
                [ a ns2:Resource ;
                    ns2:id "uberon" ;
                    ns2:iriPrefix "http://purl.obolibrary.org/obo/UBERON_" ;
                    ns2:name "Uber-anatomy ontology" ;
                    ns2:namespacePrefix "UBERON" ;
                    ns2:url "http://purl.obolibrary.org/obo/uberon.owl" ;
                    ns2:version "2021-07-27" ],
                [ a ns2:Resource ;
                    ns2:id "efo" ;
                    ns2:iriPrefix "http://purl.obolibrary.org/obo/EFO_" ;
                    ns2:name "Experimental Factor Ontology" ;
                    ns2:namespacePrefix "EFO" ;
                    ns2:url "http://www.ebi.ac.uk/efo/efo.owl" ;
                    ns2:version "3.34.0" ],
                [ a ns2:Resource ;
                    ns2:id "hp" ;
                    ns2:iriPrefix "http://purl.obolibrary.org/obo/HP_" ;
                    ns2:name "human phenotype ontology" ;
                    ns2:namespacePrefix "HP" ;
                    ns2:url "http://purl.obolibrary.org/obo/hp.owl" ;
                    ns2:version "2022-06-11" ] ] ;
    phenopackets:phenotypicFeatures [ a ns3:PhenotypicFeature ;
            ns3:excluded false ;
            ns3:modifiers HP:0012835 ;
            ns3:onset [ a ns4:TimeElement ;
                    ns4:age [ a ns4:Age ;
                            ns4:iso8601duration "P6M" ] ] ;
            ns3:type HP:0000541 ],
        [ a ns3:PhenotypicFeature ;
            ns3:excluded false ;
            ns3:modifiers HP:0012834 ;
            ns3:onset [ a ns4:TimeElement ;
                    ns4:age [ a ns4:Age ;
                            ns4:iso8601duration "P3M" ] ] ;
            ns3:type HP:0030084 ],
        [ a ns3:PhenotypicFeature ;
            ns3:excluded false ;
            ns3:modifiers HP:0012835 ;
            ns3:onset [ a ns4:TimeElement ;
                    ns4:age [ a ns4:Age ;
                            ns4:iso8601duration "P4M" ] ] ;
            ns3:type HP:0000555 ],
        [ a ns3:PhenotypicFeature ;
            ns3:excluded false ;
            ns3:modifiers HP:0012835 ;
            ns3:onset [ a ns4:TimeElement ;
                    ns4:age [ a ns4:Age ;
                            ns4:iso8601duration "P5M15D" ] ] ;
            ns3:type HP:0000486 ] ;
    phenopackets:subject [ a ns11:Individual ;
            ns11:id "proband A" ;
            ns11:karyotypicSex "XX" ;
            ns11:sex "FEMALE" ;
            ns11:timeAtLastEncounter [ a ns4:TimeElement ;
                    ns4:age [ a ns4:Age ;
                            ns4:iso8601duration "P6M" ] ] ] .
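In the output above, each nested object (for example the TimeElement inside a PhenotypicFeature) is written as an anonymous node [ a ... ; ... ], and each object carries a type triple such as a ns3:PhenotypicFeature. The following is a simplified, stdlib-only sketch of that idea, not the rdflib_dumper implementation: it walks a nested dict, mints a blank node label for every nested object, and emits (subject, predicate, object) triples. The _type key, the short predicate names and the to_triples function are hypothetical stand-ins for the class and slot IRIs the real dumper obtains from the schema.

```python
from itertools import count
from typing import Any, Iterator, List, Tuple

Triple = Tuple[str, str, Any]


def to_triples(obj: dict, node: str, ids: Iterator[int]) -> List[Triple]:
    """Flatten a nested dict into triples, minting a blank node per nested dict."""
    triples: List[Triple] = []
    if "_type" in obj:
        triples.append((node, "rdf:type", obj["_type"]))
    for key, value in obj.items():
        if key == "_type":
            continue
        # A list value becomes several triples sharing the same predicate
        values = value if isinstance(value, list) else [value]
        for v in values:
            if isinstance(v, dict):
                child = f"_:b{next(ids)}"
                triples.append((node, key, child))
                triples.extend(to_triples(v, child, ids))
            else:
                triples.append((node, key, v))
    return triples


# Hypothetical excerpt modelled on one phenotypic feature in the output above
feature = {
    "_type": "PhenotypicFeature",
    "excluded": False,
    "type": "HP:0000541",
    "onset": {"_type": "TimeElement",
              "age": {"_type": "Age", "iso8601duration": "P6M"}},
}

triples = to_triples(feature, "_:b0", count(1))
for t in triples:
    print(t)
```

In the real output, ontology terms such as HP:0000541 are not blank nodes: they are emitted as IRIs with their own a ns4:OntologyClass and ns4:label triples, a distinction the dumper derives from the schema and this sketch does not model.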