Bridge Schemas

LinkML schemas for JGI and KBase genomics data lakehouses.

KBase/BERDL Lakehouse

Core Databases

| Schema | Description | Tables |
| --- | --- | --- |
| Pangenome | Pangenomic gene clusters with GTDB taxonomy | 11 |
| NMDC Core | NMDC microbiome data (studies, biosamples, omics) | 25 |
| Genomes | KBase genome assemblies | - |
| GapMind Pathways | Metabolic pathway completeness (463M+ records) | 1 |

Reference Databases

| Schema | Description |
| --- | --- |
| UniProt Archaea | UniProt archaeal proteins |
| UniProt Bacteria | UniProt bacterial proteins |
| UniRef50/90/100 | UniRef clustered sequences |
| RefSeq Taxon | RefSeq taxonomy data |
| MSD Biochemistry | ModelSEED biochemistry |
| Ontology Source | Ontology reference data |
| Phenotype | Phenotype data |

Project-Specific Databases

| Schema | Description | Organisms |
| --- | --- | --- |
| ENIGMA CoRAL | Groundwater microbiome data | Environmental |
| PhageFoundry Browser | Genome browsers for bacterial pathogens | A. baumannii (891), K. pneumoniae (220), P. aeruginosa (535), P. viridiflava (259) |
| PhageFoundry Modelling | Phage-host interaction modelling | 284 E. coli strains |

JGI Lakehouse

Core Databases

| Schema | Database | Tables | Description |
| --- | --- | --- | --- |
| GOLD | gold-db-2 | 374 | Genomes OnLine Database - project metadata |
| IMG Core | img-db-2 | 244 | Integrated Microbial Genomes - core annotations |
| IMG Extended | img-db-2 | 84 | IMG extended data (pathways, secondary metabolites) |
| IMG GOLD | img-db-2 | 118 | IMG-GOLD integration tables |
| IMG Satellite | img-db-2 | 141 | Experimental and phenotype data |
| IMG Submission | img-db-2 | 49 | Genome submission system |

Specialty IMG Databases

| Schema | Database | Tables | Description |
| --- | --- | --- | --- |
| IMG Taxonomy | img-db-2 | 8 | Taxonomy data |
| IMG Methylome | img-db-2 | 10 | Methylome experiments |
| IMG Proteome | img-db-2 | 15 | Proteomics data |
| IMG RNAseq | img-db-2 | 11 | RNA-seq experiments |
| IMG Development | img-db-2 | 254 | Development database |

IMG MySQL Databases

| Schema | Database | Tables | Description |
| --- | --- | --- | --- |
| IMG MySQL ABC | img-db-1 | 18 | ABC transporter data |
| IMG MySQL Core | img-db-1 | 5 | Core IMG tables |
| IMG/VR | img-db-1 | 7 | Viral genomes |
| IMG MySQL MBin | img-db-1 | 17 | Metagenome binning |
| IMG MySQL MISI | img-db-1 | 5 | Microbial signatures |

Organism Portals

| Schema | Database | Tables | Description |
| --- | --- | --- | --- |
| Phytozome | plant-db-7 | 306 | Plant comparative genomics |
| MycoCosm | myco-db-1/2/3 | ~150/genome | Fungal comparative genomics |

Infrastructure

| Schema | Database | Tables | Description |
| --- | --- | --- | --- |
| Citation Service | gcs-vm-1 | 40 | Genome citation tracking |
| JGI Portal | portal-db-1 | 87 | Job and download management |
| SDM Metadata | sdm-db | 33 | Scientific Data Management |
| SMC | smc-db | 124 | Secondary Metabolite Clusters |

Usage

These schemas can be used with linkml-store for type-safe querying of genomics data.

from linkml_store import Client

# Connect to KBase lakehouse
client = Client()
db = client.attach_database("dremio://...", schema="kbase_ke_pangenome")

# Query with schema validation
results = db.query("SELECT * FROM genome WHERE domain = 'Bacteria' LIMIT 10")
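To illustrate what checking a query against a schema can buy you, here is a minimal standard-library sketch that rejects unknown table or column names before any SQL is sent. It is not how linkml-store validates queries. `KNOWN_SLOTS` and `build_select` are hypothetical names, and the mapping is written by hand for this example rather than read from the real `kbase_ke_pangenome` schema.

```python
# Hypothetical, hand-written stand-in for what a schema declares:
# table (class) name -> column (slot) names. Only the names used in the
# query above are listed; the real schema is assumed to define more.
KNOWN_SLOTS = {
    "genome": {"domain"},
}


def build_select(table, filters, limit=10):
    """Return a SELECT statement after checking names against KNOWN_SLOTS."""
    if table not in KNOWN_SLOTS:
        raise KeyError(f"unknown table: {table}")
    unknown = set(filters) - KNOWN_SLOTS[table]
    if unknown:
        raise KeyError(f"unknown column(s) for {table}: {sorted(unknown)}")
    if not isinstance(limit, int) or limit < 1:
        raise ValueError("limit must be a positive integer")
    # Double any single quotes so a value cannot end the string literal early.
    conditions = " AND ".join(
        "{} = '{}'".format(column, str(value).replace("'", "''"))
        for column, value in filters.items()
    )
    where = f" WHERE {conditions}" if conditions else ""
    return f"SELECT * FROM {table}{where} LIMIT {limit}"


sql = build_select("genome", {"domain": "Bacteria"})
```

With this in place, a misspelt column such as `domian` raises a `KeyError` locally instead of failing inside the lakehouse.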

Schema Sources

  • KBase schemas: src/bridge_schemas/schema/kbase/
  • JGI schemas: src/bridge_schemas/schema/jgi/
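
To see which schemas are available in a local checkout, the two directories above can be listed with a few lines of Python. This is a sketch that assumes the schema files are stored as `*.yaml` directly inside those directories; `SCHEMA_DIRS` and `list_schemas` are hypothetical helpers, not part of the repository.

```python
from pathlib import Path

# Directories as listed under "Schema Sources", relative to the repository root.
SCHEMA_DIRS = {
    "kbase": Path("src/bridge_schemas/schema/kbase"),
    "jgi": Path("src/bridge_schemas/schema/jgi"),
}


def list_schemas(source, repo_root="."):
    """Return the sorted file stems of the *.yaml files for one lakehouse.

    Assumption: schemas are YAML files placed directly in the directory;
    subdirectories are not searched.
    """
    directory = Path(repo_root) / SCHEMA_DIRS[source]
    return sorted(path.stem for path in directory.glob("*.yaml"))
```

Calling `list_schemas("kbase")` from the repository root would then return the KBase schema file names without their extension.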