Bridge Schemas
LinkML schemas for JGI and KBase genomics data lakehouses.
KBase/BERDL Lakehouse
Core Databases
| Schema | Description | Tables |
|---|---|---|
| Pangenome | Pangenomic gene clusters with GTDB taxonomy | 11 |
| NMDC Core | NMDC microbiome data (studies, biosamples, omics) | 25 |
| Genomes | KBase genome assemblies | - |
| GapMind Pathways | Metabolic pathway completeness (463M+ records) | 1 |
Reference Databases
Project-Specific Databases
| Schema | Description | Organisms |
|---|---|---|
| ENIGMA CoRAL | Groundwater microbiome data | Environmental |
| PhageFoundry Browser | Genome browsers for bacterial pathogens | A. baumannii (891), K. pneumoniae (220), P. aeruginosa (535), P. viridiflava (259) |
| PhageFoundry Modelling | Phage-host interaction modelling | 284 E. coli strains |
JGI Lakehouse
Core Databases
| Schema | Database | Tables | Description |
|---|---|---|---|
| GOLD | gold-db-2 | 374 | Genomes OnLine Database - project metadata |
| IMG Core | img-db-2 | 244 | Integrated Microbial Genomes - core annotations |
| IMG Extended | img-db-2 | 84 | IMG extended data (pathways, secondary metabolites) |
| IMG GOLD | img-db-2 | 118 | IMG-GOLD integration tables |
| IMG Satellite | img-db-2 | 141 | Experimental and phenotype data |
| IMG Submission | img-db-2 | 49 | Genome submission system |
Specialty IMG Databases
IMG MySQL Databases
Organism Portals
| Schema | Database | Tables | Description |
|---|---|---|---|
| Phytozome | plant-db-7 | 306 | Plant comparative genomics |
| MycoCosm | myco-db-1/2/3 | ~150/genome | Fungal comparative genomics |
Infrastructure
| Schema | Database | Tables | Description |
|---|---|---|---|
| Citation Service | gcs-vm-1 | 40 | Genome citation tracking |
| JGI Portal | portal-db-1 | 87 | Job and download management |
| SDM Metadata | sdm-db | 33 | Scientific Data Management |
| SMC | smc-db | 124 | Secondary Metabolite Clusters |
Usage
These schemas can be used with linkml-store for type-safe querying of genomics data.
```python
from linkml_store import Client

# Connect to KBase lakehouse
client = Client()
db = client.attach_database("dremio://...", schema="kbase_ke_pangenome")

# Query with schema validation
results = db.query("SELECT * FROM genome WHERE domain = 'Bacteria' LIMIT 10")
```
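
To illustrate what schema-validated results could mean in practice, here is a minimal, stdlib-only sketch that checks one result row against declared column types. It is not how linkml-store does it: the `GENOME_SLOTS` mapping, the `genome_id` column name, and the `validate_row` helper are all hypothetical, and a real schema would declare its slots in LinkML YAML.

```python
from typing import Any

# Hypothetical slot declarations for the `genome` table (illustration only).
GENOME_SLOTS: dict[str, type] = {
    "genome_id": str,  # assumed column name, not taken from the schema
    "domain": str,     # column referenced in the usage query
}


def validate_row(row: dict[str, Any], slots: dict[str, type]) -> list[str]:
    """Return a list of problems found in one row (empty if the row is valid)."""
    problems = []
    # Check every declared slot for presence and type.
    for name, expected in slots.items():
        if name not in row:
            problems.append(f"missing column: {name}")
        elif not isinstance(row[name], expected):
            actual = type(row[name]).__name__
            problems.append(f"{name}: expected {expected.__name__}, got {actual}")
    # Report columns that the slot mapping does not declare.
    for name in sorted(row.keys() - slots.keys()):
        problems.append(f"unexpected column: {name}")
    return problems
```

Calling `validate_row(row, GENOME_SLOTS)` on each returned row would surface missing, mistyped, or undeclared columns before downstream code relies on them.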
Schema Sources
- KBase schemas: src/bridge_schemas/schema/kbase/
- JGI schemas: src/bridge_schemas/schema/jgi/
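
To see which schema files are available locally, a small helper along these lines can list them per lakehouse. This is a sketch under two assumptions: that it is run from the repository root, and that schemas are stored as `*.yaml` files directly inside the two directories above. The `list_schema_files` name is hypothetical.

```python
from pathlib import Path

# Directory names taken from the schema source paths listed above.
SCHEMA_ROOT = Path("src/bridge_schemas/schema")
LAKEHOUSES = ("kbase", "jgi")


def list_schema_files(root: Path = SCHEMA_ROOT) -> dict[str, list[str]]:
    """Map each lakehouse directory to the sorted names of its YAML files."""
    found = {}
    for lakehouse in LAKEHOUSES:
        directory = root / lakehouse
        # Assumption: schemas use the .yaml extension; a missing directory yields [].
        if directory.is_dir():
            found[lakehouse] = sorted(p.name for p in directory.glob("*.yaml"))
        else:
            found[lakehouse] = []
    return found


if __name__ == "__main__":
    for lakehouse, files in list_schema_files().items():
        print(f"{lakehouse}: {len(files)} schema file(s)")
```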