biostride-schema
BioStride is a comprehensive schema for representing multimodal structural biology imaging data, from atomic-resolution structures to tissue-level organization. It supports diverse experimental techniques including cryo-EM, X-ray crystallography, SAXS/SANS, fluorescence microscopy, and spectroscopic imaging.
Schema Organization
The schema follows a hierarchical structure that mirrors how structural biology research is organized:
The top-level entity is a Dataset, which serves as a container for related research. A dataset might represent all data from a specific grant, collaboration, or publication.
Each dataset contains one or more Studies, which are focused investigations of specific biological questions. For example, a study might investigate "Heat stress response in Arabidopsis" or "Structure of the human ribosome under different conditions."
Within each study, you'll find:
Biological Materials
- Samples: The biological specimens being studied (proteins, nucleic acids, complexes, cells, tissues). Each sample includes detailed molecular composition, buffer conditions, and storage information. For example, a sample record for a purified protein carries its sequence, concentration, and buffer pH.
- Sample Preparations: How samples were prepared for specific techniques. This includes cryo-EM grid preparation (vitrification parameters), crystallization conditions for X-ray studies, or staining protocols for fluorescence microscopy.
Data Collection
- Instruments: The equipment used, from Titan Krios microscopes to synchrotron beamlines. Each instrument type (CryoEMInstrument, XRayInstrument, SAXSInstrument) has specific parameters like accelerating voltage, detector type, or beam energy.
- Experiment Runs: Individual data collection sessions that link samples to instruments. An experiment run captures when, how, and under what conditions data was collected, including quality metrics like resolution and completeness.
Data Processing
- Workflow Runs: Computational processing steps applied to raw data. This includes motion correction for cryo-EM movies, 3D reconstruction, model building, or phase determination for crystallography. Each workflow tracks the software used, parameters, and computational resources.
Data Products
- Data Files: Any files generated or used, from raw data to final models. Each file carries a SHA-256 checksum for data integrity and a data type (micrograph, particles, volume, model).
- Images: Specialized classes for different imaging modalities:
  - Image2D: Micrographs, diffraction patterns
  - Image3D: 3D reconstructions, tomograms
  - FTIRImage: Molecular composition maps from infrared spectroscopy
  - FluorescenceImage: Fluorophore-labeled cellular components
  - OpticalImage: Brightfield/phase contrast microscopy
  - XRFImage: Elemental distribution maps
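
The hierarchy described above can be pictured as a nested record. The following is a minimal sketch, not an official serialization: the collection slot names (`studies`, `samples`, `sample_preparations`, `instrument_runs`, `workflow_runs`, `data_files`, `images`) come from the Slots table, but their placement on Dataset and Study, the `example:` identifiers, and all values are assumptions made for illustration.

```python
# Illustrative only: slot names are taken from the Slots table; which class
# owns each slot, the "example:" identifiers, and all values are assumptions.
dataset = {
    "id": "example:dataset-1",
    "title": "Example dataset",
    "studies": [
        {
            "id": "example:study-1",
            "title": "Heat stress response in Arabidopsis",
            "samples": [{"id": "example:sample-1", "sample_code": "S-001"}],
            "sample_preparations": [
                {"id": "example:prep-1", "sample_id": "example:sample-1"}
            ],
            "instrument_runs": [
                {"id": "example:run-1", "experiment_code": "EXP-001"}
            ],
            "workflow_runs": [
                {"id": "example:wf-1", "experiment_id": "example:run-1"}
            ],
            "data_files": [{"id": "example:file-1", "file_name": "map.mrc"}],
            "images": [],
        }
    ],
}

# Collection slots assumed to live on each study.
STUDY_COLLECTIONS = (
    "samples",
    "sample_preparations",
    "instrument_runs",
    "workflow_runs",
    "data_files",
    "images",
)


def summarize(dataset):
    """Count the records held in each collection slot, per study."""
    return {
        study["id"]: {slot: len(study.get(slot, [])) for slot in STUDY_COLLECTIONS}
        for study in dataset.get("studies", [])
    }


summary = summarize(dataset)
print(summary)
```

A summary like this is a quick way to check that a study has at least one record at each level before submitting it for validation against the schema.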
Example Usage
A typical cryo-EM study of a protein complex would include:
1. Sample records for the purified complex with molecular weight and buffer composition
2. Grid preparation details with vitrification parameters
3. Microscope specifications and data collection parameters
4. Processing workflows from motion correction through 3D refinement
5. Final reconstructed volumes and fitted atomic models
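
The records in the cryo-EM example above are connected by reference slots, so an output file can be traced back to the sample it came from. The sketch below assumes that a workflow run carries `experiment_id` and `output_files`, and that an experiment run carries `sample_id` and `instrument_id`; the slot names are from the Slots table, but which class owns each slot, the identifiers, and the values are hypothetical.

```python
# Hypothetical records keyed by identifier. Slot names come from the Slots
# table; the class that carries each slot is an assumption.
samples = {
    "example:sample-1": {"sample_code": "CPLX-01", "molecular_weight": 450.0},
}
experiment_runs = {
    "example:run-1": {
        "sample_id": "example:sample-1",
        "instrument_id": "example:instrument-1",
    },
}
workflow_runs = {
    "example:wf-1": {
        "experiment_id": "example:run-1",
        "software_name": "hypothetical-software",
        "output_files": ["example:file-1"],
    },
}


def trace_file(file_id):
    """Walk from an output file back to its workflow, experiment run, and sample."""
    for wf_id, wf in workflow_runs.items():
        if file_id in wf["output_files"]:
            run_id = wf["experiment_id"]
            sample_id = experiment_runs[run_id]["sample_id"]
            return {"workflow": wf_id, "experiment": run_id, "sample": sample_id}
    return None  # the file is not listed as the output of any workflow run


print(trace_file("example:file-1"))
```
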
A multimodal plant imaging study might combine:
1. Whole plant optical imaging for morphology
2. XRF imaging to map nutrient distribution
3. FTIR spectroscopy to identify stress-related molecular changes
4. Fluorescence microscopy to track specific protein responses
5. Cryo-EM of isolated organelles for ultrastructural details
Key Features
- Technique-agnostic core: The same schema handles data from any structural biology method
- Rich metadata: Comprehensive tracking from sample to structure
- Workflow provenance: Each processing step records its software, version, parameters, and compute resources to support computational reproducibility
- Multimodal support: Seamlessly integrate data across scales and techniques
- Standards-compliant: Follows FAIR principles and integrates with existing ontologies
URI: https://w3id.org/biostride/
Name: biostride-schema
Classes
Class | Description |
---|---|
BufferComposition | Buffer composition for sample storage |
ComputeResources | Computational resources used |
DataCollectionStrategy | Strategy for data collection |
ExperimentalConditions | Environmental and experimental conditions |
ImageFeature | |
MolecularComposition | Molecular composition of a sample |
NamedThing | A named thing |
DataFile | A data file generated or used in the study |
Dataset | A collection of studies |
ExperimentRun | An experimental data collection session |
Image | An image file from structural biology experiments |
FTIRImage | Fourier Transform Infrared (FTIR) spectroscopy image capturing molecular comp... |
Image2D | A 2D image (micrograph, diffraction pattern) |
FluorescenceImage | Fluorescence microscopy image capturing specific molecular targets through fl... |
OpticalImage | Visible light optical microscopy or photography image |
XRFImage | X-ray fluorescence (XRF) image showing elemental distribution |
Image3D | A 3D volume or tomogram |
Instrument | An instrument used to collect data |
CryoEMInstrument | Cryo-EM microscope specifications |
SAXSInstrument | SAXS/WAXS instrument specifications |
XRayInstrument | X-ray diffractometer or synchrotron beamline specifications |
Sample | A biological sample used in structural biology experiments |
SamplePreparation | A process that prepares a sample for imaging |
Study | |
WorkflowRun | A computational processing workflow execution |
OntologyTerm | |
QualityMetrics | Quality metrics for experiments |
StorageConditions | Storage conditions for samples |
TechniqueSpecificPreparation | Base class for technique-specific preparation details |
CryoEMPreparation | Cryo-EM specific sample preparation |
SAXSPreparation | SAXS/WAXS specific preparation |
XRayPreparation | X-ray crystallography specific preparation |
Slots
Slot | Description |
---|---|
accelerating_voltage | Accelerating voltage in kV |
acquisition_date | Date image was acquired |
additives | Additional additives in the buffer |
apodization_function | Mathematical function used for apodization |
astigmatism | Astigmatism value |
atmosphere | Storage atmosphere conditions |
autoloader_capacity | Number of grids the autoloader can hold |
background_correction | Method used for background correction |
beam_energy | X-ray beam energy in keV |
beam_size | X-ray beam size in micrometers |
beam_size_max | Maximum beam size in micrometers |
beam_size_min | Minimum beam size in micrometers |
blot_force | Blotting force setting |
blot_time | Blotting time in seconds |
buffer_composition | Buffer composition including pH, salts, additives |
buffer_matching_protocol | Protocol for buffer matching |
calibration_standard | Reference standard used for calibration |
cell_path_length | Path length in mm |
chamber_temperature | Chamber temperature in Celsius |
channel_name | Name of the fluorescence channel (e |
checksum | SHA-256 checksum for data integrity |
collection_mode | Mode of data collection |
color_channels | Color channels present (e |
completed_at | Workflow completion time |
completeness | Data completeness percentage |
components | Buffer components and their concentrations |
compute_resources | Computational resources used |
concentration | Sample concentration in mg/mL or µM |
concentration_series | Concentration values for series measurements |
concentration_unit | Unit of concentration measurement |
contrast_method | Contrast enhancement method used |
cpu_hours | CPU hours used |
creation_date | File creation date |
cryoprotectant | Cryoprotectant used |
cryoprotectant_concentration | Cryoprotectant concentration percentage |
crystal_cooling_capability | Crystal cooling system available |
crystal_size | Crystal dimensions in micrometers |
crystallization_conditions | Detailed crystallization conditions |
crystallization_method | Method used for crystallization |
cs_corrector | Spherical aberration corrector present |
current_status | Current operational status |
data_collection_strategy | Strategy for data collection |
data_files | |
data_type | Type of data in the file |
definition | |
defocus | Defocus value in micrometers |
description | |
detector_dimensions | Detector dimensions in pixels (e |
detector_distance_max | Maximum detector distance in mm |
detector_distance_min | Minimum detector distance in mm |
detector_type | Type of detector |
dimensions_x | Image width in pixels |
dimensions_y | Image height in pixels |
dimensions_z | Image depth in pixels/slices |
dose | Electron dose in e-/Ų |
dose_per_frame | Dose per frame |
duration | Storage duration |
dwell_time | Dwell time per pixel in milliseconds |
elements_measured | Elements detected and measured |
emission_filter | Specifications of the emission filter |
emission_wavelength | Emission wavelength in nanometers |
energy_max | Maximum X-ray energy in keV |
energy_min | Minimum X-ray energy in keV |
excitation_filter | Specifications of the excitation filter |
excitation_wavelength | Excitation wavelength in nanometers |
experiment_code | Unique experiment identifier |
experiment_date | Date of the experiment |
experiment_id | Reference to the source experiment |
experimental_conditions | Environmental and experimental conditions |
exposure_time | Exposure time in seconds |
file_format | File format |
file_name | Name of the file |
file_path | Path to the file |
file_size_bytes | File size in bytes |
flash_cooling_method | Flash cooling protocol |
fluorophore | Name or type of fluorophore used |
flux | Photon flux in photons/second |
flux_density | Photon flux density in photons/s/mm² |
frame_rate | Frames per second |
goniometer_type | Type of goniometer |
gpu_hours | GPU hours used |
grid_type | Type of EM grid used |
hole_size | Hole size in micrometers |
humidity | Humidity percentage |
humidity_percentage | Chamber humidity during vitrification |
i_zero | Forward scattering intensity I(0) |
id | |
illumination_type | Type of illumination (brightfield, darkfield, phase contrast, DIC) |
images | |
installation_date | Date of instrument installation |
instrument_code | Unique identifier code for the instrument |
instrument_id | Reference to the instrument used |
instrument_runs | |
keywords | |
label | |
laser_power | Laser power in milliwatts or percentage |
ligands | Bound ligands or cofactors |
magnification | Optical magnification factor |
manufacturer | Instrument manufacturer |
memory_gb | Maximum memory used in GB |
model | Instrument model |
modifications | Post-translational modifications or chemical modifications |
molecular_composition | Description of molecular composition including sequences, modifications, liga... |
molecular_signatures | Identified molecular signatures or peaks |
molecular_weight | Molecular weight in kDa |
monochromator_type | Type of monochromator |
mounting_method | Crystal mounting method |
number_of_scans | Number of scans averaged for the spectrum |
numerical_aperture | Numerical aperture of the objective lens |
ontology | |
operator_id | Person who performed the preparation |
output_files | Output files generated |
parent_sample_id | Reference to parent sample for derivation tracking |
ph | pH of the buffer |
phase_plate | Phase plate available |
pinhole_size | Pinhole size in Airy units for confocal microscopy |
pixel_size | Pixel size in Angstroms |
pixel_size_max | Maximum pixel size in Angstroms per pixel |
pixel_size_min | Minimum pixel size in Angstroms per pixel |
plasma_treatment | Plasma treatment details |
preparation_date | Date of sample preparation |
preparation_method | Method used to prepare the sample |
preparation_type | Type of sample preparation |
pressure | Pressure in kPa |
processing_level | Processing level (0=raw, 1=corrected, 2=derived, 3=model) |
processing_parameters | Parameters used in processing |
processing_status | Current processing status |
protocol_description | Detailed protocol description |
purity_percentage | Sample purity as percentage |
q_range_max | Maximum q value in inverse Angstroms |
q_range_min | Minimum q value in inverse Angstroms |
quality_metrics | Quality control metrics for the sample |
quantum_yield | Quantum yield of the fluorophore |
r_factor | R-factor for crystallography |
raw_data_location | Location of raw data files |
reconstruction_method | Method used for 3D reconstruction |
resolution | Resolution in Angstroms |
rg | Radius of gyration in Angstroms |
sample_cell_type | Type of sample cell used |
sample_changer_capacity | Number of samples in automatic sample changer |
sample_code | Unique identifier code for the sample |
sample_id | Reference to the sample being prepared |
sample_preparations | |
sample_type | Type of biological sample |
samples | |
sequences | Amino acid or nucleotide sequences |
signal_to_noise | Signal to noise ratio |
software_name | Software used for processing |
software_version | Software version |
source_type | Type of X-ray source |
spectral_resolution | Spectral resolution in cm⁻¹ |
started_at | Workflow start time |
storage_conditions | Storage conditions for the sample |
storage_gb | Storage used in GB |
studies | |
support_film | Support film type |
technique | Technique used for data collection |
temperature | Storage temperature in Celsius |
temperature_control | Temperature control settings |
temperature_control_range | Temperature control range in Celsius |
temperature_unit | Temperature unit |
terms | |
title | |
total_dose | Total electron dose for cryo-EM |
total_frames | Total number of frames/images |
vitrification_method | Method used for vitrification |
voxel_size | Voxel size in Angstroms |
wavenumber_max | Maximum wavenumber in cm⁻¹ |
wavenumber_min | Minimum wavenumber in cm⁻¹ |
white_balance | White balance settings |
workflow_code | Unique workflow identifier |
workflow_runs | |
workflow_type | Type of processing workflow |
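
As an illustration of how several of these slots fit together, the sketch below fills in the file-related slots (`file_name`, `file_path`, `file_size_bytes`, `checksum`, `processing_level`) for a file on disk. It assumes that `checksum` holds a bare hexadecimal SHA-256 digest and that `processing_level` is recorded alongside the file; neither detail is confirmed by the table, and `describe_file` is a hypothetical helper, not part of any BioStride tooling.

```python
import hashlib
from pathlib import Path

# Level meanings as listed in the processing_level slot description.
PROCESSING_LEVELS = {0: "raw", 1: "corrected", 2: "derived", 3: "model"}


def describe_file(path, processing_level=0):
    """Build a dict of file slots for the file at `path` (hypothetical helper)."""
    if processing_level not in PROCESSING_LEVELS:
        raise ValueError(f"unknown processing level: {processing_level}")
    path = Path(path)
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in 1 MiB chunks so large movies or volumes are not loaded at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return {
        "file_name": path.name,
        "file_path": str(path),
        "file_size_bytes": path.stat().st_size,
        "checksum": digest.hexdigest(),  # assumed format: bare hex digest
        "processing_level": processing_level,
    }
```

Recomputing the digest later and comparing it with the stored `checksum` is the usual way to detect a corrupted or modified file.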
Enumerations
Enumeration | Description |
---|---|
CollectionModeEnum | Data collection modes |
ConcentrationUnitEnum | Units for concentration measurement |
CrystallizationMethodEnum | Methods for protein crystallization |
DataTypeEnum | Types of data |
DetectorTypeEnum | Types of detectors for cryo-EM |
FileFormatEnum | File formats |
GridTypeEnum | Types of EM grids |
IlluminationTypeEnum | Types of illumination for optical microscopy |
InstrumentStatusEnum | Operational status of instruments |
PreparationTypeEnum | Types of sample preparation |
ProcessingStatusEnum | Processing status |
SampleTypeEnum | Types of biological samples |
TechniqueEnum | Structural biology techniques |
TemperatureUnitEnum | Units for temperature measurement |
VitrificationMethodEnum | Methods for vitrification |
WorkflowTypeEnum | Types of processing workflows |
XRaySourceTypeEnum | Types of X-ray sources |
Types
Type | Description |
---|---|
Boolean | A binary (true or false) value |
Curie | a compact URI |
Date | a date (year, month and day) in an idealized calendar |
DateOrDatetime | Either a date or a datetime |
Datetime | The combination of a date and time |
Decimal | A real number with arbitrary precision that conforms to the xsd:decimal speci... |
Double | A real number that conforms to the xsd:double specification |
Float | A real number that conforms to the xsd:float specification |
Integer | An integer |
Jsonpath | A string encoding a JSON Path |
Jsonpointer | A string encoding a JSON Pointer |
Ncname | Prefix part of CURIE |
Nodeidentifier | A URI, CURIE or BNODE that represents a node in a model |
Objectidentifier | A URI or CURIE that represents an object in the model |
Sparqlpath | A string encoding a SPARQL Property Path |
String | A character string |
Time | A time object represents a (local) time of day, independent of any particular... |
Uri | a complete URI |
Uriorcurie | a URI or a CURIE |
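
The Curie, Uri, and Uriorcurie types differ only in whether an identifier is written in compact or full form. A minimal sketch of expanding a CURIE follows; it assumes a prefix named `biostride` mapped to the schema URI given above, which is an illustrative choice rather than a confirmed prefix declaration.

```python
# Assumption: a "biostride" prefix bound to the schema URI listed on this page.
PREFIXES = {"biostride": "https://w3id.org/biostride/"}


def expand(value, prefixes=PREFIXES):
    """Return a full URI for a known CURIE; pass other values through unchanged."""
    prefix, sep, local = value.partition(":")
    # A local part starting with "//" means the value is already a full URI
    # such as "https://...", so it is left alone.
    if sep and not local.startswith("//") and prefix in prefixes:
        return prefixes[prefix] + local
    return value


print(expand("biostride:Sample"))
print(expand("https://example.org/thing"))
```
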
Subsets
Subset | Description |
---|---|