biostride-schema

BioStride is a comprehensive schema for representing multimodal structural biology imaging data, from atomic-resolution structures to tissue-level organization. It supports diverse experimental techniques including cryo-EM, X-ray crystallography, SAXS/SANS, fluorescence microscopy, and spectroscopic imaging.

Schema Organization

The schema follows a hierarchical structure that mirrors how structural biology research is organized:

The top-level entity is a Dataset, which serves as a container for related research. A dataset might represent all data from a specific grant, collaboration, or publication.

Each dataset contains one or more Studies, which are focused investigations of specific biological questions. For example, a study might investigate "Heat stress response in Arabidopsis" or "Structure of the human ribosome under different conditions."

Within each study, you'll find:

Biological Materials

  • Samples: The biological specimens being studied (proteins, nucleic acids, complexes, cells, tissues). Each sample includes detailed molecular composition, buffer conditions, and storage information. For example, a purified protein with its sequence, concentration, and buffer pH.

  • Sample Preparations: How samples were prepared for specific techniques. This includes cryo-EM grid preparation (vitrification parameters), crystallization conditions for X-ray studies, or staining protocols for fluorescence microscopy.

Data Collection

  • Instruments: The equipment used, from Titan Krios microscopes to synchrotron beamlines. Each instrument type (CryoEMInstrument, XRayInstrument, SAXSInstrument) has specific parameters like accelerating voltage, detector type, or beam energy.

  • Experiment Runs: Individual data collection sessions that link samples to instruments. An experiment run captures when, how, and under what conditions data was collected, including quality metrics like resolution and completeness.
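The linking role of an experiment run can be sketched in Python as follows. This is a minimal illustration, not the schema's generated code: the field names reuse slot names from the Slots table, but which slots belong to which class is an assumption, and the `resolve_run` helper and all identifiers and values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Sample:
    id: str
    sample_code: str


@dataclass
class CryoEMInstrument:
    id: str
    instrument_code: str
    accelerating_voltage: float  # kV


@dataclass
class QualityMetrics:
    resolution: Optional[float] = None    # Angstroms
    completeness: Optional[float] = None  # percent


@dataclass
class ExperimentRun:
    id: str
    experiment_code: str
    sample_id: str       # reference to a Sample
    instrument_id: str   # reference to an Instrument
    quality_metrics: QualityMetrics = field(default_factory=QualityMetrics)


def resolve_run(
    run: ExperimentRun,
    samples: Dict[str, Sample],
    instruments: Dict[str, CryoEMInstrument],
) -> Tuple[Sample, CryoEMInstrument]:
    """Hypothetical helper: follow the run's references to the linked records."""
    return samples[run.sample_id], instruments[run.instrument_id]


# Illustrative records; every identifier and number here is a placeholder.
samples = {"example:sample-1": Sample("example:sample-1", "S-001")}
instruments = {
    "example:krios-1": CryoEMInstrument("example:krios-1", "KRIOS-1", 300.0)
}
run = ExperimentRun(
    id="example:run-1",
    experiment_code="EXP-001",
    sample_id="example:sample-1",
    instrument_id="example:krios-1",
    quality_metrics=QualityMetrics(resolution=3.2),
)

sample, instrument = resolve_run(run, samples, instruments)
print(sample.sample_code, instrument.instrument_code)
```

Storing references by identifier rather than nesting the full records keeps one sample or instrument reusable across many runs.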

Data Processing

  • Workflow Runs: Computational processing steps applied to raw data. This includes motion correction for cryo-EM movies, 3D reconstruction, model building, or phase determination for crystallography. Each workflow tracks the software used, parameters, and computational resources.
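As a rough sketch of what a workflow run record might carry, the Python below groups the provenance slots listed in the Slots table. The class layout is an assumption, the `wall_time` method is a hypothetical convenience, and the software name, version, and parameter values are placeholders rather than real tools or settings.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Any, Dict


@dataclass
class ComputeResources:
    cpu_hours: float = 0.0
    gpu_hours: float = 0.0
    memory_gb: float = 0.0   # maximum memory used
    storage_gb: float = 0.0


@dataclass
class WorkflowRun:
    workflow_code: str
    software_name: str
    software_version: str
    started_at: datetime
    completed_at: datetime
    processing_parameters: Dict[str, Any] = field(default_factory=dict)
    compute_resources: ComputeResources = field(default_factory=ComputeResources)

    def wall_time(self) -> timedelta:
        """Hypothetical convenience: elapsed time between start and completion."""
        return self.completed_at - self.started_at


# Placeholder values for illustration only.
workflow = WorkflowRun(
    workflow_code="WF-001",
    software_name="example-motion-correction",
    software_version="0.0.0",
    started_at=datetime(2024, 1, 15, 9, 0),
    completed_at=datetime(2024, 1, 15, 11, 30),
    processing_parameters={"patch_count": 5},
    compute_resources=ComputeResources(gpu_hours=2.5, memory_gb=32.0),
)

print(workflow.wall_time())  # 2:30:00
```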

Data Products

  • Data Files: Any files generated or used, from raw data to final models. Each file is tracked with checksums for data integrity and typed (micrograph, particles, volume, model).

  • Images: Specialized classes for different imaging modalities:

      • Image2D: Micrographs, diffraction patterns
      • Image3D: 3D reconstructions, tomograms
      • FTIRImage: Molecular composition maps from infrared spectroscopy
      • FluorescenceImage: Fluorophore-labeled cellular components
      • OpticalImage: Brightfield/phase contrast microscopy
      • XRFImage: Elemental distribution maps
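The checksum tracking mentioned for data files can be illustrated with Python's standard `hashlib`. The Slots table describes `checksum` as SHA-256; the helper names `sha256_checksum` and `describe_file` are hypothetical, and the grouping of slots into one record is an assumption.

```python
import hashlib
from pathlib import Path


def sha256_checksum(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large raw data files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def describe_file(path, data_type):
    """Build a dictionary using data-file slot names from the schema."""
    p = Path(path)
    return {
        "file_name": p.name,
        "file_path": str(p),
        "file_size_bytes": p.stat().st_size,
        "checksum": sha256_checksum(p),
        "data_type": data_type,  # e.g. "micrograph", "particles", "volume", "model"
    }
```

Recomputing the digest after a transfer and comparing it with the stored `checksum` is one way to detect corruption.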

Example Usage

A typical cryo-EM study of a protein complex would include:

  1. Sample records for the purified complex with molecular weight and buffer composition
  2. Grid preparation details with vitrification parameters
  3. Microscope specifications and data collection parameters
  4. Processing workflows from motion correction through 3D refinement
  5. Final reconstructed volumes and fitted atomic models
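To make the cryo-EM case concrete, here is one way such a study might be laid out as a nested record and serialized to JSON. The keys reuse slot names from the Slots table, but their placement under a study is an assumption, and every identifier and number is an illustrative placeholder rather than real data.

```python
import json

# Illustrative cryo-EM study record; all values are placeholders.
study = {
    "id": "example:study-1",
    "title": "Cryo-EM study of a protein complex",
    "samples": [
        {
            "id": "example:sample-1",
            "sample_code": "S-001",
            "molecular_weight": 450.0,          # kDa
            "buffer_composition": {"ph": 7.5},
        }
    ],
    "sample_preparations": [
        {
            "id": "example:prep-1",
            "sample_id": "example:sample-1",
            "blot_time": 3.0,                    # seconds
            "humidity_percentage": 95.0,
        }
    ],
    "instrument_runs": [
        {
            "id": "example:run-1",
            "sample_id": "example:sample-1",
            "instrument_id": "example:krios-1",
            "total_dose": 50.0,                  # e-/Å²
        }
    ],
    "workflow_runs": [
        {"id": "example:wf-1", "workflow_code": "WF-MOTION", "experiment_id": "example:run-1"},
        {"id": "example:wf-2", "workflow_code": "WF-REFINE", "experiment_id": "example:run-1"},
    ],
    "data_files": [
        {"id": "example:file-1", "file_name": "map.mrc", "data_type": "volume"},
        {"id": "example:file-2", "file_name": "model.cif", "data_type": "model"},
    ],
}

# Count the records held in each list-valued section of the study.
record_counts = {key: len(value) for key, value in study.items() if isinstance(value, list)}
print(record_counts)

# The record is plain data, so it serializes directly.
serialized = json.dumps(study, indent=2, ensure_ascii=False)
```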

A multimodal plant imaging study might combine: 1. Whole plant optical imaging for morphology 2. XRF imaging to map nutrient distribution 3. FTIR spectroscopy to identify stress-related molecular changes 4. Fluorescence microscopy to track specific protein responses 5. Cryo-EM of isolated organelles for ultrastructural details

Key Features

  • Technique-agnostic core: The same schema handles data from any structural biology method
  • Rich metadata: Comprehensive tracking from sample to structure
  • Workflow provenance: Records the software, parameters, and computational resources behind each processing step to support computational reproducibility
  • Multimodal support: Integrates data across scales and techniques
  • Standards-compliant: Follows FAIR principles and integrates with existing ontologies

URI: https://w3id.org/biostride/

Name: biostride-schema

Classes

Class Description
BufferComposition Buffer composition for sample storage
ComputeResources Computational resources used
DataCollectionStrategy Strategy for data collection
ExperimentalConditions Environmental and experimental conditions
ImageFeature
MolecularComposition Molecular composition of a sample
NamedThing A named thing
        DataFile A data file generated or used in the study
        Dataset A collection of studies
        ExperimentRun An experimental data collection session
        Image An image file from structural biology experiments
                FTIRImage Fourier Transform Infrared (FTIR) spectroscopy image capturing molecular comp...
                Image2D A 2D image (micrograph, diffraction pattern)
                        FluorescenceImage Fluorescence microscopy image capturing specific molecular targets through fl...
                        OpticalImage Visible light optical microscopy or photography image
                        XRFImage X-ray fluorescence (XRF) image showing elemental distribution
                Image3D A 3D volume or tomogram
        Instrument An instrument used to collect data
                CryoEMInstrument Cryo-EM microscope specifications
                SAXSInstrument SAXS/WAXS instrument specifications
                XRayInstrument X-ray diffractometer or synchrotron beamline specifications
        Sample A biological sample used in structural biology experiments
        SamplePreparation A process that prepares a sample for imaging
        Study
        WorkflowRun A computational processing workflow execution
OntologyTerm
QualityMetrics Quality metrics for experiments
StorageConditions Storage conditions for samples
TechniqueSpecificPreparation Base class for technique-specific preparation details
        CryoEMPreparation Cryo-EM specific sample preparation
        SAXSPreparation SAXS/WAXS specific preparation
        XRayPreparation X-ray crystallography specific preparation
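The indentation in the table above encodes inheritance. As a small sketch, the image branch can be mirrored with empty Python classes to show what the nesting implies; the class bodies are omitted and the `ancestry` helper is hypothetical.

```python
class NamedThing:
    """Root of the identified entities in the class table."""


class Image(NamedThing):
    pass


class Image2D(Image):
    pass


class Image3D(Image):
    pass


class FTIRImage(Image):          # listed directly under Image, not Image2D
    pass


class FluorescenceImage(Image2D):
    pass


class OpticalImage(Image2D):
    pass


class XRFImage(Image2D):
    pass


def ancestry(cls):
    """Hypothetical helper: class names from most specific to most general."""
    return [c.__name__ for c in cls.__mro__ if c is not object]


print(ancestry(FluorescenceImage))
# ['FluorescenceImage', 'Image2D', 'Image', 'NamedThing']
print(issubclass(FTIRImage, Image2D))
# False
```

One practical consequence: code that accepts any `Image2D` would also accept fluorescence, optical, and XRF images, but not FTIR images.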

Slots

Slot Description
accelerating_voltage Accelerating voltage in kV
acquisition_date Date image was acquired
additives Additional additives in the buffer
apodization_function Mathematical function used for apodization
astigmatism Astigmatism value
atmosphere Storage atmosphere conditions
autoloader_capacity Number of grids the autoloader can hold
background_correction Method used for background correction
beam_energy X-ray beam energy in keV
beam_size X-ray beam size in micrometers
beam_size_max Maximum beam size in micrometers
beam_size_min Minimum beam size in micrometers
blot_force Blotting force setting
blot_time Blotting time in seconds
buffer_composition Buffer composition including pH, salts, additives
buffer_matching_protocol Protocol for buffer matching
calibration_standard Reference standard used for calibration
cell_path_length Path length in mm
chamber_temperature Chamber temperature in Celsius
channel_name Name of the fluorescence channel
checksum SHA-256 checksum for data integrity
collection_mode Mode of data collection
color_channels Color channels present
completed_at Workflow completion time
completeness Data completeness percentage
components Buffer components and their concentrations
compute_resources Computational resources used
concentration Sample concentration in mg/mL or µM
concentration_series Concentration values for series measurements
concentration_unit Unit of concentration measurement
contrast_method Contrast enhancement method used
cpu_hours CPU hours used
creation_date File creation date
cryoprotectant Cryoprotectant used
cryoprotectant_concentration Cryoprotectant concentration percentage
crystal_cooling_capability Crystal cooling system available
crystal_size Crystal dimensions in micrometers
crystallization_conditions Detailed crystallization conditions
crystallization_method Method used for crystallization
cs_corrector Spherical aberration corrector present
current_status Current operational status
data_collection_strategy Strategy for data collection
data_files
data_type Type of data in the file
definition
defocus Defocus value in micrometers
description
detector_dimensions Detector dimensions in pixels
detector_distance_max Maximum detector distance in mm
detector_distance_min Minimum detector distance in mm
detector_type Type of detector
dimensions_x Image width in pixels
dimensions_y Image height in pixels
dimensions_z Image depth in pixels/slices
dose Electron dose in e-/Ų
dose_per_frame Dose per frame
duration Storage duration
dwell_time Dwell time per pixel in milliseconds
elements_measured Elements detected and measured
emission_filter Specifications of the emission filter
emission_wavelength Emission wavelength in nanometers
energy_max Maximum X-ray energy in keV
energy_min Minimum X-ray energy in keV
excitation_filter Specifications of the excitation filter
excitation_wavelength Excitation wavelength in nanometers
experiment_code Unique experiment identifier
experiment_date Date of the experiment
experiment_id Reference to the source experiment
experimental_conditions Environmental and experimental conditions
exposure_time Exposure time in seconds
file_format File format
file_name Name of the file
file_path Path to the file
file_size_bytes File size in bytes
flash_cooling_method Flash cooling protocol
fluorophore Name or type of fluorophore used
flux Photon flux in photons/second
flux_density Photon flux density in photons/s/mm²
frame_rate Frames per second
goniometer_type Type of goniometer
gpu_hours GPU hours used
grid_type Type of EM grid used
hole_size Hole size in micrometers
humidity Humidity percentage
humidity_percentage Chamber humidity during vitrification
i_zero Forward scattering intensity I(0)
id
illumination_type Type of illumination (brightfield, darkfield, phase contrast, DIC)
images
installation_date Date of instrument installation
instrument_code Unique identifier code for the instrument
instrument_id Reference to the instrument used
instrument_runs
keywords
label
laser_power Laser power in milliwatts or percentage
ligands Bound ligands or cofactors
magnification Optical magnification factor
manufacturer Instrument manufacturer
memory_gb Maximum memory used in GB
model Instrument model
modifications Post-translational modifications or chemical modifications
molecular_composition Description of molecular composition including sequences, modifications, liga...
molecular_signatures Identified molecular signatures or peaks
molecular_weight Molecular weight in kDa
monochromator_type Type of monochromator
mounting_method Crystal mounting method
number_of_scans Number of scans averaged for the spectrum
numerical_aperture Numerical aperture of the objective lens
ontology
operator_id Person who performed the preparation
output_files Output files generated
parent_sample_id Reference to parent sample for derivation tracking
ph pH of the buffer
phase_plate Phase plate available
pinhole_size Pinhole size in Airy units for confocal microscopy
pixel_size Pixel size in Angstroms
pixel_size_max Maximum pixel size in Angstroms per pixel
pixel_size_min Minimum pixel size in Angstroms per pixel
plasma_treatment Plasma treatment details
preparation_date Date of sample preparation
preparation_method Method used to prepare the sample
preparation_type Type of sample preparation
pressure Pressure in kPa
processing_level Processing level (0=raw, 1=corrected, 2=derived, 3=model)
processing_parameters Parameters used in processing
processing_status Current processing status
protocol_description Detailed protocol description
purity_percentage Sample purity as percentage
q_range_max Maximum q value in inverse Angstroms
q_range_min Minimum q value in inverse Angstroms
quality_metrics Quality control metrics for the sample
quantum_yield Quantum yield of the fluorophore
r_factor R-factor for crystallography
raw_data_location Location of raw data files
reconstruction_method Method used for 3D reconstruction
resolution Resolution in Angstroms
rg Radius of gyration in Angstroms
sample_cell_type Type of sample cell used
sample_changer_capacity Number of samples in automatic sample changer
sample_code Unique identifier code for the sample
sample_id Reference to the sample being prepared
sample_preparations
sample_type Type of biological sample
samples
sequences Amino acid or nucleotide sequences
signal_to_noise Signal to noise ratio
software_name Software used for processing
software_version Software version
source_type Type of X-ray source
spectral_resolution Spectral resolution in cm⁻¹
started_at Workflow start time
storage_conditions Storage conditions for the sample
storage_gb Storage used in GB
studies
support_film Support film type
technique Technique used for data collection
temperature Storage temperature in Celsius
temperature_control Temperature control settings
temperature_control_range Temperature control range in Celsius
temperature_unit Temperature unit
terms
title
total_dose Total electron dose for cryo-EM
total_frames Total number of frames/images
vitrification_method Method used for vitrification
voxel_size Voxel size in Angstroms
wavenumber_max Maximum wavenumber in cm⁻¹
wavenumber_min Minimum wavenumber in cm⁻¹
white_balance White balance settings
workflow_code Unique workflow identifier
workflow_runs
workflow_type Type of processing workflow
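The `processing_level` slot uses integer codes (0=raw, 1=corrected, 2=derived, 3=model). A small sketch of how client code might give those codes readable names is shown below; the `ProcessingLevel` enum and the `is_downstream` helper are hypothetical and not part of the schema.

```python
from enum import IntEnum


class ProcessingLevel(IntEnum):
    """Readable names for the integer codes of the processing_level slot."""
    RAW = 0
    CORRECTED = 1
    DERIVED = 2
    MODEL = 3


def is_downstream(product_level: int, source_level: int) -> bool:
    """Hypothetical check: is the product at a later processing stage than its source?"""
    return ProcessingLevel(product_level) > ProcessingLevel(source_level)


print(ProcessingLevel(2).name)   # DERIVED
print(is_downstream(3, 0))       # True
print(is_downstream(1, 1))       # False
```

Constructing `ProcessingLevel` from a code outside 0 to 3 raises `ValueError`, which gives a cheap validity check when reading records.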

Enumerations

Enumeration Description
CollectionModeEnum Data collection modes
ConcentrationUnitEnum Units for concentration measurement
CrystallizationMethodEnum Methods for protein crystallization
DataTypeEnum Types of data
DetectorTypeEnum Types of detectors for cryo-EM
FileFormatEnum File formats
GridTypeEnum Types of EM grids
IlluminationTypeEnum Types of illumination for optical microscopy
InstrumentStatusEnum Operational status of instruments
PreparationTypeEnum Types of sample preparation
ProcessingStatusEnum Processing status
SampleTypeEnum Types of biological samples
TechniqueEnum Structural biology techniques
TemperatureUnitEnum Units for temperature measurement
VitrificationMethodEnum Methods for vitrification
WorkflowTypeEnum Types of processing workflows
XRaySourceTypeEnum Types of X-ray sources

Types

Type Description
Boolean A binary (true or false) value
Curie a compact URI
Date a date (year, month and day) in an idealized calendar
DateOrDatetime Either a date or a datetime
Datetime The combination of a date and time
Decimal A real number with arbitrary precision that conforms to the xsd:decimal speci...
Double A real number that conforms to the xsd:double specification
Float A real number that conforms to the xsd:float specification
Integer An integer
Jsonpath A string encoding a JSON Path
Jsonpointer A string encoding a JSON Pointer
Ncname Prefix part of CURIE
Nodeidentifier A URI, CURIE or BNODE that represents a node in a model
Objectidentifier A URI or CURIE that represents an object in the model
Sparqlpath A string encoding a SPARQL Property Path
String A character string
Time A time object represents a (local) time of day, independent of any particular...
Uri a complete URI
Uriorcurie a URI or a CURIE

Subsets

Subset Description