REMBI Alignment: Integrating Biological Imaging Metadata Standards with BioStride
Executive Summary
REMBI (Recommended Metadata for Biological Images) is a community-driven set of metadata guidelines for biological imaging whose scope overlaps substantially with BioStride's goals for structural biology data management. Published in Nature Methods in 2021, REMBI provides guidelines for describing imaging experiments across light microscopy, electron microscopy, and X-ray microscopy, which makes it directly relevant to BioStride's multi-modal approach. This document analyzes the alignment between REMBI and BioStride and identifies opportunities for integration and mutual reinforcement of metadata standards in structural biology.
1. Introduction to REMBI
1.1 Background and Development
REMBI emerged from a critical need in the bioimaging community to standardize metadata for the vast amounts of imaging data being generated. Developed through a collaborative workshop in Hinxton (October 2019), REMBI brought together representatives from light, electron, and X-ray microscopy communities to establish common metadata guidelines.
Key Publication: Sarkans, U., Chiu, W., Collinson, L., et al. (2021). REMBI: Recommended Metadata for Biological Images—enabling reuse of microscopy data in biology. Nature Methods, 18, 1418–1422.
1.2 Core Objectives
REMBI aims to:
- Enable systematic archiving of imaging data and metadata in public databases
- Support FAIR (Findable, Accessible, Interoperable, Reusable) data principles
- Facilitate automated data harvesting using machine learning techniques
- Bridge diverse imaging communities with flexible yet comprehensive metadata standards
1.3 Adoption and Implementation
REMBI has been adopted by major imaging repositories:
- BioImage Archive: Primary implementation platform
- EMPIAR: Electron Microscopy Public Image Archive
- Cell-IDR: Image Data Resource for cellular imaging
- Tissue-IDR: Tissue-level imaging repository
2. REMBI's Eight-Component Structure
2.1 Hierarchical Organization
REMBI organizes metadata into eight high-level components that encompass diverse biological imaging methods:
REMBI_Structure:
  1_Study:
    description: "Top-level project metadata"
    alignment: "Dublin Core, DataCite, schema.org"
  2_Study_Component:
    description: "Container for organizing related data"
    example: "Separate components for EM and confocal in correlative studies"
  3_Biosample:
    description: "Biological sample metadata"
    importance: "Critical for biological context"
  4_Specimen:
    description: "Specimen preparation details"
    includes: "Fixation, staining, mounting"
  5_Image_Acquisition:
    description: "Microscopy technique and settings"
    audience: "Imaging scientists and microscopists"
  6_Image_Data:
    description: "Image file metadata"
    implementation: "File list with technical parameters"
  7_Image_Correlation:
    description: "Multi-modal image registration"
    status: "Optional"
    use_case: "Correlative microscopy"
  8_Analyzed_Data:
    description: "Image analysis and measurements"
    status: "Optional"
    format: "Typically tabular"
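The component list above can be turned into a simple completeness check. The following is a minimal sketch, not an official REMBI serialization: the snake_case keys mirror the names used in the mapper later in this document, and treating the six components not marked "Optional" as required is an assumption made for illustration.

```python
# Component keys follow the snake_case names used elsewhere in this document;
# they are illustrative, not an official REMBI serialization.
# Value = True means "treated as required" (an assumption: only components
# 7 and 8 are marked optional in the outline above).
REMBI_COMPONENTS = {
    "study": True,
    "study_component": True,
    "biosample": True,
    "specimen": True,
    "image_acquisition": True,
    "image_data": True,
    "image_correlation": False,  # optional
    "analyzed_data": False,      # optional
}


def missing_components(record: dict) -> list:
    """Return required component keys that are absent or empty in `record`."""
    return [
        name
        for name, required in REMBI_COMPONENTS.items()
        if required and not record.get(name)
    ]


# Hypothetical, partially filled record.
record = {
    "study": {"title": "Example study"},
    "biosample": [{"organism": "Homo sapiens"}],
    "image_data": [{"file": "stack_001.tif"}],
}
print(missing_components(record))
# ['study_component', 'specimen', 'image_acquisition']
```

A check of this kind only confirms that each component is present; it says nothing about whether the fields inside a component are complete.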
2.2 Target Audiences
REMBI addresses three primary user groups:
- Imaging Scientists: Need detailed acquisition parameters, physical properties of instruments
- Computer Vision Researchers: Require ground truth data, segmentations, labeled datasets
- Biologists: Focus on biological context, experimental conditions, sample preparation
3. Alignment with BioStride
3.1 Structural Mapping
BioStride and REMBI share substantial conceptual overlap, with complementary strengths:
| REMBI Component | BioStride Equivalent | Alignment Level | Notes |
|---|---|---|---|
| Study | Study | High | Direct conceptual match |
| Study Component | ExperimentRun | High | Both organize data collection sessions |
| Biosample | Sample | Very High | Core biological material description |
| Specimen | SamplePreparation | Very High | Preparation protocols and methods |
| Image Acquisition | Instrument + ExperimentRun | High | BioStride separates instrument from run |
| Image Data | Image (2D/3D) + DataFile | High | BioStride has specialized image classes |
| Image Correlation | Cross-modal relationships | Medium | BioStride handles via Study relationships |
| Analyzed Data | WorkflowRun + DataFile | High | Processing and derived data |
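For tooling, the mapping table could be kept as data rather than prose. The sketch below transcribes the table into a dictionary; the class names come from the table, while the dictionary layout and the `rembi_sources` helper are illustrative assumptions rather than part of either schema.

```python
# Transcription of the mapping table above. Each entry holds the BioStride
# classes a REMBI component maps to, plus the stated alignment level.
REMBI_TO_BIOSTRIDE = {
    "Study": (["Study"], "High"),
    "Study Component": (["ExperimentRun"], "High"),
    "Biosample": (["Sample"], "Very High"),
    "Specimen": (["SamplePreparation"], "Very High"),
    "Image Acquisition": (["Instrument", "ExperimentRun"], "High"),
    "Image Data": (["Image", "DataFile"], "High"),
    # No dedicated class: the table says this is handled via Study relationships.
    "Image Correlation": ([], "Medium"),
    "Analyzed Data": (["WorkflowRun", "DataFile"], "High"),
}


def rembi_sources(biostride_class: str) -> list:
    """List the REMBI components that map onto a given BioStride class."""
    return [
        component
        for component, (targets, _level) in REMBI_TO_BIOSTRIDE.items()
        if biostride_class in targets
    ]


print(rembi_sources("ExperimentRun"))  # ['Study Component', 'Image Acquisition']
print(rembi_sources("DataFile"))       # ['Image Data', 'Analyzed Data']
```

The reverse lookup makes the many-to-one cases visible: `ExperimentRun` and `DataFile` each receive content from two REMBI components, so a converter needs a rule for telling them apart on export.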
3.2 Complementary Strengths
REMBI Strengths for BioStride
- Comprehensive Imaging Focus: REMBI provides deeper metadata for microscopy-specific parameters
- Community Consensus: Broad acceptance across imaging communities
- Correlation Metadata: Explicit support for multi-modal image registration
- Machine Learning Ready: Designed with AI/ML applications in mind
BioStride Strengths for REMBI
- Structural Biology Depth: Specialized support for crystallography, SAXS/SANS, spectroscopy
- Workflow Provenance: Detailed computational processing tracking
- Multi-Scale Integration: From atomic to tissue-level organization
- LinkML Foundation: Semantic web compatibility and validation
3.3 Integration Opportunities
Integration_Strategy:
  metadata_harmonization:
    - Map REMBI fields to BioStride schema
    - Extend BioStride with REMBI-specific attributes
    - Create bidirectional converters
  shared_vocabularies:
    - Adopt REMBI controlled terms for microscopy
    - Contribute structural biology terms to REMBI
    - Align with common ontologies (UBERON, CL, etc.)
  tool_ecosystem:
    - Support REMBI export from BioStride
    - Import REMBI-compliant data
    - Validation across both standards
4. Specific Relevance to Structural Biology
4.1 Cryo-EM Applications
REMBI's adoption by EMPIAR creates direct relevance for structural biology:
Case Study: EMPIAR-10061
- Dataset: β-galactosidase raw cryo-EM data (12.4 TB)
- Original resolution: 2.2 Å
- Reuse examples:
  - Reprocessed to higher resolution by multiple groups
  - Used for algorithm development
  - Training data for deep learning particle picking
  - Cloud processing pipeline development
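The dataset was deposited before REMBI was published, so it is best read as an illustration of the outcome REMBI's guidelines aim to make routine: well-described, publicly archived raw data being reprocessed and repurposed across structural biology.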
4.2 Correlative Microscopy
REMBI's Image Correlation component is particularly valuable for multi-modal structural biology:
Correlative_Example:
  technique_1:
    type: "Cryo-EM"
    resolution: "3-5 Å"
    provides: "Molecular structure"
  technique_2:
    type: "Fluorescence"
    resolution: "200-300 nm"
    provides: "Cellular context"
  correlation_metadata:
    transformation_matrix: "3x4 affine"
    registration_error: "50 nm"
    fiducial_markers: "Gold particles"
  biostride_extension:
    add_rembi_correlation: true
    maintain_provenance: true
    cross_validate: true
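To make the correlation metadata concrete, the sketch below shows how a 3x4 affine matrix might be applied to fiducial coordinates and how a registration error could be computed from the residuals. The matrix and coordinates are hypothetical, and reporting the error as a root-mean-square distance is an assumption; neither REMBI nor BioStride is known here to prescribe a particular metric.

```python
import math


def apply_affine(matrix, point):
    """Apply a 3x4 affine matrix (3x3 linear block plus translation column) to a 3D point."""
    x, y, z = point
    return tuple(row[0] * x + row[1] * y + row[2] * z + row[3] for row in matrix)


def rms_registration_error(matrix, source_points, target_points):
    """Root-mean-square distance between transformed source fiducials and their targets."""
    squared = [
        sum((a - b) ** 2 for a, b in zip(apply_affine(matrix, src), tgt))
        for src, tgt in zip(source_points, target_points)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical example in nanometres: a pure translation of (100, -50, 0).
matrix = [
    [1.0, 0.0, 0.0, 100.0],
    [0.0, 1.0, 0.0, -50.0],
    [0.0, 0.0, 1.0, 0.0],
]
fluorescence_fiducials = [(0.0, 0.0, 0.0), (1000.0, 0.0, 0.0)]
em_fiducials = [(130.0, -10.0, 0.0), (1100.0, -50.0, 0.0)]

error_nm = rms_registration_error(matrix, fluorescence_fiducials, em_fiducials)
print(f"{error_nm:.1f} nm")  # 35.4 nm
```

Storing the twelve matrix entries alongside the error metric and its units lets a downstream user reproduce the alignment instead of trusting a single summary number.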
4.3 Light Microscopy Integration
Many structural biology studies now incorporate light microscopy for context:
- Live cell imaging before vitrification
- Fluorescence-guided lamella preparation
- CLEM (Correlative Light and Electron Microscopy)
- Super-resolution microscopy for validation
REMBI provides the metadata framework for these techniques that BioStride can leverage.
5. Implementation Recommendations
5.1 Short-Term Integration (0-6 months)
Metadata Mapping
Create explicit mappings between REMBI and BioStride:
class REMBIBioStrideMapper:
    """Bidirectional mapper between REMBI and BioStride schemas."""

    def rembi_to_biostride(self, rembi_data):
        biostride_study = {
            'title': rembi_data['study']['title'],
            'description': rembi_data['study']['description'],
            'samples': self.map_biosamples(rembi_data['biosample']),
            'sample_preparations': self.map_specimen(rembi_data['specimen']),
            'instrument_runs': self.map_acquisition(rembi_data['image_acquisition']),
            'images': self.map_image_data(rembi_data['image_data'])
        }
        # Handle optional correlation data
        if 'image_correlation' in rembi_data:
            biostride_study['cross_modal_registration'] = \
                self.map_correlation(rembi_data['image_correlation'])
        return biostride_study

    def biostride_to_rembi(self, biostride_data):
        # Inverse mapping for REMBI compliance
        raise NotImplementedError("Inverse mapping is not yet defined")

    # Placeholder component mappers: each returns its input unchanged
    # until the field-level mappings are specified.
    def _passthrough(self, component):
        return component

    map_biosamples = map_specimen = map_acquisition = _passthrough
    map_image_data = map_correlation = _passthrough
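The mapper above delegates each component to a helper such as `map_biosamples`. A minimal sketch of what one such helper might do is shown below as a standalone function. The field names on both sides (`organelle`, `cellular_component`, `rembi_extra`) are hypothetical and chosen only to illustrate renaming fields while preserving anything unmapped.

```python
def map_biosamples(biosamples):
    """Map REMBI-style biosample dicts to BioStride-style sample dicts.

    Field names on both sides are illustrative assumptions, not taken
    from either schema.
    """
    field_map = {
        "organism": "organism",
        "cell_type": "cell_type",
        "organelle": "cellular_component",
    }
    samples = []
    for biosample in biosamples:
        # Rename the fields that have a known counterpart.
        sample = {
            target: biosample[source]
            for source, target in field_map.items()
            if source in biosample
        }
        # Keep unmapped fields so the conversion does not silently drop data.
        extras = {k: v for k, v in biosample.items() if k not in field_map}
        if extras:
            sample["rembi_extra"] = extras
        samples.append(sample)
    return samples


converted = map_biosamples(
    [{"organism": "Homo sapiens", "organelle": "Mitochondria", "passage": 12}]
)
print(converted)
```

Carrying unmapped fields forward under a separate key is one way to keep the round trip lossless while the field-level mapping is still incomplete.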
Vocabulary Alignment
Harmonize controlled vocabularies:
Vocabulary_Mapping:
  sample_types:
    rembi: ["cell", "tissue", "organism", "molecule"]
    biostride: ["protein", "nucleic_acid", "complex", "cell", "tissue"]
    unified: ["protein", "nucleic_acid", "complex", "cell", "tissue", "organism", "small_molecule"]
  preparation_methods:
    rembi: ["fixation", "staining", "mounting"]
    biostride: ["vitrification", "crystallization", "purification"]
    unified: # Combine both sets with clear categories
  imaging_techniques:
    rembi: ["brightfield", "fluorescence", "confocal", "EM", "super-resolution"]
    biostride: ["cryo_em", "x_ray", "saxs", "sans", "ftir", "xrf"]
    unified: # Comprehensive technique ontology
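The unified `sample_types` list above can be derived mechanically as an order-preserving union with a small alias table. In this sketch, `unify_vocabulary` is a hypothetical helper, and the alias from `molecule` to `small_molecule` is an assumption inferred from the unified list; a real harmonization would more likely map terms to ontology identifiers.

```python
def unify_vocabulary(*term_lists, aliases=None):
    """Order-preserving union of term lists, renaming terms via `aliases` first."""
    aliases = aliases or {}
    unified = []
    for terms in term_lists:
        for term in terms:
            canonical = aliases.get(term, term)
            if canonical not in unified:
                unified.append(canonical)
    return unified


rembi = ["cell", "tissue", "organism", "molecule"]
biostride = ["protein", "nucleic_acid", "complex", "cell", "tissue"]

# Assumed alias: REMBI's generic "molecule" becomes "small_molecule",
# since BioStride already has specific terms for macromolecules.
unified = unify_vocabulary(biostride, rembi, aliases={"molecule": "small_molecule"})
print(unified)
# ['protein', 'nucleic_acid', 'complex', 'cell', 'tissue', 'organism', 'small_molecule']
```

The same function applies to `preparation_methods` and `imaging_techniques`, although there the harder problem is deciding which terms are true synonyms (for example `EM` versus `cryo_em`), which no mechanical union can settle.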
5.2 Medium-Term Development (6-12 months)
Schema Extension
Extend BioStride to incorporate REMBI-specific fields:
# Addition to BioStride schema
ImageCorrelation:
  is_a: AttributeGroup
  description: "REMBI-compatible image correlation metadata"
  attributes:
    source_image:
      range: Image
      description: "Reference image for alignment"
    target_image:
      range: Image
      description: "Image to be aligned"
    transformation_type:
      range: string
      description: "Type of transformation (affine, deformable, etc.)"
    transformation_parameters:
      range: float
      multivalued: true
      description: "Transformation matrix or parameters"
    registration_error:
      range: float
      description: "Registration accuracy in nm"
    fiducial_markers:
      range: string
      description: "Type of fiducial markers used"
    correlation_coefficient:
      range: float
      description: "Correlation quality metric"
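A multivalued float slot cannot by itself say how many parameters a given transformation type needs, so some consistency checks would have to live in code. The sketch below mirrors the schema fragment as a plain Python dataclass. It is an illustration only: images are referenced by string identifiers instead of `Image` objects, and the expected count of 12 for an affine transformation is an assumption based on the flattened 3x4 matrix in the correlative example in Section 4.2.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed parameter counts per transformation type (12 = flattened 3x4 matrix).
EXPECTED_PARAMETER_COUNT = {"affine": 12}


@dataclass
class ImageCorrelation:
    source_image: str          # identifier of the reference image
    target_image: str          # identifier of the image to be aligned
    transformation_type: str
    transformation_parameters: List[float] = field(default_factory=list)
    registration_error: Optional[float] = None  # in nm
    fiducial_markers: Optional[str] = None
    correlation_coefficient: Optional[float] = None

    def problems(self) -> List[str]:
        """Return human-readable consistency problems (empty list if none)."""
        found = []
        expected = EXPECTED_PARAMETER_COUNT.get(self.transformation_type)
        if expected is not None and len(self.transformation_parameters) != expected:
            found.append(
                f"{self.transformation_type} expects {expected} parameters, "
                f"got {len(self.transformation_parameters)}"
            )
        if self.registration_error is not None and self.registration_error < 0:
            found.append("registration_error must be non-negative")
        return found


record = ImageCorrelation(
    "fluorescence_stack", "tomogram", "affine", [1.0] * 11, registration_error=50.0
)
print(record.problems())  # ['affine expects 12 parameters, got 11']
```

Transformation types without an entry in the table, such as deformable registrations, pass the count check unchanged because their parameter count is not fixed.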
Validation Framework
Implement cross-standard validation:
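class CrossStandardValidator:
    """Validate data against both REMBI and BioStride."""

    def validate(self, data):
        results = {
            'biostride_valid': self.validate_biostride(data),
            'rembi_valid': self.validate_rembi(data),
            'cross_compatible': self.check_compatibility(data),
            'warnings': [],
            'errors': []
        }
        # Check for conflicts between standards
        if results['biostride_valid'] and not results['rembi_valid']:
            results['warnings'].append(
                "Data valid for BioStride but missing REMBI requirements"
            )
        return results

    # Schema-specific checks are left to subclasses; each returns a bool.
    def validate_biostride(self, data):
        raise NotImplementedError

    def validate_rembi(self, data):
        raise NotImplementedError

    def check_compatibility(self, data):
        raise NotImplementedError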
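The compatibility check that the validator refers to is not specified in this document. One plausible reading, sketched below as a standalone function, is to compare fields that both representations are expected to share and report any that disagree. The `find_conflicts` name, the record layout, and the field names are hypothetical.

```python
def find_conflicts(rembi_record, biostride_record, shared_fields):
    """Return {rembi_field: (rembi_value, biostride_value)} for shared fields that disagree.

    `shared_fields` maps a REMBI field name to its BioStride counterpart.
    Fields missing on either side are skipped, not reported.
    """
    conflicts = {}
    for rembi_key, biostride_key in shared_fields.items():
        if rembi_key in rembi_record and biostride_key in biostride_record:
            left = rembi_record[rembi_key]
            right = biostride_record[biostride_key]
            if left != right:
                conflicts[rembi_key] = (left, right)
    return conflicts


# Hypothetical records describing the same study in both styles.
rembi_record = {"title": "ATP synthase in situ", "organism": "Homo sapiens"}
biostride_record = {"title": "ATP synthase in situ", "species": "Mus musculus"}

conflicts = find_conflicts(
    rembi_record, biostride_record, {"title": "title", "organism": "species"}
)
print(conflicts)  # {'organism': ('Homo sapiens', 'Mus musculus')}
```

Returning the conflicting values, and not just a boolean, gives the validator something concrete to place in its `errors` list.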
5.3 Long-Term Vision (1-2 years)
Unified Imaging Metadata Standard
Work toward a unified standard that combines strengths:
Future_Standard:
  name: "Unified Biological Imaging Metadata (UBIM)"
  foundation: "LinkML"
  components:
    - structural_biology: "From BioStride"
    - light_microscopy: "From REMBI"
    - image_correlation: "From REMBI"
    - workflow_provenance: "From BioStride"
    - ai_readiness: "From both"
  governance:
    - committee: "Joint REMBI-BioStride working group"
    - stakeholders: ["wwPDB", "EMDB", "BioImage Archive", "IDR"]
    - update_cycle: "Annual"
6. Practical Examples
6.1 Example: Cryo-CLEM Study
Combining cryo-EM with fluorescence microscopy:
Study:
  title: "Mitochondrial ATP synthase in situ structure"
  # REMBI-style biosample
  biosample:
    organism: "Homo sapiens"
    cell_type: "HeLa"
    organelle: "Mitochondria"
  # REMBI-style specimen preparation
  specimen:
    fixation: "Vitrification"
    temperature: "-180°C"
    fluorescent_label: "MitoTracker Green"
  # BioStride-style experiment runs
  experiments:
    - technique: "fluorescence_microscopy"
      rembi_metadata:
        excitation_wavelength: "488 nm"
        emission_wavelength: "510 nm"
        objective: "100x/1.4 NA"
    - technique: "cryo_em"
      voltage: "300 kV"
      detector: "K3"
      pixel_size: "0.83 Å"
  # REMBI image correlation
  correlation:
    method: "Fiducial-based"
    markers: "100 nm gold"
    accuracy: "50 nm"
    software: "ec-CLEM"
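The example stores lengths as strings with mixed units (`"0.83 Å"`, `"50 nm"`), which must be normalized before values from the two modalities can be compared. The sketch below parses such strings into nanometres and expresses the correlation accuracy in cryo-EM pixels. The `length_in_nm` helper and its unit table are illustrative assumptions; a production schema would more likely store a numeric value with a separate unit field.

```python
import re

# Conversion factors to nanometres for the length units used in this document.
TO_NM = {"Å": 0.1, "nm": 1.0, "μm": 1e3, "mm": 1e6}


def length_in_nm(text):
    """Parse a string such as '0.83 Å' or '50 nm' into nanometres."""
    match = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*(\S+)\s*", text)
    if match is None or match.group(2) not in TO_NM:
        raise ValueError(f"Unrecognized length: {text!r}")
    return float(match.group(1)) * TO_NM[match.group(2)]


accuracy_nm = length_in_nm("50 nm")   # correlation accuracy from the example
pixel_nm = length_in_nm("0.83 Å")     # cryo-EM pixel size from the example

# Correlation accuracy expressed in cryo-EM pixels.
print(round(accuracy_nm / pixel_nm))  # 602
```

The result makes the scale gap explicit: a 50 nm registration accuracy spans roughly 600 cryo-EM pixels, so the correlation locates a region of interest and does not align individual particles.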
6.2 Example: Multi-Scale Plant Imaging
From organ to molecular level:
Study:
  title: "Chloroplast structure under stress"
  study_components:
    - name: "Whole leaf imaging"
      technique: "optical_microscopy"
      rembi_compliance: true
      scale: "mm"
    - name: "Cellular imaging"
      technique: "confocal_microscopy"
      rembi_compliance: true
      scale: "μm"
    - name: "Chloroplast ultrastructure"
      technique: "cryo_em_tomography"
      biostride_compliance: true
      scale: "nm"
    - name: "Protein structure"
      technique: "single_particle_cryoem"
      biostride_compliance: true
      scale: "Å"
  cross_scale_correlation:
    registration_method: "Hierarchical"
    coordinate_system: "Unified 3D"
    metadata_standard: "REMBI+BioStride"
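In a study like this one, the point where coverage passes from one standard to the other is where cross-scale correlation metadata matters most. The sketch below finds that handoff in the component list above. The helper names are hypothetical, and reading the `rembi_compliance` and `biostride_compliance` flags as "which standard describes this component" is an assumption.

```python
# Units from coarsest to finest, as used in the example above.
SCALE_ORDER = ["mm", "μm", "nm", "Å"]

components = [
    {"name": "Whole leaf imaging", "scale": "mm", "rembi_compliance": True},
    {"name": "Cellular imaging", "scale": "μm", "rembi_compliance": True},
    {"name": "Chloroplast ultrastructure", "scale": "nm", "biostride_compliance": True},
    {"name": "Protein structure", "scale": "Å", "biostride_compliance": True},
]


def is_coarse_to_fine(items):
    """True if components are listed from the largest length scale to the smallest."""
    ranks = [SCALE_ORDER.index(item["scale"]) for item in items]
    return ranks == sorted(ranks)


def handoff_pairs(items):
    """Adjacent components described by different standards."""
    def standard(item):
        return "REMBI" if item.get("rembi_compliance") else "BioStride"

    return [
        (a["name"], b["name"])
        for a, b in zip(items, items[1:])
        if standard(a) != standard(b)
    ]


print(is_coarse_to_fine(components))  # True
print(handoff_pairs(components))
# [('Cellular imaging', 'Chloroplast ultrastructure')]
```

Here the only handoff is between confocal imaging and cryo-electron tomography, which is the registration step that would need to be expressible in both metadata models.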
7. Benefits of Integration
7.1 For the Structural Biology Community
- Broader Data Integration: Seamlessly combine molecular structures with cellular context
- Enhanced Discoverability: REMBI-compliant data more findable in imaging repositories
- Improved Reproducibility: Comprehensive metadata across scales
- AI/ML Readiness: Standardized metadata for training algorithms
7.2 For the Imaging Community
- Structural Context: Link cellular images to molecular structures
- Advanced Analytics: Leverage structural biology computational tools
- Multi-Modal Workflows: Streamlined correlative approaches
- Validation Frameworks: Benefit from structural biology's rigorous validation
7.3 For Data Repositories
- Interoperability: Exchange data between EMPIAR, BioImage Archive, and others
- Reduced Redundancy: Shared metadata models reduce duplication
- Unified Submission: Single pipeline for multi-modal studies
- Enhanced Services: Offer cross-repository search and analysis
8. Challenges and Mitigation
8.1 Technical Challenges
| Challenge | Mitigation Strategy |
|---|---|
| Schema complexity | Provide user-friendly tools and wizards |
| Vocabulary conflicts | Create mapping tables and conversion tools |
| Validation overhead | Implement automated validation pipelines |
| Storage requirements | Use hierarchical storage with metadata separation |
8.2 Community Challenges
| Challenge | Mitigation Strategy |
|---|---|
| Adoption resistance | Demonstrate clear benefits through case studies |
| Training needs | Develop comprehensive educational materials |
| Legacy data | Provide migration tools and services |
| Governance | Establish joint committees with clear mandates |
9. Implementation Roadmap
Phase 1: Foundation (Months 1-3)
- Establish working group with REMBI representatives
- Create detailed field mapping documentation
- Develop proof-of-concept converter tools
- Identify pilot projects for testing
Phase 2: Integration (Months 4-9)
- Implement bidirectional converters
- Extend BioStride schema with REMBI fields
- Develop validation frameworks
- Launch pilot projects with early adopters
Phase 3: Deployment (Months 10-12)
- Release integrated tools and documentation
- Train repository staff and users
- Process feedback and iterate
- Plan long-term governance structure
Phase 4: Maturation (Year 2+)
- Establish formal partnership with REMBI
- Contribute to standard evolution
- Expand adoption across communities
- Develop advanced integration features
10. Conclusions
10.1 Strategic Importance
The alignment between REMBI and BioStride is an opportunity to bridge the imaging and structural biology communities. With REMBI adopted by the major biological imaging repositories and BioStride covering structural biology techniques in depth, integrating the two would let a single study carry consistent metadata across scales and modalities, from cellular imaging down to atomic structure.
10.2 Mutual Benefits
Both standards benefit from integration:
- REMBI gains access to structural biology's sophisticated data models and validation frameworks
- BioStride gains established imaging metadata standards and community adoption
- The scientific community gains a more complete picture from molecules to organisms
10.3 Path Forward
The path to integration is clear and achievable:
1. Start with metadata mapping and vocabulary alignment
2. Build practical tools for data conversion
3. Demonstrate value through pilot projects
4. Formalize collaboration through governance structures
5. Work toward eventual standard convergence
10.4 Call to Action
We recommend:
1. Immediate: Contact REMBI maintainers to discuss collaboration
2. Short-term: Implement REMBI support in BioStride tools
3. Medium-term: Joint workshops and training materials
4. Long-term: Co-develop next-generation unified standards
The convergence of REMBI and BioStride will accelerate scientific discovery by enabling researchers to seamlessly navigate from cellular contexts to atomic details, ultimately advancing our understanding of biological systems across all scales.
References
- Sarkans, U., et al. (2021). REMBI: Recommended Metadata for Biological Images—enabling reuse of microscopy data in biology. Nature Methods, 18, 1418–1422.
- REMBI Overview. European Bioinformatics Institute. https://www.ebi.ac.uk/bioimage-archive/rembi-help-overview/
- Iudin, A., et al. (2016). EMPIAR: a public archive for raw electron microscopy image data. Nature Methods, 13, 387–388.
- Williams, E., et al. (2017). The Image Data Resource: A Bioimage Data Integration and Publication Platform. Nature Methods, 14, 775–781.
- Hartley, M., et al. (2022). The BioImage Archive – Building a Home for Life-Sciences Microscopy Data. Journal of Molecular Biology, 434(11), 167505.