Genomic Standards Consortium

The Genomic Standards Consortium (GSC) is an open-membership working body formed in September 2005. The aim of the GSC is to make genomic data discoverable. The GSC enables genomic data integration, discovery and comparison through international community-driven standards.

This project is maintained by cmungall

GSC defined terms

Below we list the complete set of terms available across all checklists and environmental packages.

MIXS ID - MIXS:0000001

Term display name - amount or size of sample collected
Structured Comment name - samp_size
Definition - Amount or size of sample (volume, mass or area) that was collected
Expected value - measurement value
Value syntax - {float} {unit}
Example - 5 liter
Preferred Unit - milliliter, gram, milligram, liter
Number of occurrences permitted - 1
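
The `{float} {unit}` value syntax used by samp_size can be checked mechanically. The following is a minimal sketch, not part of the MIxS specification: the helper name `parse_measurement` is hypothetical, and the assumption that the number and unit are separated by whitespace is taken from the example value "5 liter".

```python
import re

# Assumed shape: a decimal number, whitespace, then free-text unit ("5 liter").
_MEASUREMENT = re.compile(r"^\s*([-+]?\d+(?:\.\d+)?)\s+(\S.*?)\s*$")


def parse_measurement(value):
    """Split a '{float} {unit}' value into (number, unit).

    Hypothetical helper for illustration; raises ValueError on other shapes.
    """
    match = _MEASUREMENT.match(value)
    if match is None:
        raise ValueError(f"not a '{{float}} {{unit}}' value: {value!r}")
    return float(match.group(1)), match.group(2)


if __name__ == "__main__":
    print(parse_measurement("5 liter"))
```

The sketch does not check the unit against the preferred units listed for the term; that list would have to be supplied separately.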

MIXS ID - MIXS:0000002

Term display name - sample collection device or method
Structured Comment name - samp_collect_device
Definition - The method or device employed for collecting the sample
Expected value - type name
Value syntax - {text}
Example - biopsy, niskin bottle, push core
Preferred Unit -
Number of occurrences permitted - 1

MIXS ID - MIXS:0000003

Term display name - isolation and growth condition
Structured Comment name - isol_growth_condt
Definition - Publication reference in the form of pubmed ID (pmid), digital object identifier (doi) or url for isolation and growth condition specifications of the organism/material
Expected value - PMID, DOI or URL
Value syntax - {PMID}|{DOI}|{URL}
Example - doi: 10.1016/j.syapm.2018.01.009
Preferred Unit -
Number of occurrences permitted - 1

MIXS ID - MIXS:0000004

Term display name - submitted to insdc
Structured Comment name - submitted_to_insdc
Definition - Depending on the study (large-scale e.g. done with next generation sequencing technology, or small-scale) sequences have to be submitted to SRA (Sequence Read Archive), DRA (DDBJ Read Archive) or via the classical Webin/Sequin systems to Genbank, ENA and DDBJ. Although this field is mandatory, it is meant as a self-test field, therefore it is not necessary to include this field in contextual data submitted to databases
Expected value - boolean
Value syntax - {boolean}
Example - yes
Preferred Unit -
Number of occurrences permitted - 1

MIXS ID - MIXS:0000005

Term display name - contamination screening input
Structured Comment name - contam_screen_input
Definition - The type of sequence data used as input
Expected value - enumeration
Value syntax - [reads|contigs]
Example - contigs
Preferred Unit -
Number of occurrences permitted - 1

MIXS ID - MIXS:0000006

Term display name - WGA amplification kit
Structured Comment name - wga_amp_kit
Definition - Kit used to amplify genomic DNA in preparation for sequencing
Expected value - kit name
Value syntax - {text}
Example - qiagen repli-g
Preferred Unit -
Number of occurrences permitted - 1

MIXS ID - MIXS:0000007

Term display name - investigation type
Structured Comment name - investigation_type
Definition - Nucleic Acid Sequence Report is the root element of all MIGS/MIMS compliant reports as standardized by the Genomic Standards Consortium. This field is either eukaryote, bacteria, virus, plasmid, organelle, metagenome, mimarks-survey, mimarks-specimen, metatranscriptome, single amplified genome, metagenome-assembled genome, or uncultivated viral genome
Expected value - eukaryote, bacteria_archaea, plasmid, virus, organelle, metagenome, mimarks-survey, mimarks-specimen, metatranscriptome, single amplified genome, metagenome-assembled genome, or uncultivated viral genomes
Value syntax - [eukaryote|bacteria_archaea|plasmid|virus|organelle|metagenome|metatranscriptome|mimarks-survey|mimarks-specimen|misag|mimag|miuvig]
Example - metagenome
Preferred Unit -
Number of occurrences permitted - 1
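
Enumerated terms such as investigation_type list their permitted values between square brackets, separated by pipes. A rough sketch of how such a value syntax string might be turned into a membership check follows; the function names are hypothetical and the syntax string is copied from the record above.

```python
# Value syntax copied from the investigation_type record.
INVESTIGATION_TYPE_SYNTAX = (
    "[eukaryote|bacteria_archaea|plasmid|virus|organelle|metagenome|"
    "metatranscriptome|mimarks-survey|mimarks-specimen|misag|mimag|miuvig]"
)


def parse_enumeration(value_syntax):
    """Return the permitted values of a '[a|b|c]' value syntax as a list."""
    text = value_syntax.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError(f"not an enumeration syntax: {value_syntax!r}")
    return [item.strip() for item in text[1:-1].split("|")]


def is_permitted(value, value_syntax):
    """Check a submitted value against an enumeration (exact, case-sensitive)."""
    return value in parse_enumeration(value_syntax)


if __name__ == "__main__":
    print(is_permitted("metagenome", INVESTIGATION_TYPE_SYNTAX))
```

Exact, case-sensitive matching is an assumption of this sketch; the page does not say whether comparisons should ignore case.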

MIXS ID - MIXS:0000008

Term display name - experimental factor
Structured Comment name - experimental_factor
Definition - Experimental factors are essentially the variable aspects of an experiment design which can be used to describe an experiment, or set of experiments, in an increasingly detailed manner. This field accepts ontology terms from Experimental Factor Ontology (EFO) and/or Ontology for Biomedical Investigations (OBI). For a browser of EFO (v 2.95) terms, please see http://purl.bioontology.org/ontology/EFO; for a browser of OBI (v 2018-02-12) terms please see http://purl.bioontology.org/ontology/OBI
Expected value - text or EFO and/or OBI
Value syntax - {termLabel} {[termID]}|{text}
Example - time series design [EFO:EFO_0001779]
Preferred Unit -
Number of occurrences permitted - 1

MIXS ID - MIXS:0000009

Term display name - geographic location (latitude and longitude)
Structured Comment name - lat_lon
Definition - The geographical origin of the sample as defined by latitude and longitude. The values should be reported in decimal degrees and in WGS84 system
Expected value - decimal degrees
Value syntax - {float} {float}
Example - 50.586825 6.408977
Preferred Unit -
Number of occurrences permitted - 1
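
A lat_lon value is two decimal-degree numbers separated by whitespace. The sketch below shows one possible check; the function name is hypothetical, and the range limits (latitude within -90 to 90, longitude within -180 to 180) are an assumption based on ordinary decimal-degree conventions rather than something stated in the record.

```python
def parse_lat_lon(value):
    """Parse a '{float} {float}' lat_lon value into (latitude, longitude).

    Hypothetical helper; raises ValueError for malformed or out-of-range input.
    """
    parts = value.split()
    if len(parts) != 2:
        raise ValueError(f"expected two numbers, got: {value!r}")
    latitude, longitude = (float(part) for part in parts)
    # Range limits are an assumption of this sketch (decimal degrees).
    if not -90.0 <= latitude <= 90.0:
        raise ValueError(f"latitude out of range: {latitude}")
    if not -180.0 <= longitude <= 180.0:
        raise ValueError(f"longitude out of range: {longitude}")
    return latitude, longitude


if __name__ == "__main__":
    print(parse_lat_lon("50.586825 6.408977"))
```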

MIXS ID - MIXS:0000010

Term display name - geographic location (country and/or sea, region)
Structured Comment name - geo_loc_name
Definition - The geographical origin of the sample as defined by the country or sea name followed by specific region name. Country or sea names should be chosen from the INSDC country list (http://insdc.org/country.html), or the GAZ ontology (v 1.512) (http://purl.bioontology.org/ontology/GAZ)
Expected value - country or sea name (INSDC or GAZ);region (GAZ);specific location name
Value syntax - {term};{term};{text}
Example - Germany;North Rhine-Westphalia;Eifel National Park
Preferred Unit -
Number of occurrences permitted - 1

MIXS ID - MIXS:0000011

Term display name - collection date
Structured Comment name - collection_date
Definition - The time of sampling, either as an instant (single point in time) or an interval. If no exact time is available, the date/time can be right-truncated, i.e. all of these are valid times: 2008-01-23T19:23:10+00:00; 2008-01-23T19:23:10; 2008-01-23; 2008-01; 2008. All of these except 2008-01 and 2008 are ISO8601 compliant
Expected value - date and time
Value syntax - {timestamp}
Example - 2018-05-11T10:00:00+01:00
Preferred Unit -
Number of occurrences permitted - 1
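
The right-truncated forms listed in the collection_date definition can be recognised with a single pattern. This is a minimal sketch under stated assumptions: the function name is hypothetical, only single time points are handled (not intervals), and the pattern checks field ranges but not calendar validity (it would accept 2008-02-30).

```python
import re

# Year, then optionally month, day, time and UTC offset, each level optional
# only if everything to its right is also omitted (right truncation).
_COLLECTION_DATE = re.compile(
    r"^\d{4}"
    r"(?:-(?:0[1-9]|1[0-2])"
    r"(?:-(?:0[1-9]|[12]\d|3[01])"
    r"(?:T(?:[01]\d|2[0-3]):[0-5]\d:[0-5]\d"
    r"(?:Z|[+-](?:[01]\d|2[0-3]):[0-5]\d)?"
    r")?)?)?$"
)


def is_valid_collection_date(value):
    """Return True if value matches one of the right-truncated timestamp forms."""
    return _COLLECTION_DATE.match(value) is not None


if __name__ == "__main__":
    for candidate in ("2008-01-23T19:23:10+00:00", "2008-01", "23/01/2008"):
        print(candidate, is_valid_collection_date(candidate))
```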

MIXS ID - MIXS:0000012

Term display name - broad-scale environmental context
Structured Comment name - env_broad_scale
Definition - In this field, report which major environmental system your sample or specimen came from. The systems identified should have a coarse spatial grain, to provide the general environmental context of where the sampling was done (e.g. were you in the desert or a rainforest?). We recommend using subclasses of ENVO’s biome class: http://purl.obolibrary.org/obo/ENVO_00000428. Format (one term): termLabel [termID]; Format (multiple terms): termLabel [termID]|termLabel [termID]|termLabel [termID]. Example: Annotating a water sample from the photic zone in the middle of the Atlantic Ocean, consider: oceanic epipelagic zone biome [ENVO:01000033]. Example: Annotating a sample from the Amazon rainforest, consider: tropical moist broadleaf forest biome [ENVO:01000228]. If needed, request new terms on the ENVO tracker, identified here: http://www.obofoundry.org/ontology/envo.html
Expected value - Add terms that identify the major environment type(s) where your sample was collected. Recommend subclasses of biome [ENVO:00000428]. Multiple terms can be separated by one or more pipes e.g.: mangrove biome [ENVO:01000181]|estuarine biome [ENVO:01000020]
Value syntax - {termLabel} {[termID]}
Example - forest biome [ENVO:01000174]
Preferred Unit -
Number of occurrences permitted - 1
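
Fields that take ontology terms use the form termLabel [termID], with multiple terms separated by pipes. One way this might be parsed is sketched below; the function name is hypothetical, and the assumed ID shape (a prefix, a colon, then letters, digits or underscores) is inferred from the examples on this page, such as ENVO:01000181.

```python
import re

# Assumed shape: free-text label followed by a bracketed "PREFIX:local_id".
_TERM = re.compile(r"^\s*(.+?)\s*\[([A-Za-z]+:[A-Za-z0-9_]+)\]\s*$")


def parse_ontology_terms(value):
    """Split 'label [ID]|label [ID]' into a list of (label, term_id) pairs.

    Hypothetical helper; raises ValueError if any part lacks a bracketed ID.
    """
    terms = []
    for part in value.split("|"):
        match = _TERM.match(part)
        if match is None:
            raise ValueError(f"not a 'termLabel [termID]' entry: {part!r}")
        terms.append((match.group(1), match.group(2)))
    return terms


if __name__ == "__main__":
    print(parse_ontology_terms(
        "mangrove biome [ENVO:01000181]|estuarine biome [ENVO:01000020]"
    ))
```

The sketch only checks the shape of each entry; confirming that an ID actually exists in ENVO would require looking it up in the ontology itself.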

MIXS ID - MIXS:0000013

Term display name - local environmental context
Structured Comment name - env_local_scale
Definition - In this field, report the entity or entities which are in your sample or specimen’s local vicinity and which you believe have significant causal influences on your sample or specimen. Please use terms that are present in ENVO and which are of smaller spatial grain than your entry for env_broad_scale. Format (one term): termLabel [termID]; Format (multiple terms): termLabel [termID]|termLabel [termID]|termLabel [termID]. Example: Annotating a pooled sample taken from various vegetation layers in a forest, consider: canopy [ENVO:00000047]|herb and fern layer [ENVO:01000337]|litter layer [ENVO:01000338]|understory [ENVO:01000335]|shrub layer [ENVO:01000336]. If needed, request new terms on the ENVO tracker, identified here: http://www.obofoundry.org/ontology/envo.html
Expected value - Add terms that identify environmental entities having causal influences upon the entity at time of sampling, multiple terms can be separated by pipes, e.g.: shoreline [ENVO:00000486]|intertidal zone [ENVO:00000316]
Value syntax - {termLabel} {[termID]}
Example - litter layer [ENVO:01000338]
Preferred Unit -
Number of occurrences permitted - 1

MIXS ID - MIXS:0000014

Term display name - environmental medium
Structured Comment name - env_medium
Definition - In this field, report which environmental material or materials (pipe separated) immediately surrounded your sample or specimen prior to sampling, using one or more subclasses of ENVO’s environmental material class: http://purl.obolibrary.org/obo/ENVO_00010483. Format (one term): termLabel [termID]; Format (multiple terms): termLabel [termID]|termLabel [termID]|termLabel [termID]. Example: Annotating a fish swimming in the upper 100 m of the Atlantic Ocean, consider: ocean water [ENVO:00002151]. Example: Annotating a duck on a pond, consider: pond water [ENVO:00002228]|air [ENVO:00002005]. If needed, request new terms on the ENVO tracker, identified here: http://www.obofoundry.org/ontology/envo.html
Expected value - Add terms that identify the material displaced by the entity at time of sampling. Recommend subclasses of environmental material [ENVO:00010483]. Multiple terms can be separated by pipes e.g.: estuarine water [ENVO:01000301]|estuarine mud [ENVO:00002160]
Value syntax - {termLabel} {[termID]}
Example - soil [ENVO:00001998]
Preferred Unit -
Number of occurrences permitted - 1

MIXS ID - MIXS:0000015

Term display name - relationship to oxygen
Structured Comment name - rel_to_oxygen
Definition - Is this organism an aerobe or an anaerobe? Please note that aerobic and anaerobic are valid descriptors for microbial environments
Expected value - enumeration
Value syntax - [aerobe|anaerobe|facultative|microaerophilic|microanaerobe|obligate aerobe|obligate anaerobe]
Example - aerobe
Preferred Unit -
Number of occurrences permitted - 1

MIXS ID - MIXS:0000016

Term display name - sample material processing
Structured Comment name - samp_mat_process
Definition - Any processing applied to the sample during or after retrieving the sample from the environment. This field accepts OBI terms; for a browser of OBI (v 2018-02-12) terms, please see http://purl.bioontology.org/ontology/OBI
Expected value - text or OBI
Value syntax - {text}|{termLabel} {[termID]}
Example - filtering of seawater, storing samples in ethanol
Preferred Unit -
Number of occurrences permitted - 1

MIXS ID - MIXS:0000017

Term display name - size fraction selected
Structured Comment name - size_frac
Definition - Filtering pore size used in sample preparation
Expected value - filter size value range
Value syntax - {float}-{float} {unit}
Example - 0-0.22 micrometer
Preferred Unit -
Number of occurrences permitted - 1

MIXS ID - MIXS:0000018

Term display name - geographic location (depth)
Structured Comment name - depth
Definition - Please refer to the definitions of depth in the environmental packages
Expected value - -
Value syntax - -
Example - 0
Preferred Unit -
Number of occurrences permitted - 0

MIXS ID - MIXS:0000019

Term display name - environmental package
Structured Comment name - env_package
Definition - MIxS extension for reporting of measurements and observations obtained from one or more of the environments where the sample was obtained. All environmental packages listed here are further defined in separate subtables. By giving the name of the environmental package, a selection of fields can be made from the subtables and can be reported
Expected value - enumeration
Value syntax - [air|built environment|host-associated|human-associated|human-skin|human-oral|human-gut|human-vaginal|hydrocarbon resources-cores|hydrocarbon resources-fluids/swabs|microbial mat/biofilm|misc environment|plant-associated|sediment|soil|wastewater/sludge|water]
Example - soil
Preferred Unit -
Number of occurrences permitted - 1

MIXS ID - MIXS:0000020

Term display name - subspecific genetic lineage
Structured Comment name - subspecf_gen_lin
Definition - This should provide further information about the genetic distinctness of the sequenced organism by recording additional information e.g. serovar, serotype, biotype, ecotype, or any relevant genetic typing schemes like Group I plasmid. It can also contain alternative taxonomic information. It should contain both the lineage name, and the lineage rank, i.e. biovar:abc123
Expected value - genetic lineage below lowest rank of NCBI taxonomy, which is subspecies, e.g. serovar, biotype, ecotype
Value syntax - {rank name}:{text}
Example - serovar:Newport
Preferred Unit -
Number of occurrences permitted - 1

MIXS ID - MIXS:0000021

Term display name - ploidy
Structured Comment name - ploidy
Definition - The ploidy level of the genome (e.g. allopolyploid, haploid, diploid, triploid, tetraploid). It has implications for the downstream study of duplicated genes and regions of the genomes (and perhaps for difficulties in assembly). For terms, please select terms listed under class ploidy (PATO:0001374) of Phenotypic Quality Ontology (PATO), and for a browser of PATO (v 2018-03-27) please refer to http://purl.bioontology.org/ontology/PATO
Expected value - PATO
Value syntax - {termLabel} {[termID]}
Example - allopolyploidy [PATO:0001379]
Preferred Unit -
Number of occurrences permitted - 1

MIXS ID - MIXS:0000022

Term display name - number of replicons
Structured Comment name - num_replicons
Definition - Reports the number of replicons in a nuclear genome of eukaryotes, in the genome of a bacterium or archaeon, or the number of segments in a segmented virus. Always applied to the haploid chromosome count of a eukaryote
Expected value - for eukaryotes and bacteria: chromosomes (haploid count); for viruses: segments
Value syntax - {integer}
Example - 2
Preferred Unit -
Number of occurrences permitted - 1

MIXS ID - MIXS:0000023

Term display name - extrachromosomal elements
Structured Comment name - extrachrom_elements
Definition - Do plasmids of significant phenotypic consequence exist (e.g. ones that determine virulence or antibiotic resistance)? Megaplasmids? Other plasmids (Borrelia has 15+ plasmids)?
Expected value - number of extrachromosomal elements
Value syntax - {integer}
Example - 5
Preferred Unit -
Number of occurrences permitted - 1

MIXS ID - MIXS:0000024

Term display name - estimated size
Structured Comment name - estimated_size
Definition - The estimated size of the genome prior to sequencing. Of particular importance in the sequencing of (eukaryotic) genomes, which could remain in draft form for a long or unspecified period
Expected value - number of base pairs
Value syntax - {integer} bp
Example - 300000 bp
Preferred Unit -
Number of occurrences permitted - 1

MIXS ID - MIXS:0000025

Term display name - reference for biomaterial
Structured Comment name - ref_biomaterial
Definition - Primary publication if isolated before genome publication; otherwise, primary genome report
Expected value - PMID, DOI or URL
Value syntax - {PMID}|{DOI}|{URL}
Example - doi:10.1016/j.syapm.2018.01.009
Preferred Unit -
Number of occurrences permitted - 1

MIXS ID - MIXS:0000026

Term display name - source material identifiers
Structured Comment name - source_mat_id
Definition - A unique identifier assigned to a material sample (as defined by http://rs.tdwg.org/dwc/terms/materialSampleID, and as opposed to a particular digital record of a material sample) used for extracting nucleic acids, and subsequent sequencing. The identifier can refer either to the original material collected or to any derived sub-samples. The INSDC qualifiers /specimen_voucher, /bio_material, or /culture_collection may or may not share the same value as the source_mat_id field. For instance, the /specimen_voucher qualifier and source_mat_id may both contain ‘UAM:Herps:14’ , referring to both the specimen voucher and sampled tissue with the same identifier. However, the /culture_collection qualifier may refer to a value from an initial culture (e.g. ATCC:11775) while source_mat_id would refer to an identifier from some derived culture from which the nucleic acids were extracted (e.g. xatc123 or ark:/2154/R2).
Expected value - for cultures of microorganisms: identifiers for two culture collections; for other material a unique arbitrary identifier
Value syntax - {text}
Example - MPI012345
Preferred Unit -
Number of occurrences permitted - m

MIXS ID - MIXS:0000027

Term display name - known pathogenicity
Structured Comment name - pathogenicity
Definition - To what is the entity pathogenic
Expected value - names of organisms that the entity is pathogenic to
Value syntax - {text}
Example - human, animal, plant, fungi, bacteria
Preferred Unit -
Number of occurrences permitted - 1

MIXS ID - MIXS:0000028

Term display name - observed biotic relationship
Structured Comment name - biotic_relationship
Definition - Description of relationship(s) between the subject organism and other organism(s) it is associated with. E.g., parasite on species X; mutualist with species Y. The target organism is the subject of the relationship, and the other organism(s) is the object
Expected value - enumeration
Value syntax - [free living|parasitism|commensalism|symbiotic|mutualism]
Example - free living
Preferred Unit -
Number of occurrences permitted - 1

MIXS ID - MIXS:0000029

Term display name - specific host
Structured Comment name - specific_host
Definition - If there is a host involved, please provide its taxid (or environmental if not actually isolated from the dead or alive host, i.e. a pathogen could be isolated from a swipe of a bench etc.) and report whether it is a laboratory or natural host
Expected value - host taxid, unknown, environmental
Value syntax - {NCBI taxid}|{text}
Example - 9606
Preferred Unit -
Number of occurrences permitted - 1

MIXS ID - MIXS:0000030

Term display name - host specificity or range
Structured Comment name - host_spec_range
Definition - The NCBI taxonomy identifier of the specific host if it is known
Expected value - NCBI taxid
Value syntax - {integer}
Example - 9606
Preferred Unit -
Number of occurrences permitted - 1

MIXS ID - MIXS:0000031

Term display name - health or disease status of specific host at time of collection
Structured Comment name - health_disease_stat
Definition - Health or disease status of specific host at time of collection
Expected value - enumeration
Value syntax - [healthy|diseased|dead|disease-free|undetermined|recovering|resolving|pre-existing condition|pathological|life threatening|congenital]
Example - dead
Preferred Unit -
Number of occurrences permitted - 1

MIXS ID - MIXS:0000032

Term display name - trophic level
Structured Comment name - trophic_level
Definition - Trophic levels are the feeding position in a food chain. Microbes can be a range of producers (e.g. chemolithotroph)
Expected value - enumeration
Value syntax - [autotroph|carboxydotroph|chemoautotroph|chemoheterotroph|chemolithoautotroph|chemolithotroph|chemoorganoheterotroph|chemoorganotroph|chemosynthetic|chemotroph|copiotroph|diazotroph|facultative autotroph|heterotroph|lithoautotroph|lithoheterotroph|lithotroph|methanotroph|methylotroph|mixotroph|obligate chemoautolithotroph|oligotroph|organoheterotroph|organotroph|photoautotroph|photoheterotroph|photolithoautotroph|photolithotroph|photosynthetic|phototroph]
Example - heterotroph
Preferred Unit -
Number of occurrences permitted - 1

MIXS ID - MIXS:0000033

Term display name - propagation
Structured Comment name - propagation
Definition - This field is specific to different taxa. For phages: lytic/lysogenic; for plasmids: incompatibility group; for eukaryotes: sexual/asexual (Note: there is a strong opinion that phage propagation should be named obligately lytic or temperate, therefore we also give this choice)
Expected value - for virus: lytic, lysogenic, temperate, obligately lytic; for plasmid: incompatibility group; for eukaryote: asexual, sexual
Value syntax - {text}
Example - lytic
Preferred Unit -
Number of occurrences permitted - 1

MIXS ID - MIXS:0000034

Term display name - encoded traits
Structured Comment name - encoded_traits
Definition - Should include key traits like antibiotic resistance or xenobiotic degradation phenotypes for plasmids, converting genes for phage
Expected value - for plasmid: antibiotic resistance; for phage: converting genes
Value syntax - {text}
Example - beta-lactamase class A
Preferred Unit -
Number of occurrences permitted - 1

MIXS ID - MIXS:0000035

Term display name - source of UViGs
Structured Comment name - source_uvig
Definition - Type of dataset from which the UViG was obtained
Expected value - enumeration
Value syntax - [metagenome (not viral targeted)|viral fraction metagenome (virome)|sequence-targeted metagenome|metatranscriptome (not viral targeted)|viral fraction RNA metagenome (RNA virome)|sequence-targeted RNA metagenome|microbial single amplified genome (SAG)|viral single amplified genome (vSAG)|isolate microbial genome|other]
Example - viral fraction metagenome (virome)
Preferred Unit -
Number of occurrences permitted - 1

MIXS ID - MIXS:0000036

Term display name - virus enrichment approach
Structured Comment name - virus_enrich_appr
Definition - List of approaches used to enrich the sample for viruses, if any
Expected value - enumeration
Value syntax - [filtration|ultrafiltration|centrifugation|ultracentrifugation|PEG Precipitation|FeCl Precipitation|CsCl density gradient|DNAse|RNAse|targeted sequence capture|other|none]
Example - filtration + FeCl Precipitation + ultracentrifugation + DNAse
Preferred Unit -
Number of occurrences permitted - 1

MIXS ID - MIXS:0000037

Term display name - nucleic acid extraction
Structured Comment name - nucl_acid_ext
Definition - A link to a literature reference, electronic resource or a standard operating procedure (SOP), that describes the material separation to recover the nucleic acid fraction from a sample
Expected value - PMID, DOI or URL
Value syntax - {PMID}|{DOI}|{URL}
Example - https://mobio.com/media/wysiwyg/pdfs/protocols/12888.pdf
Preferred Unit -
Number of occurrences permitted - 1

MIXS ID - MIXS:0000038

Term display name - nucleic acid amplification
Structured Comment name - nucl_acid_amp
Definition - A link to a literature reference, electronic resource or a standard operating procedure (SOP), that describes the enzymatic amplification (PCR, TMA, NASBA) of specific nucleic acids
Expected value - PMID, DOI or URL
Value syntax - {PMID}|{DOI}|{URL}
Example - https://phylogenomics.me/protocols/16s-pcr-protocol/
Preferred Unit -
Number of occurrences permitted - 1

MIXS ID - MIXS:0000039

Term display name - library size
Structured Comment name - lib_size
Definition - Total number of clones in the library prepared for the project
Expected value - number of clones
Value syntax - {integer}
Example - 50
Preferred Unit -
Number of occurrences permitted - 1

MIXS ID - MIXS:0000040

Term display name - library reads sequenced
Structured Comment name - lib_reads_seqd
Definition - Total number of clones sequenced from the library
Expected value - number of reads sequenced
Value syntax - {integer}
Example - 20
Preferred Unit -
Number of occurrences permitted - 1

MIXS ID - MIXS:0000041

Term display name - library layout
Structured Comment name - lib_layout
Definition - Specify whether to expect single, paired, or other configuration of reads
Expected value - enumeration
Value syntax - [paired|single|vector|other]
Example - paired
Preferred Unit -
Number of occurrences permitted - 1

MIXS ID - MIXS:0000042

Term display name - library vector
Structured Comment name - lib_vector
Definition - Cloning vector type(s) used in construction of libraries
Expected value - vector
Value syntax - {text}
Example - Bacteriophage P1
Preferred Unit -
Number of occurrences permitted - 1

MIXS ID - MIXS:0000043

Term display name - library screening strategy
Structured Comment name - lib_screen
Definition - Specific enrichment or screening methods applied before and/or after creating libraries
Expected value - screening strategy name
Value syntax - {text}
Example - enriched, screened, normalized
Preferred Unit -
Number of occurrences permitted - 1

MIXS ID - MIXS:0000044

Term display name - target gene
Structured Comment name - target_gene
Definition - Targeted gene or locus name for marker gene studies
Expected value - gene name
Value syntax - {text}
Example - 16S rRNA, 18S rRNA, nif, amoA, rpo
Preferred Unit -
Number of occurrences permitted - 1

MIXS ID - MIXS:0000045

Term display name - target subfragment
Structured Comment name - target_subfragment
Definition - Name of subfragment of a gene or locus. Important to e.g. identify special regions on marker genes like V6 on 16S rRNA
Expected value - gene fragment name
Value syntax - {text}
Example - V6, V9, ITS
Preferred Unit -
Number of occurrences permitted - 1

MIXS ID - MIXS:0000046

Term display name - pcr primers
Structured Comment name - pcr_primers
Definition - PCR primers that were used to amplify the sequence of the targeted gene, locus or subfragment. This field should contain all the primers used for a single PCR reaction if multiple forward or reverse primers are present in a single PCR reaction. The primer sequence should be reported in uppercase letters
Expected value - FWD: forward primer sequence;REV:reverse primer sequence
Value syntax - FWD:{dna};REV:{dna}
Example - FWD:GTGCCAGCMGCCGCGGTAA;REV:GGACTACHVGGGTWTCTAAT
Preferred Unit -
Number of occurrences permitted - 1
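
The pcr_primers syntax, FWD:{dna};REV:{dna}, lends itself to a small parser. The sketch below is illustrative only: the function name is hypothetical, and accepting IUPAC ambiguity codes (such as M, H, V and W, which appear in the example value) as part of {dna} is an assumption.

```python
import re

# Uppercase IUPAC nucleotide codes; accepting ambiguity codes is an assumption
# based on the example primers, which contain M, H, V and W.
_IUPAC_DNA = re.compile(r"^[ACGTRYSWKMBDHVN]+$")


def parse_pcr_primers(value):
    """Parse 'FWD:{dna};REV:{dna}' into {'FWD': [...], 'REV': [...]}.

    Hypothetical helper; several FWD or REV entries are collected in order.
    """
    primers = {}
    for part in value.split(";"):
        direction, separator, sequence = part.partition(":")
        direction, sequence = direction.strip(), sequence.strip()
        if separator != ":" or direction not in ("FWD", "REV"):
            raise ValueError(f"expected 'FWD:' or 'REV:' prefix in {part!r}")
        if not _IUPAC_DNA.match(sequence):
            raise ValueError(f"primer is not uppercase IUPAC DNA: {sequence!r}")
        primers.setdefault(direction, []).append(sequence)
    return primers


if __name__ == "__main__":
    print(parse_pcr_primers("FWD:GTGCCAGCMGCCGCGGTAA;REV:GGACTACHVGGGTWTCTAAT"))
```

Rejecting lowercase sequences follows the definition's request that primers be reported in uppercase letters.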

MIXS ID - MIXS:0000047

Term display name - multiplex identifiers
Structured Comment name - mid
Definition - Molecular barcodes, called Multiplex Identifiers (MIDs), that are used to specifically tag unique samples in a sequencing run. Sequence should be reported in uppercase letters
Expected value - multiplex identifier sequence
Value syntax - {dna}
Example - GTGAATAT
Preferred Unit -
Number of occurrences permitted - 1

MIXS ID - MIXS:0000048

Term display name - adapters
Structured Comment name - adapters
Definition - Adapters provide priming sequences for both amplification and sequencing of the sample-library fragments. Both adapters should be reported, in uppercase letters
Expected value - adapter A and B sequence
Value syntax - {dna};{dna}
Example - AATGATACGGCGACCACCGAGATCTACACGCT;CAAGCAGAAGACGGCATACGAGAT
Preferred Unit -
Number of occurrences permitted - 1

MIXS ID - MIXS:0000049

Term display name - pcr conditions
Structured Comment name - pcr_cond
Definition - Description of reaction conditions and components of PCR in the form of ‘initial denaturation:94degC_1.5min; annealing=…’
Expected value - initial denaturation:degrees_minutes;annealing:degrees_minutes;elongation:degrees_minutes;final elongation:degrees_minutes;total cycles
Value syntax - initial denaturation:degrees_minutes;annealing:degrees_minutes;elongation:degrees_minutes;final elongation:degrees_minutes;total cycles
Example - initial denaturation:94_3;annealing:50_1;elongation:72_1.5;final elongation:72_10;35
Preferred Unit -
Number of occurrences permitted - 1
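
The pcr_cond example value packs each step as name:degrees_minutes, with the total cycle count as the last semicolon-separated item. A rough sketch of unpacking it follows; the function name and the "total cycles" dictionary key are hypothetical, and the layout is inferred from the example value only.

```python
def parse_pcr_conditions(value):
    """Parse a pcr_cond value into {step: (degrees, minutes), 'total cycles': n}.

    Hypothetical helper. Assumes every item except the last is
    'name:degrees_minutes' and the last item is the integer cycle count.
    """
    *steps, cycles = value.split(";")
    parsed = {}
    for step in steps:
        name, separator, setting = step.partition(":")
        degrees, underscore, minutes = setting.partition("_")
        if separator != ":" or underscore != "_":
            raise ValueError(f"expected 'name:degrees_minutes', got {step!r}")
        parsed[name.strip()] = (float(degrees), float(minutes))
    parsed["total cycles"] = int(cycles)
    return parsed


if __name__ == "__main__":
    example = ("initial denaturation:94_3;annealing:50_1;"
               "elongation:72_1.5;final elongation:72_10;35")
    print(parse_pcr_conditions(example))
```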

MIXS ID - MIXS:0000050

Term display name - sequencing method
Structured Comment name - seq_meth
Definition - Sequencing method used; e.g. Sanger, pyrosequencing, ABI-solid
Expected value - enumeration
Value syntax - [MinION|GridION|PromethION|454 GS|454 GS 20|454 GS FLX|454 GS FLX+|454 GS FLX Titanium|454 GS Junior|Illumina Genome Analyzer|Illumina Genome Analyzer II|Illumina Genome Analyzer IIx|Illumina HiSeq 4000|Illumina HiSeq 3000|Illumina HiSeq 2500|Illumina HiSeq 2000|Illumina HiSeq 1500|Illumina HiSeq 1000|Illumina HiScanSQ|Illumina MiSeq|Illumina HiSeq X Five|Illumina HiSeq X Ten|Illumina NextSeq 500|Illumina NextSeq 550|AB SOLiD System|AB SOLiD System 2.0|AB SOLiD System 3.0|AB SOLiD 3 Plus System|AB SOLiD 4 System|AB SOLiD 4hq System|AB SOLiD PI System|AB 5500 Genetic Analyzer|AB 5500xl Genetic Analyzer|AB 5500xl-W Genetic Analysis System|Ion Torrent PGM|Ion Torrent Proton|Ion Torrent S5|Ion Torrent S5 XL|PacBio RS|PacBio RS II|Sequel|AB 3730xL Genetic Analyzer|AB 3730 Genetic Analyzer|AB 3500xL Genetic Analyzer|AB 3500 Genetic Analyzer|AB 3130xL Genetic Analyzer|AB 3130 Genetic Analyzer|AB 310 Genetic Analyzer|BGISEQ-500]
Example - Illumina HiSeq 1500
Preferred Unit -
Number of occurrences permitted - 1

MIXS ID - MIXS:0000051

Term display name - sequence quality check
Structured Comment name - seq_quality_check
Definition - Indicate if the sequence has been called by automatic systems (none) or undergone a manual editing procedure (e.g. by inspecting the raw data or chromatograms). Applied only for sequences that are not submitted to SRA, ENA or DRA
Expected value - none or manually edited
Value syntax - [none|manually edited]
Example - none
Preferred Unit -
Number of occurrences permitted - 1

MIXS ID - MIXS:0000052

Term display name - chimera check
Structured Comment name - chimera_check
Definition - A chimeric sequence, or chimera for short, is a sequence comprised of two or more phylogenetically distinct parent sequences. Chimeras are usually PCR artifacts thought to occur when a prematurely terminated amplicon reanneals to a foreign DNA strand and is copied to completion in the following PCR cycles. The point at which the chimeric sequence changes from one parent to the next is called the breakpoint or conversion point
Expected value - name and version of software, parameters used
Value syntax - {software};{version};{parameters}
Example - uchime;v4.1;default parameters
Preferred Unit -
Number of occurrences permitted - 1
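
Several terms, chimera_check among them, use the {software};{version};{parameters} syntax. A minimal sketch of splitting such a value is shown below; the class and function names are hypothetical. Splitting at most twice is a design choice of this sketch, so that any further semicolons stay inside the parameters field.

```python
from typing import NamedTuple


class SoftwareRecord(NamedTuple):
    """Hypothetical container for a '{software};{version};{parameters}' value."""
    software: str
    version: str
    parameters: str


def parse_software_record(value):
    """Split the value on the first two semicolons into a SoftwareRecord."""
    parts = [part.strip() for part in value.split(";", 2)]
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected 'software;version;parameters': {value!r}")
    return SoftwareRecord(*parts)


if __name__ == "__main__":
    print(parse_software_record("uchime;v4.1;default parameters"))
```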

MIXS ID - MIXS:0000053

Term display name - taxonomic identity marker
Structured Comment name - tax_ident
Definition - The phylogenetic marker(s) used to assign an organism name to the SAG or MAG
Expected value - enumeration
Value syntax - [16S rRNA gene|multi-marker approach|other]
Example - other: rpoB gene
Preferred Unit -
Number of occurrences permitted - 1

MIXS ID - MIXS:0000054

Term display name - single cell or viral particle lysis kit protocol
Structured Comment name - single_cell_lysis_prot
Definition - Name of the kit or standard protocol used for cell(s) or particle(s) lysis
Expected value - kit, protocol name
Value syntax - {text}
Example - ambion single cell lysis kit
Preferred Unit -
Number of occurrences permitted - 1

MIXS ID - MIXS:0000055

Term display name - WGA amplification approach
Structured Comment name - wga_amp_appr
Definition - Method used to amplify genomic DNA in preparation for sequencing
Expected value - enumeration
Value syntax - [pcr based|mda based]
Example - mda based
Preferred Unit -
Number of occurrences permitted - 1

MIXS ID - MIXS:0000056

Term display name - assembly quality
Structured Comment name - assembly_qual
Definition - The assembly quality category is based on sets of criteria outlined for each assembly quality category. For MISAG/MIMAG: Finished: Single, validated, contiguous sequence per replicon without gaps or ambiguities with a consensus error rate equivalent to Q50 or better. High Quality Draft: Multiple fragments where gaps span repetitive regions. Presence of the 23S, 16S and 5S rRNA genes and at least 18 tRNAs. Medium Quality Draft: Many fragments with little to no review of assembly other than reporting of standard assembly statistics. Low Quality Draft: Many fragments with little to no review of assembly other than reporting of standard assembly statistics. Assembly statistics include, but are not limited to, total assembly size, number of contigs, contig N50/L50, and maximum contig length. For MIUVIG: Finished: Single, validated, contiguous sequence per replicon without gaps or ambiguities, with extensive manual review and editing to annotate putative gene functions and transcriptional units. High-quality draft genome: One or multiple fragments, totaling ≥ 90% of the expected genome or replicon sequence or predicted complete. Genome fragment(s): One or multiple fragments, totaling < 90% of the expected genome or replicon sequence, or for which no genome size could be estimated
Expected value - enumeration
Value syntax - [Finished genome|High-quality draft genome|Medium-quality draft genome|Low-quality draft genome|Genome fragment(s)]
Example - High-quality draft genome
Prefered Unit -
Number of occurences permitted - 1

MIXS ID - MIXS:0000057

Term display name - assembly name
Structured Comment name - assembly_name
Definition - Name/version of the assembly provided by the submitter that is used in the genome browsers and in the community
Expected value - name and version of assembly
Value syntax - {text} {text}
Example - HuRef, JCVI_ISG_i3_1.0
Prefered Unit -
Number of occurences permitted - 1

MIXS ID - MIXS:0000058

Term display name - assembly software
Structured Comment name - assembly_software
Definition - Tool(s) used for assembly, including version number and parameters
Expected value - name and version of software, parameters used
Value syntax - {software};{version};{parameters}
Example - metaSPAdes;3.11.0;kmer set 21,33,55,77,99,121, default parameters otherwise
Prefered Unit -
Number of occurences permitted - m
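
Several terms in this list share the value syntax {software};{version};{parameters}. The following is a minimal sketch of how such a value could be split into its parts. The names `SoftwareValue` and `parse_software_value` are hypothetical and not part of the standard, and the choice to split on the first two semicolons only is an assumption made so that semicolons inside the parameters text are preserved.

```python
from typing import NamedTuple


class SoftwareValue(NamedTuple):
    """Hypothetical container for a '{software};{version};{parameters}' value."""
    software: str
    version: str
    parameters: str


def parse_software_value(value: str) -> SoftwareValue:
    """Split a '{software};{version};{parameters}' string into three parts."""
    # Assumption: only the first two semicolons are separators, so any ';'
    # that appears later is kept as part of the parameters text.
    parts = [part.strip() for part in value.split(";", 2)]
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected software;version;parameters, got {value!r}")
    return SoftwareValue(*parts)


# The example value given for assembly_software above.
example = parse_software_value(
    "metaSPAdes;3.11.0;kmer set 21,33,55,77,99,121, default parameters otherwise"
)
print(example.software, example.version)
```

Whitespace around each part is stripped, so a value written as "VirSorter; 1.0.4; Virome database, category 2" parses the same way.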

MIXS ID - MIXS:0000059

Term display name - annotation
Structured Comment name - annot
Definition - Tool used for annotation, or for cases where annotation was provided by a community jamboree or model organism database rather than by a specific submitter
Expected value - name of tool or pipeline used, or annotation source description
Value syntax - {text}
Example - prokka
Prefered Unit -
Number of occurences permitted - 1

MIXS ID - MIXS:0000060

Term display name - number of contigs
Structured Comment name - number_contig
Definition - Total number of contigs in the cleaned/submitted assembly that makes up a given genome, SAG, MAG, or UViG
Expected value - value
Value syntax - {integer}
Example - 40
Prefered Unit -
Number of occurences permitted - 1

MIXS ID - MIXS:0000061

Term display name - feature prediction
Structured Comment name - feat_pred
Definition - Method used to predict UViG features such as ORFs, integration site, etc.
Expected value - names and versions of software(s), parameters used
Value syntax - {software};{version};{parameters}
Example - Prodigal;2.6.3;default parameters
Prefered Unit -
Number of occurences permitted - 1

MIXS ID - MIXS:0000062

Term display name - reference database(s)
Structured Comment name - ref_db
Definition - List of database(s) used for ORF annotation, along with version number and reference to website or publication
Expected value - names, versions, and references of databases
Value syntax - {database};{version};{reference}
Example - pVOGs;5;http://dmk-brain.ecn.uiowa.edu/pVOGs/ Grazziotin et al. 2017 doi:10.1093/nar/gkw975
Prefered Unit -
Number of occurences permitted - 1

MIXS ID - MIXS:0000063

Term display name - similarity search method
Structured Comment name - sim_search_meth
Definition - Tool used to compare ORFs with database, along with version and cutoffs used
Expected value - names and versions of software(s), parameters used
Value syntax - {software};{version};{parameters}
Example - HMMER3;3.1b2;hmmsearch, cutoff of 50 on score
Prefered Unit -
Number of occurences permitted - 1

MIXS ID - MIXS:0000064

Term display name - taxonomic classification
Structured Comment name - tax_class
Definition - Method used for taxonomic classification, along with reference database used, classification rank, and thresholds used to classify new genomes
Expected value - classification method, database name, and other parameters
Value syntax - {text}
Example - vConTACT vContact2 (references from NCBI RefSeq v83, genus rank classification, default parameters)
Prefered Unit -
Number of occurences permitted - 1

MIXS ID - MIXS:0000065

Term display name - 16S recovered
Structured Comment name - 16s_recover
Definition - Can a 16S gene be recovered from the submitted SAG or MAG?
Expected value - boolean
Value syntax - {boolean}
Example - yes
Prefered Unit -
Number of occurences permitted - 1

MIXS ID - MIXS:0000066

Term display name - 16S recovery software
Structured Comment name - 16s_recover_software
Definition - Tools used for 16S rRNA gene extraction
Expected value - names and versions of software(s), parameters used
Value syntax - {software};{version};{parameters}
Example - rambl;v2;default parameters
Prefered Unit -
Number of occurences permitted - 1

MIXS ID - MIXS:0000067

Term display name - number of standard tRNAs extracted
Structured Comment name - trnas
Definition - The total number of tRNAs identified from the SAG or MAG
Expected value - value from 0-21
Value syntax - {integer}
Example - 18
Prefered Unit -
Number of occurences permitted - 1

MIXS ID - MIXS:0000068

Term display name - tRNA extraction software
Structured Comment name - trna_ext_software
Definition - Tools used for tRNA identification
Expected value - names and versions of software(s), parameters used
Value syntax - {software};{version};{parameters}
Example - infernal;v2;default parameters
Prefered Unit -
Number of occurences permitted - 1

MIXS ID - MIXS:0000069

Term display name - completeness score
Structured Comment name - compl_score
Definition - Completeness score is typically based on either the fraction of markers found as compared to a database or the percent of a genome found as compared to a closely related reference genome. The expected completeness scores are: High Quality Draft: > 90%; Medium Quality Draft: > 50%; Low Quality Draft: < 50%
Expected value - quality;percent completeness
Value syntax - [high|med|low];{percentage}
Example - med;60%
Prefered Unit -
Number of occurences permitted - 1
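
As an illustration of the completeness thresholds above, the sketch below checks whether the quality label in a compl_score value agrees with its percentage. The function names are hypothetical. The definition does not say how a score of exactly 50% should be labelled, so this sketch returns no category for that case rather than guessing.

```python
import re
from typing import Optional

# Matches values such as "med;60%" following [high|med|low];{percentage}.
_COMPL_RE = re.compile(r"^(high|med|low);(\d+(?:\.\d+)?)%$")


def category_for_completeness(percent: float) -> Optional[str]:
    """Map a completeness percentage to high/med/low using the stated thresholds."""
    if percent > 90:
        return "high"
    if percent > 50:
        return "med"
    if percent < 50:
        return "low"
    # Exactly 50% is not covered by the definition; left undecided here.
    return None


def check_compl_score(value: str) -> bool:
    """Return True when the stated label matches the stated percentage."""
    match = _COMPL_RE.match(value.strip())
    if match is None:
        raise ValueError(f"expected [high|med|low];{{percentage}}, got {value!r}")
    stated, percent = match.group(1), float(match.group(2))
    return category_for_completeness(percent) == stated


print(check_compl_score("med;60%"))
```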

MIXS ID - MIXS:0000070

Term display name - completeness software
Structured Comment name - compl_software
Definition - Tools used for completion estimate, e.g. checkm, anvi’o, busco
Expected value - names and versions of software(s) used
Value syntax - {software};{version}
Example - checkm
Prefered Unit -
Number of occurences permitted - 1

MIXS ID - MIXS:0000071

Term display name - completeness approach
Structured Comment name - compl_appr
Definition - The approach used to determine the completeness of a given SAG or MAG, which would typically make use of a set of conserved marker genes or a closely related reference genome. For UViG completeness, include reference genome or group used, and contig feature suggesting a complete genome
Expected value - enumeration
Value syntax - [marker gene|reference based|other]
Example - other: UViG length compared to the average length of reference genomes from the P22virus genus (NCBI RefSeq v83)
Prefered Unit -
Number of occurences permitted - 1

MIXS ID - MIXS:0000072

Term display name - contamination score
Structured Comment name - contam_score
Definition - The contamination score is based on the fraction of single-copy genes that are observed more than once in a query genome. Acceptable scores are: High Quality Draft: < 5%; Medium Quality Draft: < 10%; Low Quality Draft: < 10%. Contamination must be below 5% for a SAG or MAG to be deposited into any of the public databases
Expected value - value
Value syntax - {float} percentage
Example - 0.01
Prefered Unit -
Number of occurences permitted - 1
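
The contamination limits above can be expressed as a small lookup. This is only a sketch: the names are hypothetical, and it assumes the contam_score value is given in percentage points (so 0.01 means 0.01%), which the value syntax does not state explicitly.

```python
# Upper limits (exclusive) per quality category, taken from the definition above.
CONTAMINATION_LIMITS = {"high": 5.0, "med": 10.0, "low": 10.0}

# Limit (exclusive) for deposition into public databases, per the definition.
DEPOSIT_LIMIT = 5.0


def within_category_limit(category: str, contamination_percent: float) -> bool:
    """True when the score is below the limit for the given quality category."""
    return contamination_percent < CONTAMINATION_LIMITS[category]


def can_be_deposited(contamination_percent: float) -> bool:
    """True when the score is below the 5% deposition limit."""
    return contamination_percent < DEPOSIT_LIMIT


# Assumption: the example value 0.01 is read as 0.01 percent.
print(within_category_limit("high", 0.01), can_be_deposited(0.01))
```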

MIXS ID - MIXS:0000073

Term display name - contamination screening parameters
Structured Comment name - contam_screen_param
Definition - Specific parameters used in the decontamination software, such as reference database, coverage, and kmers. Combinations of these parameters may also be used, e.g. kmer and coverage, or reference database and kmer
Expected value - enumeration;value or name
Value syntax - [ref db|kmer|coverage|combination];{text|integer}
Example - kmer
Prefered Unit -
Number of occurences permitted - 1

MIXS ID - MIXS:0000074

Term display name - decontamination software
Structured Comment name - decontam_software
Definition - Tool(s) used in contamination screening
Expected value - enumeration
Value syntax - [checkm/refinem|anvi’o|prodege|bbtools:decontaminate.sh|acdc|combination]
Example - anvi’o
Prefered Unit -
Number of occurences permitted - 1

MIXS ID - MIXS:0000075

Term display name - sorting technology
Structured Comment name - sort_tech
Definition - Method used to sort/isolate cells or particles of interest
Expected value - enumeration
Value syntax - [flow cytometric cell sorting|microfluidics|lazer-tweezing|optical manipulation|micromanipulation|other]
Example - optical manipulation
Prefered Unit -
Number of occurences permitted - 1

MIXS ID - MIXS:0000076

Term display name - single cell or viral particle lysis approach
Structured Comment name - single_cell_lysis_appr
Definition - Method used to free DNA from interior of the cell(s) or particle(s)
Expected value - enumeration
Value syntax - [chemical|enzymatic|physical|combination]
Example - enzymatic
Prefered Unit -
Number of occurences permitted - 1

MIXS ID - MIXS:0000077

Term display name - binning parameters
Structured Comment name - bin_param
Definition - The parameters that have been applied during the extraction of genomes from metagenomic datasets
Expected value - enumeration
Value syntax - [homology search|kmer|coverage|codon usage|combination]
Example - coverage and kmer
Prefered Unit -
Number of occurences permitted - 1

MIXS ID - MIXS:0000078

Term display name - binning software
Structured Comment name - bin_software
Definition - Tool(s) used for the extraction of genomes from metagenomic datasets
Expected value - enumeration
Value syntax - [metabat|maxbin|concoct|groupm|esom|metawatt|combination|other]
Example - concoct and maxbin
Prefered Unit -
Number of occurences permitted - 1

MIXS ID - MIXS:0000079

Term display name - reassembly post binning
Structured Comment name - reassembly_bin
Definition - Has an assembly been performed on a genome bin extracted from a metagenomic assembly?
Expected value - boolean
Value syntax - {boolean}
Example - no
Prefered Unit -
Number of occurences permitted - 1

MIXS ID - MIXS:0000080

Term display name - MAG coverage software
Structured Comment name - mag_cov_software
Definition - Tool(s) used to determine the genome coverage if coverage is used as a binning parameter in the extraction of genomes from metagenomic datasets
Expected value - enumeration
Value syntax - [bwa|bbmap|bowtie|other]
Example - bbmap
Prefered Unit -
Number of occurences permitted - 1

MIXS ID - MIXS:0000081

Term display name - viral identification software
Structured Comment name - vir_ident_software
Definition - Tool(s) used to identify the UViG as a viral genome: software or protocol name, including version number, parameters, and cutoffs used
Expected value - software name, version and relevant parameters
Value syntax - {software};{version};{parameters}
Example - VirSorter; 1.0.4; Virome database, category 2
Prefered Unit -
Number of occurences permitted - 1

MIXS ID - MIXS:0000082

Term display name - predicted genome type
Structured Comment name - pred_genome_type
Definition - Type of genome predicted for the UViG
Expected value - enumeration
Value syntax - [DNA|dsDNA|ssDNA|RNA|dsRNA|ssRNA|ssRNA (+)|ssRNA (-)|mixed|uncharacterized]
Example - dsDNA
Prefered Unit -
Number of occurences permitted - 1
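
Enumeration terms list their permitted values between square brackets, separated by "|". A minimal sketch of reading such a value syntax and checking a submitted value against it follows. The function names are hypothetical, and exact, case-sensitive matching is an assumption; the list does not say whether matching should ignore case.

```python
from typing import List


def parse_enumeration(value_syntax: str) -> List[str]:
    """Turn a '[a|b|c]' value syntax string into the list of permitted values."""
    text = value_syntax.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError(f"not an enumeration syntax: {value_syntax!r}")
    # Drop the surrounding brackets, then split on the '|' separator.
    return [option.strip() for option in text[1:-1].split("|")]


def is_permitted(value: str, value_syntax: str) -> bool:
    """Assumption: a value is valid only if it matches one option exactly."""
    return value.strip() in parse_enumeration(value_syntax)


GENOME_TYPE_SYNTAX = (
    "[DNA|dsDNA|ssDNA|RNA|dsRNA|ssRNA|ssRNA (+)|ssRNA (-)|mixed|uncharacterized]"
)
print(is_permitted("dsDNA", GENOME_TYPE_SYNTAX))
```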

MIXS ID - MIXS:0000083

Term display name - predicted genome structure
Structured Comment name - pred_genome_struc
Definition - Expected structure of the viral genome
Expected value - enumeration
Value syntax - [segmented|non-segmented|undetermined]
Example - non-segmented
Prefered Unit -
Number of occurences permitted - 1

MIXS ID - MIXS:0000084

Term display name - detection type
Structured Comment name - detec_type
Definition - Type of UViG detection
Expected value - enumeration
Value syntax - [independent sequence (UViG)|provirus (UpViG)]
Example - independent sequence (UViG)
Prefered Unit -
Number of occurences permitted - 1

MIXS ID - MIXS:0000085

Term display name - vOTU classification approach
Structured Comment name - votu_class_appr
Definition - Cutoffs and approach used when clustering new UViGs into “species-level” vOTUs. Note that results from standard 95% ANI / 85% AF clustering should be provided alongside vOTUs defined from another set of thresholds, even if the latter are the ones primarily used during the analysis
Expected value - cutoffs and method used
Value syntax - {ANI cutoff};{AF cutoff};{clustering method}
Example - 95% ANI;85% AF; greedy incremental clustering
Prefered Unit -
Number of occurences permitted - 1
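
To illustrate the {ANI cutoff};{AF cutoff};{clustering method} syntax, the sketch below extracts the two cutoffs and reports whether they equal the standard 95% ANI / 85% AF thresholds named in the definition. The function names are hypothetical, and reading each cutoff as the leading number of its field is an assumption based on the example value.

```python
import re
from typing import Tuple


def _leading_number(text: str) -> float:
    """Read the number at the start of a field such as '95% ANI'."""
    match = re.match(r"\d+(?:\.\d+)?", text)
    if match is None:
        raise ValueError(f"no numeric cutoff found in {text!r}")
    return float(match.group())


def parse_votu_class_appr(value: str) -> Tuple[float, float, str]:
    """Return (ANI cutoff, AF cutoff, clustering method) from the value."""
    parts = [part.strip() for part in value.split(";", 2)]
    if len(parts) != 3:
        raise ValueError(f"expected three ';'-separated fields, got {value!r}")
    return _leading_number(parts[0]), _leading_number(parts[1]), parts[2]


def uses_standard_cutoffs(value: str) -> bool:
    """True when the cutoffs are the standard 95% ANI and 85% AF."""
    ani, af, _method = parse_votu_class_appr(value)
    return ani == 95.0 and af == 85.0


print(parse_votu_class_appr("95% ANI;85% AF; greedy incremental clustering"))
```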

MIXS ID - MIXS:0000086

Term display name - vOTU sequence comparison approach
Structured Comment name - votu_seq_comp_appr
Definition - Tool and thresholds used to compare sequences when computing “species-level” vOTUs
Expected value - software name, version and relevant parameters
Value syntax - {software};{version};{parameters}
Example - blastn;2.6.0+;e-value cutoff: 0.001
Prefered Unit -
Number of occurences permitted - 1

MIXS ID - MIXS:0000087

Term display name - vOTU database
Structured Comment name - votu_db
Definition - Reference database (i.e. sequences not generated as part of the current study) used to cluster new genomes in “species-level” vOTUs, if any
Expected value - database and version
Value syntax - {database};{version}
Example - NCBI Viral RefSeq;83
Prefered Unit -
Number of occurences permitted - 1

MIXS ID - MIXS:0000088

Term display name - host prediction approach
Structured Comment name - host_pred_appr
Definition - Tool or approach used for host prediction
Expected value - enumeration
Value syntax - [provirus|host sequence similarity|CRISPR spacer match|kmer similarity|co-occurrence|combination|other]
Example - CRISPR spacer match
Prefered Unit -
Number of occurences permitted - 1

MIXS ID - MIXS:0000089

Term display name - host prediction estimated accuracy
Structured Comment name - host_pred_est_acc
Definition - For each tool or approach used for host prediction, estimated false discovery rates should be included, either computed de novo or from the literature
Expected value - false discovery rate
Value syntax - {text}
Example - CRISPR spacer match: 0 or 1 mismatches, estimated 8% FDR at the host genus rank (Edwards et al. 2016 doi:10.1093/femsre/fuv048)
Prefered Unit -
Number of occurences permitted - 1

MIXS ID - MIXS:0000090

Term display name - relevant standard operating procedures
Structured Comment name - sop
Definition - Standard operating procedures used in assembly and/or annotation of genomes, metagenomes or environmental sequences
Expected value - reference to SOP
Value syntax - {PMID}|{DOI}|{URL}
Example - http://press.igsb.anl.gov/earthmicrobiome/protocols-and-standards/its/
Prefered Unit -
Number of occurences permitted - m

MIXS ID - MIXS:0000091

Term display name - relevant electronic resources
Structured Comment name - url
Definition -
Expected value - URL
Value syntax - {URL}
Example - http://www.earthmicrobiome.org/
Prefered Unit -
Number of occurences permitted - m

MIXS ID - MIXS:0000092

Term display name - project name
Structured Comment name - project_name
Definition - Name of the project within which the sequencing was organized
Expected value -
Value syntax - {text}
Example - Forest soil metagenome
Prefered Unit -
Number of occurences permitted - 1

MIXS ID - MIXS:0000093

Term display name - elevation
Structured Comment name - elev
Definition - Elevation of the sampling site is its height above a fixed reference point, most commonly the mean sea level. Elevation is mainly used when referring to points on the earth’s surface, while altitude is used for points above the surface, such as an aircraft in flight or a spacecraft in orbit
Expected value - measurement value
Value syntax - {float} {unit}
Example - 100 meter
Prefered Unit -
Number of occurences permitted - 0

MIXS ID - MIXS:0000094

Term display name - altitude
Structured Comment name - alt
Definition - Altitude is a term used to identify heights of objects such as airplanes, space shuttles, rockets, atmospheric balloons and heights of places such as atmospheric layers and clouds. It is used to measure the height of an object which is above the earth’s surface. In this context, the altitude measurement is the vertical distance between the earth’s surface above sea level and the sampled position in the air
Expected value - measurement value
Value syntax - {float} {unit}
Example - 100 meter
Prefered Unit -
Number of occurences permitted - 0
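
Measurement terms such as elev and alt use the value syntax {float} {unit}. A minimal sketch of splitting such a value is shown below. The function name is hypothetical, and the sketch assumes the number and the unit are separated by the first space; it does not check the unit against any list of permitted units.

```python
from typing import Tuple


def parse_measurement(value: str) -> Tuple[float, str]:
    """Split a '{float} {unit}' value into a number and a unit string."""
    # Assumption: everything before the first space is the number and
    # everything after it is the unit.
    number, _, unit = value.strip().partition(" ")
    unit = unit.strip()
    if not unit:
        raise ValueError(f"missing unit in {value!r}")
    # float() raises ValueError itself when the number part is not numeric.
    return float(number), unit


print(parse_measurement("100 meter"))
```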