Class: Core

core package

URI: mixs.vocab:Core

Attributes

Own

  • adapters OPT
    • Description: Adapters provide priming sequences for both amplification and sequencing of the sample-library fragments. Both adapters should be reported; in uppercase letters
    • range: String
    • Example: AATGATACGGCGACCACCGAGATCTACACGCT;CAAGCAGAAGACGGCATACGAGAT None
  • alt OPT
    • Description: Altitude is a term used to identify heights of objects such as airplanes, space shuttles, rockets, atmospheric balloons and heights of places such as atmospheric layers and clouds. It is used to measure the height of an object which is above the earth's surface. In this context, the altitude measurement is the vertical distance between the earth's surface above sea level and the sampled position in the air
    • range: String
    • Example: 100 meter None
  • annot OPT
    • Description: Tool used for annotation, or for cases where annotation was provided by a community jamboree or model organism database rather than by a specific submitter
    • range: String
    • Example: prokka None
  • assembly_name OPT
    • Description: Name/version of the assembly provided by the submitter that is used in the genome browsers and in the community
    • range: String
    • Example: HuRef, JCVI_ISG_i3_1.0 None
  • assembly_qual OPT
    • Description: The assembly quality category is based on sets of criteria outlined for each assembly quality category. For MISAG/MIMAG; Finished: Single, validated, contiguous sequence per replicon without gaps or ambiguities with a consensus error rate equivalent to Q50 or better. High Quality Draft: Multiple fragments where gaps span repetitive regions. Presence of the 23S, 16S and 5S rRNA genes and at least 18 tRNAs. Medium Quality Draft: Many fragments with little to no review of assembly other than reporting of standard assembly statistics. Low Quality Draft: Many fragments with little to no review of assembly other than reporting of standard assembly statistics. Assembly statistics include, but are not limited to, total assembly size, number of contigs, contig N50/L50, and maximum contig length. For MIUVIG; Finished: Single, validated, contiguous sequence per replicon without gaps or ambiguities, with extensive manual review and editing to annotate putative gene functions and transcriptional units. High-quality draft genome: One or multiple fragments, totaling ≥ 90% of the expected genome or replicon sequence or predicted complete. Genome fragment(s): One or multiple fragments, totaling < 90% of the expected genome or replicon sequence, or for which no genome size could be estimated
    • range: String
    • Example: High-quality draft genome None
  • assembly_software OPT
    • Description: Tool(s) used for assembly, including version number and parameters
    • range: String
    • Example: metaSPAdes;3.11.0;kmer set 21,33,55,77,99,121, default parameters otherwise None
  • bin_param OPT
    • Description: The parameters that have been applied during the extraction of genomes from metagenomic datasets
    • range: String
    • Example: coverage and kmer None
  • bin_software OPT
    • Description: Tool(s) used for the extraction of genomes from metagenomic datasets
    • range: String
    • Example: concoct and maxbin None
  • biotic_relationship OPT
    • Description: Description of relationship(s) between the subject organism and other organism(s) it is associated with. E.g., parasite on species X; mutualist with species Y. The target organism is the subject of the relationship, and the other organism(s) is the object
    • range: String
    • Example: free living None
  • chimera_check OPT
    • Description: A chimeric sequence, or chimera for short, is a sequence comprised of two or more phylogenetically distinct parent sequences. Chimeras are usually PCR artifacts thought to occur when a prematurely terminated amplicon reanneals to a foreign DNA strand and is copied to completion in the following PCR cycles. The point at which the chimeric sequence changes from one parent to the next is called the breakpoint or conversion point
    • range: String
    • Example: uchime;v4.1;default parameters None
  • collection_date OPT
    • Description: The time of sampling, either as an instant (single point in time) or an interval. If no exact time is available, the date/time can be right truncated, i.e. all of these are valid times: 2008-01-23T19:23:10+00:00; 2008-01-23T19:23:10; 2008-01-23; 2008-01; 2008. All of these except 2008-01 and 2008 are ISO 8601 compliant
    • range: String
    • Example: 2018-05-11T10:00:00+01:00 None
  • compl_appr OPT
    • Description: The approach used to determine the completeness of a given SAG or MAG, which would typically make use of a set of conserved marker genes or a closely related reference genome. For UViG completeness, include reference genome or group used, and contig feature suggesting a complete genome
    • range: String
    • Example: other: UViG length compared to the average length of reference genomes from the P22virus genus (NCBI RefSeq v83) None
  • compl_score OPT
    • Description: Completeness score is typically based on either the fraction of markers found as compared to a database or the percent of a genome found as compared to a closely related reference genome. The assembly quality categories should have the following completeness scores: High Quality Draft: >90%, Medium Quality Draft: >50%, Low Quality Draft: <50%
    • range: String
    • Example: med;60% None
  • compl_software OPT
    • Description: Tools used for completion estimate, e.g. checkm, anvi'o, busco
    • range: String
    • Example: checkm None
  • contam_score OPT
    • Description: The contamination score is based on the fraction of single-copy genes that are observed more than once in a query genome. The following scores are acceptable: High Quality Draft: < 5%, Medium Quality Draft: < 10%, Low Quality Draft: < 10%. Contamination must be below 5% for a SAG or MAG to be deposited into any of the public databases
    • range: String
    • Example: 1% None
  • contam_screen_input OPT
    • Description: The type of sequence data used as input
    • range: String
    • Example: contigs None
  • contam_screen_param OPT
    • Description: Specific parameters used in the decontamination software, such as reference database, coverage, and kmers. Combinations of these parameters may also be used, e.g. kmer and coverage, or reference database and kmer
    • range: String
    • Example: kmer None
  • decontam_software OPT
    • Description: Tool(s) used in contamination screening
    • range: String
    • Example: anvi'o None
  • depth OPT
    • Description: Please refer to the definitions of depth in the environmental packages
    • range: String
    • Example: None
  • detec_type OPT
    • Description: Type of UViG detection
    • range: String
    • Example: independent sequence (UViG) None
  • elev OPT
    • Description: Elevation of the sampling site is its height above a fixed reference point, most commonly the mean sea level. Elevation is mainly used when referring to points on the earth's surface, while altitude is used for points above the surface, such as an aircraft in flight or a spacecraft in orbit
    • range: String
    • Example: 100 meter None
  • encoded_traits OPT
    • Description: Should include key traits like antibiotic resistance or xenobiotic degradation phenotypes for plasmids, converting genes for phage
    • range: String
    • Example: beta-lactamase class A None
  • env_broad_scale OPT
    • Description: In this field, report which major environmental system your sample or specimen came from. The systems identified should have a coarse spatial grain, to provide the general environmental context of where the sampling was done (e.g. were you in the desert or a rainforest?). We recommend using subclasses of ENVO’s biome class: http://purl.obolibrary.org/obo/ENVO_00000428. Format (one term): termLabel [termID]; Format (multiple terms): termLabel [termID]|termLabel [termID]|termLabel [termID]. Example: Annotating a water sample from the photic zone in the middle of the Atlantic Ocean, consider: oceanic epipelagic zone biome [ENVO:01000033]. Example: Annotating a sample from the Amazon rainforest, consider: tropical moist broadleaf forest biome [ENVO:01000228]. If needed, request new terms on the ENVO tracker, identified here: http://www.obofoundry.org/ontology/envo.html
    • range: String
    • Example: forest biome [ENVO:01000174] None
  • env_local_scale OPT
    • Description: In this field, report the entity or entities which are in your sample or specimen’s local vicinity and which you believe have significant causal influences on your sample or specimen. Please use terms that are present in ENVO and which are of smaller spatial grain than your entry for env_broad_scale. Format (one term): termLabel [termID]; Format (multiple terms): termLabel [termID]|termLabel [termID]|termLabel [termID]. Example: Annotating a pooled sample taken from various vegetation layers in a forest, consider: canopy [ENVO:00000047]|herb and fern layer [ENVO:01000337]|litter layer [ENVO:01000338]|understory [ENVO:01000335]|shrub layer [ENVO:01000336]. If needed, request new terms on the ENVO tracker, identified here: http://www.obofoundry.org/ontology/envo.html
    • range: String
    • Example: litter layer [ENVO:01000338] None
  • env_medium OPT
    • Description: In this field, report which environmental material or materials (pipe separated) immediately surrounded your sample or specimen prior to sampling, using one or more subclasses of ENVO’s environmental material class: http://purl.obolibrary.org/obo/ENVO_00010483. Format (one term): termLabel [termID]; Format (multiple terms): termLabel [termID]|termLabel [termID]|termLabel [termID]. Example: Annotating a fish swimming in the upper 100 m of the Atlantic Ocean, consider: ocean water [ENVO:00002151]. Example: Annotating a duck on a pond, consider: pond water [ENVO:00002228]|air [ENVO:00002005]. If needed, request new terms on the ENVO tracker, identified here: http://www.obofoundry.org/ontology/envo.html
    • range: String
    • Example: soil [ENVO:00001998] None
  • env_package OPT
    • Description: MIxS extension for reporting of measurements and observations obtained from one or more of the environments where the sample was obtained. All environmental packages listed here are further defined in separate subtables. By giving the name of the environmental package, a selection of fields can be made from the subtables and can be reported
    • range: String
    • Example: soil None
  • estimated_size OPT
    • Description: The estimated size of the genome prior to sequencing. Of particular importance in the sequencing of (eukaryotic) genomes, which could remain in draft form for a long or unspecified period.
    • range: String
    • Example: 300000 bp None
  • experimental_factor OPT
    • Description: Experimental factors are essentially the variable aspects of an experiment design which can be used to describe an experiment, or set of experiments, in an increasingly detailed manner. This field accepts ontology terms from Experimental Factor Ontology (EFO) and/or Ontology for Biomedical Investigations (OBI). For a browser of EFO (v 2.95) terms, please see http://purl.bioontology.org/ontology/EFO; for a browser of OBI (v 2018-02-12) terms please see http://purl.bioontology.org/ontology/OBI
    • range: String
    • Example: time series design [EFO:EFO_0001779] None
  • extrachrom_elements OPT
    • Description: Do plasmids of significant phenotypic consequence exist (e.g. ones that determine virulence or antibiotic resistance)? Megaplasmids? Other plasmids (Borrelia has 15+ plasmids)?
    • range: String
    • Example: 5 None
  • feat_pred OPT
    • Description: Method used to predict UViG features such as ORFs, integration site, etc.
    • range: String
    • Example: Prodigal;2.6.3;default parameters None
  • geo_loc_name OPT
    • Description: The geographical origin of the sample as defined by the country or sea name followed by specific region name. Country or sea names should be chosen from the INSDC country list (http://insdc.org/country.html), or the GAZ ontology (v 1.512) (http://purl.bioontology.org/ontology/GAZ)
    • range: String
    • Example: Germany;North Rhine-Westphalia;Eifel National Park None
  • health_disease_stat OPT
    • Description: Health or disease status of specific host at time of collection
    • range: String
    • Example: dead None
  • host_pred_appr OPT
    • Description: Tool or approach used for host prediction
    • range: String
    • Example: CRISPR spacer match None
  • host_pred_est_acc OPT
    • Description: For each tool or approach used for host prediction, estimated false discovery rates should be included, either computed de novo or from the literature
    • range: String
    • Example: CRISPR spacer match: 0 or 1 mismatches, estimated 8% FDR at the host genus rank (Edwards et al. 2016 doi:10.1093/femsre/fuv048) None
  • host_spec_range OPT
    • Description: The NCBI taxonomy identifier of the specific host if it is known
    • range: String
    • Example: 9606 None
  • investigation_type OPT
    • Description: Nucleic Acid Sequence Report is the root element of all MIGS/MIMS compliant reports as standardized by Genomic Standards Consortium. This field is either eukaryote, bacteria, virus, plasmid, organelle, metagenome, mimarks-survey, mimarks-specimen, metatranscriptome, single amplified genome, metagenome-assembled genome, or uncultivated viral genome
    • range: String
    • Example: metagenome None
  • isol_growth_condt OPT
    • Description: Publication reference in the form of pubmed ID (pmid), digital object identifier (doi) or url for isolation and growth condition specifications of the organism/material
    • range: String
    • Example: doi: 10.1016/j.syapm.2018.01.009 None
  • lat_lon OPT
    • Description: The geographical origin of the sample as defined by latitude and longitude. The values should be reported in decimal degrees and in WGS84 system
    • range: String
    • Example: 50.586825 6.408977 None
  • lib_layout OPT
    • Description: Specify whether to expect single, paired, or other configuration of reads
    • range: String
    • Example: paired None
  • lib_reads_seqd OPT
    • Description: Total number of clones sequenced from the library
    • range: String
    • Example: 20 None
  • lib_screen OPT
    • Description: Specific enrichment or screening methods applied before and/or after creating libraries
    • range: String
    • Example: enriched, screened, normalized None
  • lib_size OPT
    • Description: Total number of clones in the library prepared for the project
    • range: String
    • Example: 50 None
  • lib_vector OPT
    • Description: Cloning vector type(s) used in construction of libraries
    • range: String
    • Example: Bacteriophage P1 None
  • mag_cov_software OPT
    • Description: Tool(s) used to determine the genome coverage if coverage is used as a binning parameter in the extraction of genomes from metagenomic datasets
    • range: String
    • Example: bbmap None
  • mid OPT
    • Description: Molecular barcodes, called Multiplex Identifiers (MIDs), that are used to specifically tag unique samples in a sequencing run. Sequence should be reported in uppercase letters
    • range: String
    • Example: GTGAATAT None
  • nucl_acid_amp OPT
    • Description: A link to a literature reference, electronic resource or a standard operating procedure (SOP), that describes the enzymatic amplification (PCR, TMA, NASBA) of specific nucleic acids
    • range: String
    • Example: https://phylogenomics.me/protocols/16s-pcr-protocol/ None
  • nucl_acid_ext OPT
    • Description: A link to a literature reference, electronic resource or a standard operating procedure (SOP), that describes the material separation to recover the nucleic acid fraction from a sample
    • range: String
    • Example: https://mobio.com/media/wysiwyg/pdfs/protocols/12888.pdf None
  • num_replicons OPT
    • Description: Reports the number of replicons in a nuclear genome of eukaryotes, in the genome of a bacterium or archaeon, or the number of segments in a segmented virus. Always applied to the haploid chromosome count of a eukaryote
    • range: String
    • Example: 2 None
  • number_contig OPT
    • Description: Total number of contigs in the cleaned/submitted assembly that makes up a given genome, SAG, MAG, or UViG
    • range: String
    • Example: 40 None
  • pathogenicity OPT
    • Description: To what is the entity pathogenic
    • range: String
    • Example: human, animal, plant, fungi, bacteria None
  • pcr_cond OPT
    • Description: Description of reaction conditions and components of PCR in the form of 'initial denaturation:94degC_1.5min; annealing=...'
    • range: String
    • Example: initial denaturation:94_3;annealing:50_1;elongation:72_1.5;final elongation:72_10;35 None
  • pcr_primers OPT
    • Description: PCR primers that were used to amplify the sequence of the targeted gene, locus or subfragment. This field should contain all the primers used for a single PCR reaction if multiple forward or reverse primers are present in a single PCR reaction. The primer sequence should be reported in uppercase letters
    • range: String
    • Example: FWD:GTGCCAGCMGCCGCGGTAA;REV:GGACTACHVGGGTWTCTAAT None
  • ploidy OPT
    • Description: The ploidy level of the genome (e.g. allopolyploid, haploid, diploid, triploid, tetraploid). It has implications for the downstream study of duplicated genes and regions of the genomes (and perhaps for difficulties in assembly). For terms, please select terms listed under class ploidy (PATO:0001374) of Phenotypic Quality Ontology (PATO), and for a browser of PATO (v 2018-03-27) please refer to http://purl.bioontology.org/ontology/PATO
    • range: String
    • Example: allopolyploidy [PATO:0001379] None
  • pred_genome_struc OPT
    • Description: Expected structure of the viral genome
    • range: String
    • Example: non-segmented None
  • pred_genome_type OPT
    • Description: Type of genome predicted for the UViG
    • range: String
    • Example: dsDNA None
  • project_name OPT
    • Description: Name of the project within which the sequencing was organized
    • range: String
    • Example: Forest soil metagenome None
  • propagation OPT
    • Description: This field is specific to different taxa. For phages: lytic/lysogenic, for plasmids: incompatibility group, for eukaryotes: sexual/asexual (Note: there is the strong opinion to name phage propagation obligately lytic or temperate, therefore we also give this choice)
    • range: String
    • Example: lytic None
  • reassembly_bin OPT
    • Description: Has an assembly been performed on a genome bin extracted from a metagenomic assembly?
    • range: String
    • Example: no None
  • ref_biomaterial OPT
    • Description: Primary publication if isolated before genome publication; otherwise, primary genome report
    • range: String
    • Example: doi:10.1016/j.syapm.2018.01.009 None
  • ref_db OPT
    • Description: List of database(s) used for ORF annotation, along with version number and reference to website or publication
    • range: String
    • Example: pVOGs;5;http://dmk-brain.ecn.uiowa.edu/pVOGs/ Grazziotin et al. 2017 doi:10.1093/nar/gkw975 None
  • rel_to_oxygen OPT
    • Description: Is this organism an aerobe or an anaerobe? Please note that aerobic and anaerobic are valid descriptors for microbial environments
    • range: String
    • Example: aerobe None
  • samp_collect_device OPT
    • Description: The method or device employed for collecting the sample
    • range: String
    • Example: environmental swab sampling, biopsy, niskin bottle, push core None
  • samp_mat_process OPT
    • Description: Any processing applied to the sample during or after retrieving the sample from environment. This field accepts OBI, for a browser of OBI (v 2018-02-12) terms please see http://purl.bioontology.org/ontology/OBI
    • range: String
    • Example: filtering of seawater, storing samples in ethanol None
  • samp_size OPT
    • Description: Amount or size of sample (volume, mass or area) that was collected
    • range: String
    • Example: 5 liter None
  • sample_name OPT
    • Description: Sample Name is a name that you choose for the sample. It can have any format, but we suggest that you make it concise, unique and consistent within your lab, and as informative as possible. Every Sample Name from a single Submitter must be unique.
    • range: String
    • Example: None
  • seq_meth OPT
    • Description: Sequencing method used; e.g. Sanger, ABI-solid
    • range: String
    • Example: Illumina HiSeq 1500 None
  • seq_quality_check OPT
    • Description: Indicate if the sequence has been called by automatic systems (none) or undergone a manual editing procedure (e.g. by inspecting the raw data or chromatograms). Applied only for sequences that are not submitted to SRA, ENA or DRA
    • range: String
    • Example: none None
  • sim_search_meth OPT
    • Description: Tool used to compare ORFs with database, along with version and cutoffs used
    • range: String
    • Example: HMMER3;3.1b2;hmmsearch, cutoff of 50 on score None
  • single_cell_lysis_appr OPT
    • Description: Method used to free DNA from interior of the cell(s) or particle(s)
    • range: String
    • Example: enzymatic None
  • single_cell_lysis_prot OPT
    • Description: Name of the kit or standard protocol used for cell(s) or particle(s) lysis
    • range: String
    • Example: ambion single cell lysis kit None
  • size_frac OPT
    • Description: Filtering pore size used in sample preparation
    • range: String
    • Example: 0-0.22 micrometer None
  • sop OPT
    • Description: Standard operating procedures used in assembly and/or annotation of genomes, metagenomes or environmental sequences
    • range: String
    • Example: http://press.igsb.anl.gov/earthmicrobiome/protocols-and-standards/its/ None
  • sort_tech OPT
    • Description: Method used to sort/isolate cells or particles of interest
    • range: String
    • Example: optical manipulation None
  • source_mat_id OPT
    • Description: A unique identifier assigned to a material sample (as defined by http://rs.tdwg.org/dwc/terms/materialSampleID, and as opposed to a particular digital record of a material sample) used for extracting nucleic acids, and subsequent sequencing. The identifier can refer either to the original material collected or to any derived sub-samples. The INSDC qualifiers /specimen_voucher, /bio_material, or /culture_collection may or may not share the same value as the source_mat_id field. For instance, the /specimen_voucher qualifier and source_mat_id may both contain 'UAM:Herps:14' , referring to both the specimen voucher and sampled tissue with the same identifier. However, the /culture_collection qualifier may refer to a value from an initial culture (e.g. ATCC:11775) while source_mat_id would refer to an identifier from some derived culture from which the nucleic acids were extracted (e.g. xatc123 or ark:/2154/R2).
    • range: String
    • Example: MPI012345 None
  • source_uvig OPT
    • Description: Type of dataset from which the UViG was obtained
    • range: String
    • Example: viral fraction metagenome (virome) None
  • specific_host OPT
    • Description: If there is a host involved, please provide its taxid (or environmental if not actually isolated from the dead or alive host, i.e. a pathogen could be isolated from a swipe of a bench etc.) and report whether it is a laboratory or natural host
    • range: String
    • Example: 9606 None
  • submitted_to_insdc OPT
    • Description: Depending on the study (large-scale, e.g. done with next generation sequencing technology, or small-scale), sequences have to be submitted to SRA (Sequence Read Archive), DRA (DDBJ Read Archive) or via the classical Webin/Sequin systems to GenBank, ENA and DDBJ. Although this field is mandatory, it is meant as a self-test field, therefore it is not necessary to include this field in contextual data submitted to databases
    • range: String
    • Example: yes None
  • subspecf_gen_lin OPT
    • Description: This should provide further information about the genetic distinctness of the sequenced organism by recording additional information e.g. serovar, serotype, biotype, ecotype, or any relevant genetic typing schemes like Group I plasmid. It can also contain alternative taxonomic information. It should contain both the lineage name, and the lineage rank, i.e. biovar:abc123
    • range: String
    • Example: serovar:Newport None
  • target_gene OPT
    • Description: Targeted gene or locus name for marker gene studies
    • range: String
    • Example: 16S rRNA, 18S rRNA, nif, amoA, rpo None
  • target_subfragment OPT
    • Description: Name of subfragment of a gene or locus. Important to e.g. identify special regions on marker genes like V6 on 16S rRNA
    • range: String
    • Example: V6, V9, ITS None
  • tax_class OPT
    • Description: Method used for taxonomic classification, along with reference database used, classification rank, and thresholds used to classify new genomes
    • range: String
    • Example: vConTACT vContact2 (references from NCBI RefSeq v83, genus rank classification, default parameters) None
  • tax_ident OPT
    • Description: The phylogenetic marker(s) used to assign an organism name to the SAG or MAG
    • range: String
    • Example: other: rpoB gene None
  • trna_ext_software OPT
    • Description: Tools used for tRNA identification
    • range: String
    • Example: infernal;v2;default parameters None
  • trnas OPT
    • Description: The total number of tRNAs identified from the SAG or MAG
    • range: String
    • Example: 18 None
  • trophic_level OPT
    • Description: Trophic levels are the feeding position in a food chain. Microbes can be a range of producers (e.g. chemolithotroph)
    • range: String
    • Example: heterotroph None
  • url OPT
    • range: String
    • Example: http://www.earthmicrobiome.org/ None
  • vir_ident_software OPT
    • Description: Tool(s) used for the identification of UViG as a viral genome, software or protocol name including version number, parameters, and cutoffs used
    • range: String
    • Example: VirSorter; 1.0.4; Virome database, category 2 None
  • virus_enrich_appr OPT
    • Description: List of approaches used to enrich the sample for viruses, if any
    • range: String
    • Example: filtration + FeCl Precipitation + ultracentrifugation + DNAse None
  • votu_class_appr OPT
    • Description: Cutoffs and approach used when clustering new UViGs in “species-level” vOTUs. Note that results from standard 95% ANI / 85% AF clustering should be provided alongside vOTUs defined from another set of thresholds, even if the latter are the ones primarily used during the analysis
    • range: String
    • Example: 95% ANI;85% AF; greedy incremental clustering None
  • votu_db OPT
    • Description: Reference database (i.e. sequences not generated as part of the current study) used to cluster new genomes in "species-level" vOTUs, if any
    • range: String
    • Example: NCBI Viral RefSeq;83 None
  • votu_seq_comp_appr OPT
    • Description: Tool and thresholds used to compare sequences when computing "species-level" vOTUs
    • range: String
    • Example: blastn;2.6.0+;e-value cutoff: 0.001 None
  • wga_amp_appr OPT
    • Description: Method used to amplify genomic DNA in preparation for sequencing
    • range: String
    • Example: mda based None
  • wga_amp_kit OPT
    • Description: Kit used to amplify genomic DNA in preparation for sequencing
    • range: String
    • Example: qiagen repli-g None
  • x_16s_recover OPT
    • Description: Can a 16S gene be recovered from the submitted SAG or MAG?
    • range: String
    • Example: yes None
  • x_16s_recover_software OPT
    • Description: Tools used for 16S rRNA gene extraction
    • range: String
    • Example: rambl;v2;default parameters None
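
Every attribute above has range String, so the value formats given in the descriptions (for example the `termLabel [termID]` convention of env_broad_scale, env_local_scale and env_medium, the decimal-degree pair of lat_lon, and the right-truncated timestamps of collection_date) are not enforced by the range itself. The following is a minimal Python sketch of how a submitter might check such values before submission. The function names and the regular expression are illustrative assumptions, not part of the MIxS specification or any official tooling.

```python
import re
from datetime import datetime

# Assumed pattern for one "termLabel [termID]" entry,
# e.g. "forest biome [ENVO:01000174]". The ID must carry an ontology prefix.
TERM_RE = re.compile(r"^(?P<label>[^\[\]|]+?)\s*\[(?P<id>[A-Za-z]+:\d+)\]$")


def parse_terms(value):
    """Split a pipe-separated 'termLabel [termID]' string into (label, id) pairs."""
    terms = []
    for part in value.split("|"):
        match = TERM_RE.match(part.strip())
        if match is None:
            raise ValueError(f"not in 'termLabel [termID]' form: {part!r}")
        terms.append((match.group("label"), match.group("id")))
    return terms


def parse_lat_lon(value):
    """Parse 'latitude longitude' in decimal degrees, e.g. '50.586825 6.408977'."""
    lat_text, lon_text = value.split()
    lat, lon = float(lat_text), float(lon_text)
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError(f"coordinates out of range: {value!r}")
    return lat, lon


# The right-truncated layouts listed in the collection_date description,
# tried from most to least specific.
DATE_FORMATS = (
    "%Y-%m-%dT%H:%M:%S%z",
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%d",
    "%Y-%m",
    "%Y",
)


def parse_collection_date(value):
    """Return a datetime for any accepted layout (single time points only)."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognised collection_date: {value!r}")


if __name__ == "__main__":
    print(parse_terms("canopy [ENVO:00000047]|litter layer [ENVO:01000338]"))
    print(parse_lat_lon("50.586825 6.408977"))
    print(parse_collection_date("2018-05-11T10:00:00+01:00"))
```

The sketch only covers single time points for collection_date; intervals, which the description also allows, would need separate handling. Truncated dates are padded by `strptime` to the first month or day, so the original precision is lost unless the input string is kept alongside the parsed value.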