Class: GtdbSpeciesClade
GTDB species-level grouping with a representative genome. Each clade represents a species cluster defined by a 95% average nucleotide identity (ANI) threshold, following GTDB taxonomy.
A species clade contains all genomes that cluster with the representative genome at >95% ANI; the per-species cutoff is recorded in ANI_circumscription_radius. The representative is typically the highest-quality or type strain genome.
EXAMPLE CLADES (by genome count):
- s__Staphylococcus_aureus--RS_GCF_001027105.1: 14,526 genomes
- s__Klebsiella_pneumoniae--RS_GCF_000742135.1: 14,240 genomes
- s__Salmonella_enterica--RS_GCF_000006945.2: 11,402 genomes
USAGE: Start here to explore pangenomes. Join to Pangenome for statistics, to Genome for individual assemblies, to GeneCluster for gene families.
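The join described above can be sketched with Python's built-in `sqlite3` module. This is only an illustration under stated assumptions: the table name `gtdb_species_clade` comes from the `source_table` annotation, while the `genome` table name, the in-memory database, and the choice of SQLite are hypothetical. The column names and the two example rows are taken from this page.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding two tiny illustrative tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE gtdb_species_clade (
            gtdb_species_clade_id TEXT PRIMARY KEY,
            no_clustered_genomes_filtered INTEGER
        );
        -- Hypothetical table name; the two column names come from the schema.
        CREATE TABLE genome (
            genome_id TEXT PRIMARY KEY,
            gtdb_species_clade_id TEXT REFERENCES gtdb_species_clade
        );
        """
    )
    conn.executemany(
        "INSERT INTO gtdb_species_clade VALUES (?, ?)",
        [
            ("s__Klebsiella_pneumoniae--RS_GCF_000742135.1", 14240),
            ("s__Staphylococcus_aureus--RS_GCF_001027105.1", 14526),
        ],
    )
    # Only the representative genomes are inserted, to avoid inventing accessions.
    conn.executemany(
        "INSERT INTO genome VALUES (?, ?)",
        [
            ("RS_GCF_000742135.1", "s__Klebsiella_pneumoniae--RS_GCF_000742135.1"),
            ("RS_GCF_001027105.1", "s__Staphylococcus_aureus--RS_GCF_001027105.1"),
        ],
    )
    return conn


def clades_with_genomes(conn: sqlite3.Connection) -> list:
    """Join clades to their genomes, largest clade (by filtered count) first."""
    query = """
        SELECT c.gtdb_species_clade_id, c.no_clustered_genomes_filtered, g.genome_id
        FROM gtdb_species_clade AS c
        JOIN genome AS g
          ON g.gtdb_species_clade_id = c.gtdb_species_clade_id
        ORDER BY c.no_clustered_genomes_filtered DESC
    """
    return conn.execute(query).fetchall()


for clade_id, n_filtered, genome_id in clades_with_genomes(build_demo_db()):
    print(clade_id, n_filtered, genome_id)
```

The same `gtdb_species_clade_id` key would be used to reach Pangenome and GeneCluster; their other columns are not listed on this page, so they are left out of the sketch.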
URI: https://w3id.org/kbase/kbase_ke_pangenome/GtdbSpeciesClade
Slots
| Name | Cardinality and Range | Description | Inheritance |
|---|---|---|---|
| gtdb_species_clade_id | 1 String | Species clade ID combining species name and representative genome | direct |
| representative_genome_id | 0..1 Genome | Reference genome for this species | direct |
| GTDB_species | 0..1 String | GTDB species name with s__ prefix | direct |
| GTDB_taxonomy | 0..1 String | Full GTDB lineage from domain to genus (species not repeated) | direct |
| ANI_circumscription_radius | 0..1 Float | ANI threshold for species membership | direct |
| mean_intra_species_ANI | 0..1 Float | Mean pairwise ANI among all genomes | direct |
| min_intra_species_ANI | 0..1 Float | Minimum pairwise ANI observed | direct |
| mean_intra_species_AF | 0..1 Float | Mean alignment fraction - proportion of genome aligning in ANI calculations | direct |
| min_intra_species_AF | 0..1 Float | Minimum alignment fraction observed between any two genomes | direct |
| no_clustered_genomes_unfiltered | 0..1 Integer | Total genomes assigned to species before quality filtering | direct |
| no_clustered_genomes_filtered | 0..1 Integer | Genomes passing quality filters used in pangenome analysis | direct |
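Two of the string slots carry structured values: `gtdb_species_clade_id` combines the species name with the representative genome, and `GTDB_taxonomy` is a semicolon-separated lineage with rank prefixes. A minimal parsing sketch follows. The regular expression is the slot's `pattern` from the LinkML source with named groups added; the function and class names are hypothetical, and the rank names are the conventional meanings of the `d__` to `g__` prefixes.

```python
import re
from typing import Dict, NamedTuple

# The slot's documented pattern, with named groups added for this sketch.
CLADE_ID_RE = re.compile(
    r"(?P<species>s__[A-Za-z0-9_]+)--(?P<source>[A-Z]{2})_(?P<accession>GC[AF]_\d+\.\d+)"
)

# Rank prefixes listed in the GTDB_taxonomy description.
RANKS = {
    "d": "domain",
    "p": "phylum",
    "c": "class",
    "o": "order",
    "f": "family",
    "g": "genus",
}


class CladeId(NamedTuple):
    species: str    # e.g. s__Klebsiella_pneumoniae
    source: str     # RS (RefSeq) or GB (GenBank)
    accession: str  # e.g. GCF_000742135.1


def parse_clade_id(clade_id: str) -> CladeId:
    """Split a clade ID into species name, source, and assembly accession."""
    match = CLADE_ID_RE.fullmatch(clade_id)
    if match is None:
        raise ValueError(f"not a valid gtdb_species_clade_id: {clade_id!r}")
    return CladeId(match["species"], match["source"], match["accession"])


def parse_lineage(lineage: str) -> Dict[str, str]:
    """Turn 'd__X;p__Y;...' into a mapping from rank name to taxon name."""
    result = {}
    for part in lineage.split(";"):
        prefix, sep, name = part.partition("__")
        if not sep or prefix not in RANKS:
            raise ValueError(f"unexpected lineage element: {part!r}")
        result[RANKS[prefix]] = name
    return result


parsed = parse_clade_id("s__Klebsiella_pneumoniae--RS_GCF_000742135.1")
# Source and accession joined by "_" give the representative genome's ID.
print(f"{parsed.source}_{parsed.accession}")
```

Joining `source` and `accession` with an underscore reproduces the `representative_genome_id` example values shown in the LinkML source, which is why the sketch keeps them as separate fields.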
Usages
| used by | used in | type | used |
|---|---|---|---|
| Genome | gtdb_species_clade_id | range | GtdbSpeciesClade |
| GeneCluster | gtdb_species_clade_id | range | GtdbSpeciesClade |
| Pangenome | gtdb_species_clade_id | range | GtdbSpeciesClade |
Identifier and Mapping Information
Annotations
| property | value |
|---|---|
| source_table | gtdb_species_clade |
Schema Source
- from schema: https://w3id.org/kbase/kbase_ke_pangenome
Mappings
| Mapping Type | Mapped Value |
|---|---|
| self | https://w3id.org/kbase/kbase_ke_pangenome/GtdbSpeciesClade |
| native | https://w3id.org/kbase/kbase_ke_pangenome/GtdbSpeciesClade |
LinkML Source
Direct
name: GtdbSpeciesClade
annotations:
  source_table:
    tag: source_table
    value: gtdb_species_clade
description: 'GTDB species-level grouping with representative genome. Each clade represents
  a species cluster defined by 95% ANI threshold following GTDB taxonomy.

  A species clade contains all genomes that cluster together at >95% ANI with the
  representative genome. The representative is typically the highest-quality or type
  strain genome.

  EXAMPLE CLADES (by genome count): - s__Staphylococcus_aureus--RS_GCF_001027105.1:
  14,526 genomes - s__Klebsiella_pneumoniae--RS_GCF_000742135.1: 14,240 genomes -
  s__Salmonella_enterica--RS_GCF_000006945.2: 11,402 genomes

  USAGE: Start here to explore pangenomes. Join to Pangenome for statistics, to Genome
  for individual assemblies, to GeneCluster for gene families.'
from_schema: https://w3id.org/kbase/kbase_ke_pangenome
attributes:
  gtdb_species_clade_id:
    name: gtdb_species_clade_id
    description: 'Species clade ID combining species name and representative genome.

      Format: s__Genus_species--{RS|GB}_GC{F|A}_XXXXXXXXX.X

      The ID encodes: species name (s__prefix), representative source (RS=RefSeq,
      GB=GenBank), and assembly accession.'
    examples:
    - value: s__Klebsiella_pneumoniae--RS_GCF_000742135.1
      description: K. pneumoniae - major human pathogen, 14K+ genomes
    - value: s__Staphylococcus_aureus--RS_GCF_001027105.1
      description: S. aureus - most genomes in database
    - value: s__Escherichia_coli--RS_GCF_000005845.2
      description: E. coli K-12 - model organism
    - value: s__Mycobacterium_tuberculosis--RS_GCF_000195955.2
      description: TB pathogen - highly clonal species
    - value: s__Pseudomonas_aeruginosa--RS_GCF_001457615.1
      description: Opportunistic pathogen with large pangenome
    from_schema: https://w3id.org/kbase/kbase_ke_pangenome
    rank: 1000
    identifier: true
    domain_of:
    - GtdbSpeciesClade
    - Genome
    - GeneCluster
    - Pangenome
    range: string
    required: true
    pattern: s__[A-Za-z0-9_]+--[A-Z]{2}_GC[AF]_\d+\.\d+
  representative_genome_id:
    name: representative_genome_id
    description: Reference genome for this species. Typically highest quality assembly
      or type strain. Used as anchor for ANI calculations.
    comments:
    - 'Foreign key: Genome.genome_id'
    examples:
    - value: RS_GCF_000742135.1
    - value: RS_GCF_001027105.1
    from_schema: https://w3id.org/kbase/kbase_ke_pangenome
    rank: 1000
    domain_of:
    - GtdbSpeciesClade
    range: Genome
  GTDB_species:
    name: GTDB_species
    description: GTDB species name with s__ prefix. May differ from NCBI species name
      due to GTDB's genome-based taxonomy.
    examples:
    - value: s__Klebsiella_pneumoniae
    - value: s__Staphylococcus_aureus
    - value: s__Escherichia_coli
    - value: s__Bacillus_subtilis
    from_schema: https://w3id.org/kbase/kbase_ke_pangenome
    rank: 1000
    domain_of:
    - GtdbSpeciesClade
    range: string
    pattern: s__[A-Za-z0-9_]+
  GTDB_taxonomy:
    name: GTDB_taxonomy
    description: 'Full GTDB lineage from domain to genus (species not repeated). Semicolon-separated
      with rank prefixes: d__, p__, c__, o__, f__, g__'
    examples:
    - value: d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Klebsiella
    - value: d__Bacteria;p__Bacillota;c__Bacilli;o__Staphylococcales;f__Staphylococcaceae;g__Staphylococcus
    - value: d__Archaea;p__Methanobacteriota_B;c__Thermococci;o__Thermococcales;f__Thermococcaceae;g__Thermococcus_A
    from_schema: https://w3id.org/kbase/kbase_ke_pangenome
    rank: 1000
    domain_of:
    - GtdbSpeciesClade
    range: string
  ANI_circumscription_radius:
    name: ANI_circumscription_radius
    description: ANI threshold for species membership. Typically 95% for most species.
      Some species have tighter boundaries (higher values).
    examples:
    - value: '95.0'
      description: Standard species threshold
    - value: '95.239'
      description: K. pneumoniae threshold
    - value: '97.08'
      description: Tighter threshold for clonal species
    from_schema: https://w3id.org/kbase/kbase_ke_pangenome
    rank: 1000
    domain_of:
    - GtdbSpeciesClade
    range: float
    minimum_value: 90.0
    maximum_value: 100.0
    unit:
      ucum_code: '%'
  mean_intra_species_ANI:
    name: mean_intra_species_ANI
    description: Mean pairwise ANI among all genomes. Higher values indicate more
      clonal/homogeneous species. Lower values suggest higher diversity.
    examples:
    - value: '98.97'
      description: K. pneumoniae - moderately diverse
    - value: '99.5'
      description: Highly clonal species (e.g., M. tuberculosis)
    - value: '96.5'
      description: Highly diverse species
    from_schema: https://w3id.org/kbase/kbase_ke_pangenome
    rank: 1000
    domain_of:
    - GtdbSpeciesClade
    range: float
    minimum_value: 95.0
    maximum_value: 100.0
  min_intra_species_ANI:
    name: min_intra_species_ANI
    description: Minimum pairwise ANI observed. Low values near 95% indicate species
      at boundary of splitting into subspecies.
    examples:
    - value: '95.28'
    - value: '97.08'
    from_schema: https://w3id.org/kbase/kbase_ke_pangenome
    rank: 1000
    domain_of:
    - GtdbSpeciesClade
    range: float
    minimum_value: 90.0
    maximum_value: 100.0
  mean_intra_species_AF:
    name: mean_intra_species_AF
    description: Mean alignment fraction - proportion of genome aligning in ANI calculations.
      Low AF may indicate accessory genome differences.
    examples:
    - value: '0.88'
    - value: '0.95'
    from_schema: https://w3id.org/kbase/kbase_ke_pangenome
    rank: 1000
    domain_of:
    - GtdbSpeciesClade
    range: float
    minimum_value: 0.0
    maximum_value: 1.0
  min_intra_species_AF:
    name: min_intra_species_AF
    description: Minimum alignment fraction observed between any two genomes
    examples:
    - value: '0.76'
    - value: '0.82'
    from_schema: https://w3id.org/kbase/kbase_ke_pangenome
    rank: 1000
    domain_of:
    - GtdbSpeciesClade
    range: float
    minimum_value: 0.0
    maximum_value: 1.0
  no_clustered_genomes_unfiltered:
    name: no_clustered_genomes_unfiltered
    description: Total genomes assigned to species before quality filtering. Difference
      from filtered count indicates low-quality genomes removed.
    examples:
    - value: '14975'
      description: K. pneumoniae unfiltered
    - value: '14959'
      description: S. aureus unfiltered
    from_schema: https://w3id.org/kbase/kbase_ke_pangenome
    rank: 1000
    domain_of:
    - GtdbSpeciesClade
    range: integer
    minimum_value: 1
  no_clustered_genomes_filtered:
    name: no_clustered_genomes_filtered
    description: Genomes passing quality filters used in pangenome analysis. These
      genomes have sufficient completeness and low contamination.
    examples:
    - value: '14240'
      description: K. pneumoniae after filtering
    - value: '14526'
      description: S. aureus after filtering
    from_schema: https://w3id.org/kbase/kbase_ke_pangenome
    rank: 1000
    domain_of:
    - GtdbSpeciesClade
    range: integer
    minimum_value: 1
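The `pattern`, `required`, `minimum_value`, and `maximum_value` entries in the direct definition describe constraints that a record should satisfy. The sketch below checks a plain dictionary against those constraints by hand. It is not the LinkML validator: the function name `check_clade` and the dictionary layout are hypothetical, and treating each `pattern` as a full match is a choice made for this example.

```python
import re
from typing import List

# (minimum, maximum) bounds copied from the slot definitions; None means unbounded.
NUMERIC_BOUNDS = {
    "ANI_circumscription_radius": (90.0, 100.0),
    "mean_intra_species_ANI": (95.0, 100.0),
    "min_intra_species_ANI": (90.0, 100.0),
    "mean_intra_species_AF": (0.0, 1.0),
    "min_intra_species_AF": (0.0, 1.0),
    "no_clustered_genomes_unfiltered": (1, None),
    "no_clustered_genomes_filtered": (1, None),
}

# Patterns copied from the slot definitions.
PATTERNS = {
    "gtdb_species_clade_id": re.compile(r"s__[A-Za-z0-9_]+--[A-Z]{2}_GC[AF]_\d+\.\d+"),
    "GTDB_species": re.compile(r"s__[A-Za-z0-9_]+"),
}


def check_clade(record: dict) -> List[str]:
    """Return a list of constraint violations; an empty list means none found."""
    problems = []
    # gtdb_species_clade_id is the only slot marked required.
    if record.get("gtdb_species_clade_id") is None:
        problems.append("gtdb_species_clade_id is required")
    for slot, pattern in PATTERNS.items():
        value = record.get(slot)
        if value is not None and not pattern.fullmatch(value):
            problems.append(f"{slot} does not match its pattern: {value!r}")
    for slot, (low, high) in NUMERIC_BOUNDS.items():
        value = record.get(slot)
        if value is None:
            continue  # all numeric slots are optional (cardinality 0..1)
        if low is not None and value < low:
            problems.append(f"{slot}={value} is below the minimum {low}")
        if high is not None and value > high:
            problems.append(f"{slot}={value} is above the maximum {high}")
    return problems


# Values taken from the K. pneumoniae examples in the slot definitions.
record = {
    "gtdb_species_clade_id": "s__Klebsiella_pneumoniae--RS_GCF_000742135.1",
    "GTDB_species": "s__Klebsiella_pneumoniae",
    "ANI_circumscription_radius": 95.239,
    "mean_intra_species_ANI": 98.97,
    "no_clustered_genomes_unfiltered": 14975,
    "no_clustered_genomes_filtered": 14240,
}
print(check_clade(record))
```

Returning a list of messages rather than raising on the first failure makes it easier to report every problem in a row at once; a real pipeline would more likely rely on LinkML's own validation tooling.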
Induced
name: GtdbSpeciesClade
annotations:
  source_table:
    tag: source_table
    value: gtdb_species_clade
description: 'GTDB species-level grouping with representative genome. Each clade represents
  a species cluster defined by 95% ANI threshold following GTDB taxonomy.

  A species clade contains all genomes that cluster together at >95% ANI with the
  representative genome. The representative is typically the highest-quality or type
  strain genome.

  EXAMPLE CLADES (by genome count): - s__Staphylococcus_aureus--RS_GCF_001027105.1:
  14,526 genomes - s__Klebsiella_pneumoniae--RS_GCF_000742135.1: 14,240 genomes -
  s__Salmonella_enterica--RS_GCF_000006945.2: 11,402 genomes

  USAGE: Start here to explore pangenomes. Join to Pangenome for statistics, to Genome
  for individual assemblies, to GeneCluster for gene families.'
from_schema: https://w3id.org/kbase/kbase_ke_pangenome
attributes:
  gtdb_species_clade_id:
    name: gtdb_species_clade_id
    description: 'Species clade ID combining species name and representative genome.

      Format: s__Genus_species--{RS|GB}_GC{F|A}_XXXXXXXXX.X

      The ID encodes: species name (s__prefix), representative source (RS=RefSeq,
      GB=GenBank), and assembly accession.'
    examples:
    - value: s__Klebsiella_pneumoniae--RS_GCF_000742135.1
      description: K. pneumoniae - major human pathogen, 14K+ genomes
    - value: s__Staphylococcus_aureus--RS_GCF_001027105.1
      description: S. aureus - most genomes in database
    - value: s__Escherichia_coli--RS_GCF_000005845.2
      description: E. coli K-12 - model organism
    - value: s__Mycobacterium_tuberculosis--RS_GCF_000195955.2
      description: TB pathogen - highly clonal species
    - value: s__Pseudomonas_aeruginosa--RS_GCF_001457615.1
      description: Opportunistic pathogen with large pangenome
    from_schema: https://w3id.org/kbase/kbase_ke_pangenome
    rank: 1000
    identifier: true
    alias: gtdb_species_clade_id
    owner: GtdbSpeciesClade
    domain_of:
    - GtdbSpeciesClade
    - Genome
    - GeneCluster
    - Pangenome
    range: string
    required: true
    pattern: s__[A-Za-z0-9_]+--[A-Z]{2}_GC[AF]_\d+\.\d+
  representative_genome_id:
    name: representative_genome_id
    description: Reference genome for this species. Typically highest quality assembly
      or type strain. Used as anchor for ANI calculations.
    comments:
    - 'Foreign key: Genome.genome_id'
    examples:
    - value: RS_GCF_000742135.1
    - value: RS_GCF_001027105.1
    from_schema: https://w3id.org/kbase/kbase_ke_pangenome
    rank: 1000
    alias: representative_genome_id
    owner: GtdbSpeciesClade
    domain_of:
    - GtdbSpeciesClade
    range: Genome
  GTDB_species:
    name: GTDB_species
    description: GTDB species name with s__ prefix. May differ from NCBI species name
      due to GTDB's genome-based taxonomy.
    examples:
    - value: s__Klebsiella_pneumoniae
    - value: s__Staphylococcus_aureus
    - value: s__Escherichia_coli
    - value: s__Bacillus_subtilis
    from_schema: https://w3id.org/kbase/kbase_ke_pangenome
    rank: 1000
    alias: GTDB_species
    owner: GtdbSpeciesClade
    domain_of:
    - GtdbSpeciesClade
    range: string
    pattern: s__[A-Za-z0-9_]+
  GTDB_taxonomy:
    name: GTDB_taxonomy
    description: 'Full GTDB lineage from domain to genus (species not repeated). Semicolon-separated
      with rank prefixes: d__, p__, c__, o__, f__, g__'
    examples:
    - value: d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Klebsiella
    - value: d__Bacteria;p__Bacillota;c__Bacilli;o__Staphylococcales;f__Staphylococcaceae;g__Staphylococcus
    - value: d__Archaea;p__Methanobacteriota_B;c__Thermococci;o__Thermococcales;f__Thermococcaceae;g__Thermococcus_A
    from_schema: https://w3id.org/kbase/kbase_ke_pangenome
    rank: 1000
    alias: GTDB_taxonomy
    owner: GtdbSpeciesClade
    domain_of:
    - GtdbSpeciesClade
    range: string
  ANI_circumscription_radius:
    name: ANI_circumscription_radius
    description: ANI threshold for species membership. Typically 95% for most species.
      Some species have tighter boundaries (higher values).
    examples:
    - value: '95.0'
      description: Standard species threshold
    - value: '95.239'
      description: K. pneumoniae threshold
    - value: '97.08'
      description: Tighter threshold for clonal species
    from_schema: https://w3id.org/kbase/kbase_ke_pangenome
    rank: 1000
    alias: ANI_circumscription_radius
    owner: GtdbSpeciesClade
    domain_of:
    - GtdbSpeciesClade
    range: float
    minimum_value: 90.0
    maximum_value: 100.0
    unit:
      ucum_code: '%'
  mean_intra_species_ANI:
    name: mean_intra_species_ANI
    description: Mean pairwise ANI among all genomes. Higher values indicate more
      clonal/homogeneous species. Lower values suggest higher diversity.
    examples:
    - value: '98.97'
      description: K. pneumoniae - moderately diverse
    - value: '99.5'
      description: Highly clonal species (e.g., M. tuberculosis)
    - value: '96.5'
      description: Highly diverse species
    from_schema: https://w3id.org/kbase/kbase_ke_pangenome
    rank: 1000
    alias: mean_intra_species_ANI
    owner: GtdbSpeciesClade
    domain_of:
    - GtdbSpeciesClade
    range: float
    minimum_value: 95.0
    maximum_value: 100.0
  min_intra_species_ANI:
    name: min_intra_species_ANI
    description: Minimum pairwise ANI observed. Low values near 95% indicate species
      at boundary of splitting into subspecies.
    examples:
    - value: '95.28'
    - value: '97.08'
    from_schema: https://w3id.org/kbase/kbase_ke_pangenome
    rank: 1000
    alias: min_intra_species_ANI
    owner: GtdbSpeciesClade
    domain_of:
    - GtdbSpeciesClade
    range: float
    minimum_value: 90.0
    maximum_value: 100.0
  mean_intra_species_AF:
    name: mean_intra_species_AF
    description: Mean alignment fraction - proportion of genome aligning in ANI calculations.
      Low AF may indicate accessory genome differences.
    examples:
    - value: '0.88'
    - value: '0.95'
    from_schema: https://w3id.org/kbase/kbase_ke_pangenome
    rank: 1000
    alias: mean_intra_species_AF
    owner: GtdbSpeciesClade
    domain_of:
    - GtdbSpeciesClade
    range: float
    minimum_value: 0.0
    maximum_value: 1.0
  min_intra_species_AF:
    name: min_intra_species_AF
    description: Minimum alignment fraction observed between any two genomes
    examples:
    - value: '0.76'
    - value: '0.82'
    from_schema: https://w3id.org/kbase/kbase_ke_pangenome
    rank: 1000
    alias: min_intra_species_AF
    owner: GtdbSpeciesClade
    domain_of:
    - GtdbSpeciesClade
    range: float
    minimum_value: 0.0
    maximum_value: 1.0
  no_clustered_genomes_unfiltered:
    name: no_clustered_genomes_unfiltered
    description: Total genomes assigned to species before quality filtering. Difference
      from filtered count indicates low-quality genomes removed.
    examples:
    - value: '14975'
      description: K. pneumoniae unfiltered
    - value: '14959'
      description: S. aureus unfiltered
    from_schema: https://w3id.org/kbase/kbase_ke_pangenome
    rank: 1000
    alias: no_clustered_genomes_unfiltered
    owner: GtdbSpeciesClade
    domain_of:
    - GtdbSpeciesClade
    range: integer
    minimum_value: 1
  no_clustered_genomes_filtered:
    name: no_clustered_genomes_filtered
    description: Genomes passing quality filters used in pangenome analysis. These
      genomes have sufficient completeness and low contamination.
    examples:
    - value: '14240'
      description: K. pneumoniae after filtering
    - value: '14526'
      description: S. aureus after filtering
    from_schema: https://w3id.org/kbase/kbase_ke_pangenome
    rank: 1000
    alias: no_clustered_genomes_filtered
    owner: GtdbSpeciesClade
    domain_of:
    - GtdbSpeciesClade
    range: integer
    minimum_value: 1
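The description of `no_clustered_genomes_unfiltered` notes that its difference from the filtered count indicates how many low-quality genomes were removed. As a small worked example, the sketch below turns the two counts into a removed fraction, using the K. pneumoniae and S. aureus example values from the slot definitions. The function name is hypothetical, and the check that the filtered count never exceeds the unfiltered count is an assumption based on the slot descriptions.

```python
def fraction_removed(unfiltered: int, filtered: int) -> float:
    """Fraction of a clade's genomes dropped by quality filtering."""
    # Both counts have minimum_value 1 in the schema; filtered <= unfiltered
    # is assumed here because filtering can only remove genomes.
    if not 1 <= filtered <= unfiltered:
        raise ValueError("expected 1 <= filtered <= unfiltered")
    return (unfiltered - filtered) / unfiltered


# Example counts from the slot definitions: (unfiltered, filtered).
examples = {
    "K. pneumoniae": (14975, 14240),
    "S. aureus": (14959, 14526),
}
for name, (unfiltered, filtered) in examples.items():
    removed = unfiltered - filtered
    print(f"{name}: {removed} removed ({fraction_removed(unfiltered, filtered):.1%})")
# K. pneumoniae: 735 removed (4.9%)
# S. aureus: 433 removed (2.9%)
```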