Class: Pangenome
Summary statistics for a species pangenome, one row per species. Contains counts of core, auxiliary, and singleton gene clusters, plus quality metrics.
EXAMPLE PANGENOME STATISTICS:

| Species | Genomes | Core | Aux | Total |
|---|---|---|---|---|
| K. pneumoniae | 14,240 | 4,199 | 438,925 | 443,124 |
| S. aureus | 14,526 | 2,083 | 145,831 | 147,914 |
| S. enterica | 11,402 | 3,639 | 262,732 | 266,371 |
| P. aeruginosa | 6,760 | 5,199 | 250,894 | 256,093 |
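The example statistics can be checked for internal consistency. The sketch below copies the four example rows and confirms that Total equals Core + Aux in each one. The tuple layout and the `core_fraction` and `check_rows` helpers are illustrative names, not part of the schema.

```python
# Example rows copied from the statistics above:
# (species, genomes, core, aux, total).
EXAMPLE_ROWS = [
    ("K. pneumoniae", 14_240, 4_199, 438_925, 443_124),
    ("S. aureus", 14_526, 2_083, 145_831, 147_914),
    ("S. enterica", 11_402, 3_639, 262_732, 266_371),
    ("P. aeruginosa", 6_760, 5_199, 250_894, 256_093),
]


def core_fraction(core: int, total: int) -> float:
    """Share of all gene clusters that are core (hypothetical helper)."""
    return core / total


def check_rows(rows):
    """Return the species whose Total differs from Core + Aux."""
    return [
        species
        for species, _genomes, core, aux, total in rows
        if core + aux != total
    ]


if __name__ == "__main__":
    print("inconsistent rows:", check_rows(EXAMPLE_ROWS))
    for species, _genomes, core, _aux, total in EXAMPLE_ROWS:
        print(f"{species}: {core_fraction(core, total):.2%} of clusters are core")
```

Every example row satisfies Total = Core + Aux, and the core share stays small in each case because the auxiliary count dominates the total.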
URI: https://w3id.org/kbase/kbase_ke_pangenome/Pangenome
classDiagram
class Pangenome
click Pangenome href "../Pangenome/"
Pangenome : corrected_mean_completness
Pangenome : gtdb_species_clade_id
Pangenome --> "1" GtdbSpeciesClade : gtdb_species_clade_id
click GtdbSpeciesClade href "../GtdbSpeciesClade/"
Pangenome : mean_initial_completeness
Pangenome : no_aux_genome
Pangenome : no_core
Pangenome : no_gene_clusters
Pangenome : no_genomes
Pangenome : no_singleton_gene_clusters
Pangenome : number_of_iterations
Pangenome : protocol_id
Pangenome : total_sum_of_loglikelihood_ratios
Slots
| Name | Cardinality and Range | Description | Inheritance |
|---|---|---|---|
| gtdb_species_clade_id | 1 GtdbSpeciesClade | Species clade this pangenome summarizes | direct |
| protocol_id | 0..1 String | Analysis protocol version identifier | direct |
| number_of_iterations | 0..1 Integer | PPanGGOLiN model training iterations (0 = converged early) | direct |
| mean_initial_completeness | 0..1 Float | Mean CheckM completeness of input genomes before filtering | direct |
| total_sum_of_loglikelihood_ratios | 0..1 Float | Model fit quality metric | direct |
| corrected_mean_completness | 0..1 Float | Completeness after pangenome-based correction | direct |
| no_aux_genome | 0..1 Integer | Number of auxiliary (shell) gene clusters | direct |
| no_core | 0..1 Integer | Number of core gene clusters | direct |
| no_singleton_gene_clusters | 0..1 Integer | Number of singleton clusters | direct |
| no_gene_clusters | 0..1 Integer | Total gene clusters. In the example values this equals core + auxiliary, so singleton clusters appear to be counted within the auxiliary total | direct |
| no_genomes | 0..1 Integer | Number of genomes in pangenome analysis | direct |
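As a rough illustration of how these slots and their value ranges might be enforced in client code, the sketch below mirrors one Pangenome row as a dataclass and checks the `minimum_value` and `maximum_value` bounds listed in the LinkML source further down this page. `PangenomeRecord`, `BOUNDS`, and `validate` are hypothetical names introduced here. This is not how LinkML itself performs validation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PangenomeRecord:
    """Hypothetical in-memory mirror of one Pangenome row (field names follow the slots)."""

    gtdb_species_clade_id: str  # the only required slot
    protocol_id: Optional[str] = None
    number_of_iterations: Optional[int] = None
    mean_initial_completeness: Optional[float] = None
    total_sum_of_loglikelihood_ratios: Optional[float] = None
    corrected_mean_completness: Optional[float] = None  # spelled as in the schema
    no_aux_genome: Optional[int] = None
    no_core: Optional[int] = None
    no_singleton_gene_clusters: Optional[int] = None
    no_gene_clusters: Optional[int] = None
    no_genomes: Optional[int] = None


# (minimum, maximum) bounds copied from the LinkML source; None means unbounded.
BOUNDS = {
    "number_of_iterations": (0, None),
    "mean_initial_completeness": (0.0, 100.0),
    "corrected_mean_completness": (0.0, 100.0),
    "no_aux_genome": (0, None),
    "no_core": (0, None),
    "no_singleton_gene_clusters": (0, None),
    "no_gene_clusters": (1, None),
    "no_genomes": (1, None),
}


def validate(record: PangenomeRecord) -> List[str]:
    """Return human-readable problems; an empty list means the record passes."""
    problems = []
    if not record.gtdb_species_clade_id:
        problems.append("gtdb_species_clade_id is required")
    for name, (low, high) in BOUNDS.items():
        value = getattr(record, name)
        if value is None:  # optional slots (0..1) may be absent
            continue
        if low is not None and value < low:
            problems.append(f"{name}={value} is below minimum {low}")
        if high is not None and value > high:
            problems.append(f"{name}={value} is above maximum {high}")
    return problems


if __name__ == "__main__":
    example = PangenomeRecord(
        gtdb_species_clade_id="s__Klebsiella_pneumoniae--RS_GCF_000742135.1",
        protocol_id="PGNKE_MMS90_V01_DEC2024",
        number_of_iterations=0,
        corrected_mean_completness=99.24400808378776,
        no_aux_genome=438925,
        no_core=4199,
        no_gene_clusters=443124,
        no_genomes=14240,
    )
    print(validate(example))
```

Keeping the bounds in a single dictionary makes it easy to compare them against the schema if the ranges change.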
Identifier and Mapping Information
Annotations
| property | value |
|---|---|
| source_table | pangenome |
Schema Source
- from schema: https://w3id.org/kbase/kbase_ke_pangenome
Mappings
| Mapping Type | Mapped Value |
|---|---|
| self | https://w3id.org/kbase/kbase_ke_pangenome/Pangenome |
| native | https://w3id.org/kbase/kbase_ke_pangenome/Pangenome |
LinkML Source
Direct
name: Pangenome
annotations:
source_table:
tag: source_table
value: pangenome
description: 'Summary statistics for a species pangenome. One row per species. Contains
counts of core/auxiliary/singleton genes and quality metrics.
EXAMPLE PANGENOME STATISTICS: | Species | Genomes | Core | Aux | Total |
|------------------|---------|-------|---------|---------| | K. pneumoniae |
14,240 | 4,199 | 438,925 | 443,124 | | S. aureus | 14,526 | 2,083 | 145,831
| 147,914 | | S. enterica | 11,402 | 3,639 | 262,732 | 266,371 | | P. aeruginosa |
6,760 | 5,199 | 250,894 | 256,093 |'
from_schema: https://w3id.org/kbase/kbase_ke_pangenome
attributes:
gtdb_species_clade_id:
name: gtdb_species_clade_id
description: Species clade this pangenome summarizes
comments:
- 'Foreign key: GtdbSpeciesClade.gtdb_species_clade_id'
examples:
- value: s__Klebsiella_pneumoniae--RS_GCF_000742135.1
- value: s__Staphylococcus_aureus--RS_GCF_001027105.1
from_schema: https://w3id.org/kbase/kbase_ke_pangenome
identifier: true
domain_of:
- GtdbSpeciesClade
- Genome
- GeneCluster
- Pangenome
range: GtdbSpeciesClade
required: true
protocol_id:
name: protocol_id
description: 'Analysis protocol version identifier. NOT a foreign key - this is
a version string constant that identifies the pangenome computation pipeline
and date.
Format breakdown: PGNKE_MMS90_V01_DEC2024 - PGNKE: Pangenome KBase project prefix
- MMS90: Method/parameter set identifier (MMSeqs2-based, 90% identity?) - V01:
Version 01 - DEC2024: Analysis run date (December 2024)
Currently all pangenomes in the database share the same protocol_id value, indicating
they were computed in a single batch analysis run.'
examples:
- value: PGNKE_MMS90_V01_DEC2024
description: Current protocol - all pangenomes use this value
from_schema: https://w3id.org/kbase/kbase_ke_pangenome
rank: 1000
domain_of:
- Pangenome
- GenomeAni
range: string
number_of_iterations:
name: number_of_iterations
description: PPanGGOLiN model training iterations (0 = converged early)
examples:
- value: '0'
from_schema: https://w3id.org/kbase/kbase_ke_pangenome
rank: 1000
domain_of:
- Pangenome
range: integer
minimum_value: 0
mean_initial_completeness:
name: mean_initial_completeness
description: Mean CheckM completeness of input genomes before filtering. Quality
threshold typically 90%.
examples:
- value: '95.0'
from_schema: https://w3id.org/kbase/kbase_ke_pangenome
rank: 1000
domain_of:
- Pangenome
range: float
minimum_value: 0.0
maximum_value: 100.0
total_sum_of_loglikelihood_ratios:
name: total_sum_of_loglikelihood_ratios
description: Model fit quality metric. Larger negative values indicate larger
species with more genes.
examples:
- value: '-14186263623.030312'
description: Large pangenome (K. pneumoniae)
- value: '-5100735177.719256'
description: Medium pangenome (S. aureus)
from_schema: https://w3id.org/kbase/kbase_ke_pangenome
rank: 1000
domain_of:
- Pangenome
range: float
corrected_mean_completness:
name: corrected_mean_completness
description: Completeness after pangenome-based correction. Usually higher than
initial because some "missing" genes are species-absent.
examples:
- value: '99.24400808378776'
- value: '99.36403620861542'
from_schema: https://w3id.org/kbase/kbase_ke_pangenome
rank: 1000
domain_of:
- Pangenome
range: float
minimum_value: 0.0
maximum_value: 100.0
no_aux_genome:
name: no_aux_genome
description: Number of auxiliary (shell) gene clusters
examples:
- value: '438925'
description: K. pneumoniae - open pangenome
- value: '145831'
description: S. aureus - more closed pangenome
from_schema: https://w3id.org/kbase/kbase_ke_pangenome
rank: 1000
domain_of:
- Pangenome
range: integer
minimum_value: 0
no_core:
name: no_core
description: Number of core gene clusters. Essential species functions. Core genome
size correlates with genome size.
examples:
- value: '4199'
description: K. pneumoniae
- value: '2083'
description: S. aureus (smaller genome)
- value: '5199'
description: P. aeruginosa (larger genome)
from_schema: https://w3id.org/kbase/kbase_ke_pangenome
rank: 1000
domain_of:
- Pangenome
range: integer
minimum_value: 0
no_singleton_gene_clusters:
name: no_singleton_gene_clusters
description: Number of singleton clusters. High counts indicate diverse accessory
genome or sequencing artifacts.
examples:
- value: '276743'
- value: '86127'
from_schema: https://w3id.org/kbase/kbase_ke_pangenome
rank: 1000
domain_of:
- Pangenome
range: integer
minimum_value: 0
no_gene_clusters:
name: no_gene_clusters
description: Total gene clusters (core + auxiliary; in the example values, singletons are already counted within auxiliary)
examples:
- value: '443124'
- value: '147914'
from_schema: https://w3id.org/kbase/kbase_ke_pangenome
rank: 1000
domain_of:
- Pangenome
range: integer
minimum_value: 1
no_genomes:
name: no_genomes
description: Number of genomes in pangenome analysis
examples:
- value: '14240'
- value: '14526'
from_schema: https://w3id.org/kbase/kbase_ke_pangenome
rank: 1000
domain_of:
- Pangenome
range: integer
minimum_value: 1
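The `protocol_id` description above breaks the value into four underscore-separated fields. A minimal sketch of that split follows, assuming every value has the same four-field shape as the single documented example `PGNKE_MMS90_V01_DEC2024`. The `ProtocolId` type and `parse_protocol_id` function are hypothetical names, and the meaning of the `MMS90` field is itself marked as uncertain in the schema.

```python
from typing import NamedTuple


class ProtocolId(NamedTuple):
    """Fields of a protocol_id, named after the format breakdown in the schema."""

    project: str   # e.g. PGNKE (Pangenome KBase project prefix)
    method: str    # e.g. MMS90 (method/parameter set identifier)
    version: str   # e.g. V01
    run_date: str  # e.g. DEC2024


def parse_protocol_id(value: str) -> ProtocolId:
    """Split a protocol_id into its four fields, rejecting other shapes."""
    parts = value.split("_")
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected 4 non-empty underscore-separated fields, got {value!r}")
    return ProtocolId(*parts)


if __name__ == "__main__":
    print(parse_protocol_id("PGNKE_MMS90_V01_DEC2024"))
```

Because the value is a version string constant rather than a foreign key, parsing it is useful mainly for display or for grouping rows if later protocol versions are added.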
Induced
name: Pangenome
annotations:
source_table:
tag: source_table
value: pangenome
description: 'Summary statistics for a species pangenome. One row per species. Contains
counts of core/auxiliary/singleton genes and quality metrics.
EXAMPLE PANGENOME STATISTICS: | Species | Genomes | Core | Aux | Total |
|------------------|---------|-------|---------|---------| | K. pneumoniae |
14,240 | 4,199 | 438,925 | 443,124 | | S. aureus | 14,526 | 2,083 | 145,831
| 147,914 | | S. enterica | 11,402 | 3,639 | 262,732 | 266,371 | | P. aeruginosa |
6,760 | 5,199 | 250,894 | 256,093 |'
from_schema: https://w3id.org/kbase/kbase_ke_pangenome
attributes:
gtdb_species_clade_id:
name: gtdb_species_clade_id
description: Species clade this pangenome summarizes
comments:
- 'Foreign key: GtdbSpeciesClade.gtdb_species_clade_id'
examples:
- value: s__Klebsiella_pneumoniae--RS_GCF_000742135.1
- value: s__Staphylococcus_aureus--RS_GCF_001027105.1
from_schema: https://w3id.org/kbase/kbase_ke_pangenome
identifier: true
alias: gtdb_species_clade_id
owner: Pangenome
domain_of:
- GtdbSpeciesClade
- Genome
- GeneCluster
- Pangenome
range: GtdbSpeciesClade
required: true
protocol_id:
name: protocol_id
description: 'Analysis protocol version identifier. NOT a foreign key - this is
a version string constant that identifies the pangenome computation pipeline
and date.
Format breakdown: PGNKE_MMS90_V01_DEC2024 - PGNKE: Pangenome KBase project prefix
- MMS90: Method/parameter set identifier (MMSeqs2-based, 90% identity?) - V01:
Version 01 - DEC2024: Analysis run date (December 2024)
Currently all pangenomes in the database share the same protocol_id value, indicating
they were computed in a single batch analysis run.'
examples:
- value: PGNKE_MMS90_V01_DEC2024
description: Current protocol - all pangenomes use this value
from_schema: https://w3id.org/kbase/kbase_ke_pangenome
rank: 1000
alias: protocol_id
owner: Pangenome
domain_of:
- Pangenome
- GenomeAni
range: string
number_of_iterations:
name: number_of_iterations
description: PPanGGOLiN model training iterations (0 = converged early)
examples:
- value: '0'
from_schema: https://w3id.org/kbase/kbase_ke_pangenome
rank: 1000
alias: number_of_iterations
owner: Pangenome
domain_of:
- Pangenome
range: integer
minimum_value: 0
mean_initial_completeness:
name: mean_initial_completeness
description: Mean CheckM completeness of input genomes before filtering. Quality
threshold typically 90%.
examples:
- value: '95.0'
from_schema: https://w3id.org/kbase/kbase_ke_pangenome
rank: 1000
alias: mean_initial_completeness
owner: Pangenome
domain_of:
- Pangenome
range: float
minimum_value: 0.0
maximum_value: 100.0
total_sum_of_loglikelihood_ratios:
name: total_sum_of_loglikelihood_ratios
description: Model fit quality metric. Larger negative values indicate larger
species with more genes.
examples:
- value: '-14186263623.030312'
description: Large pangenome (K. pneumoniae)
- value: '-5100735177.719256'
description: Medium pangenome (S. aureus)
from_schema: https://w3id.org/kbase/kbase_ke_pangenome
rank: 1000
alias: total_sum_of_loglikelihood_ratios
owner: Pangenome
domain_of:
- Pangenome
range: float
corrected_mean_completness:
name: corrected_mean_completness
description: Completeness after pangenome-based correction. Usually higher than
initial because some "missing" genes are species-absent.
examples:
- value: '99.24400808378776'
- value: '99.36403620861542'
from_schema: https://w3id.org/kbase/kbase_ke_pangenome
rank: 1000
alias: corrected_mean_completness
owner: Pangenome
domain_of:
- Pangenome
range: float
minimum_value: 0.0
maximum_value: 100.0
no_aux_genome:
name: no_aux_genome
description: Number of auxiliary (shell) gene clusters
examples:
- value: '438925'
description: K. pneumoniae - open pangenome
- value: '145831'
description: S. aureus - more closed pangenome
from_schema: https://w3id.org/kbase/kbase_ke_pangenome
rank: 1000
alias: no_aux_genome
owner: Pangenome
domain_of:
- Pangenome
range: integer
minimum_value: 0
no_core:
name: no_core
description: Number of core gene clusters. Essential species functions. Core genome
size correlates with genome size.
examples:
- value: '4199'
description: K. pneumoniae
- value: '2083'
description: S. aureus (smaller genome)
- value: '5199'
description: P. aeruginosa (larger genome)
from_schema: https://w3id.org/kbase/kbase_ke_pangenome
rank: 1000
alias: no_core
owner: Pangenome
domain_of:
- Pangenome
range: integer
minimum_value: 0
no_singleton_gene_clusters:
name: no_singleton_gene_clusters
description: Number of singleton clusters. High counts indicate diverse accessory
genome or sequencing artifacts.
examples:
- value: '276743'
- value: '86127'
from_schema: https://w3id.org/kbase/kbase_ke_pangenome
rank: 1000
alias: no_singleton_gene_clusters
owner: Pangenome
domain_of:
- Pangenome
range: integer
minimum_value: 0
no_gene_clusters:
name: no_gene_clusters
description: Total gene clusters (core + auxiliary; in the example values, singletons are already counted within auxiliary)
examples:
- value: '443124'
- value: '147914'
from_schema: https://w3id.org/kbase/kbase_ke_pangenome
rank: 1000
alias: no_gene_clusters
owner: Pangenome
domain_of:
- Pangenome
range: integer
minimum_value: 1
no_genomes:
name: no_genomes
description: Number of genomes in pangenome analysis
examples:
- value: '14240'
- value: '14526'
from_schema: https://w3id.org/kbase/kbase_ke_pangenome
rank: 1000
alias: no_genomes
owner: Pangenome
domain_of:
- Pangenome
range: integer
minimum_value: 1