I can’t understand why there are multiple species cluster representatives in the Bifidobacterium genus, and why the taxonomy of those representative genomes differs so much between GTDB and NCBI. Any clarification is appreciated!
In case anybody needs an example, I could provide a spreadsheet.
I’d be interested in seeing your spreadsheet with the observed differences. GTDB classifies genomes based on average nucleotide identity (ANI) to representative genomes, which are the type strain of the species when available. This methodology is explained in:
In the meantime, here is a screenshot illustrating multiple cluster representatives that fall in different genera, as well as the NCBI-GTDB taxonomy mismatch. Some genomes that GTDB places in Bifidobacterium are classified as Gardnerella at NCBI.
Moreover, none of the GTDB Bifidobacterium representatives in the screenshot has the same species name at NCBI.
Gardnerella has been merged with Bifidobacterium within the GTDB. You can “see” this using the GTDB Taxon History tool:
This illustrates that the name Gardnerella doesn’t exist within the GTDB taxonomy: all instances of Gardnerella at NCBI translate to Bifidobacterium, with the exception of 3 genomes that were classified to Limosilactobacillus in GTDB. This is a consequence of GTDB normalizing taxonomic ranks based on relative evolutionary divergence (RED; see the primary GTDB manuscript).
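For anyone unfamiliar with RED, here is a minimal Python sketch of the basic definition as I understand it from the GTDB manuscript: the root has RED 0, extant taxa have RED 1, and an internal node n gets RED = p + (d/u)(1 - p), where p is the RED of its parent, d is the branch length from the parent to n, and u is the average path length from the parent through n to the extant taxa below n. Roughly speaking, GTDB expects taxa of the same rank to fall in a similar RED range, which is why a genus can be merged into a neighbour. This is not GTDB's own implementation (which also deals with rooting and rank-specific intervals); the toy tree and node names are hypothetical.

```python
from typing import Dict, List, Tuple

# Rooted tree as an adjacency map: node -> [(child, branch_length), ...].
# Leaves simply have no entry.
Tree = Dict[str, List[Tuple[str, float]]]


def leaf_distances(tree: Tree, node: str) -> List[float]:
    """Path lengths from `node` to every extant leaf below it ([0.0] for a leaf)."""
    children = tree.get(node, [])
    if not children:
        return [0.0]
    distances: List[float] = []
    for child, length in children:
        distances.extend(length + d for d in leaf_distances(tree, child))
    return distances


def relative_evolutionary_divergence(tree: Tree, root: str) -> Dict[str, float]:
    """RED per node: root = 0, leaves = 1, internal nodes interpolated by branch length."""
    red = {root: 0.0}
    stack = [root]
    while stack:
        parent = stack.pop()
        p = red[parent]
        for child, d in tree.get(parent, []):
            dists = leaf_distances(tree, child)
            # u: average path length from the parent, through this child, to its leaves
            u = d + sum(dists) / len(dists)
            red[child] = p + (d / u) * (1.0 - p) if u > 0 else p
            stack.append(child)
    return red


# Hypothetical toy tree: two clades of different depth.
toy: Tree = {
    "root": [("A", 1.0), ("B", 2.0)],
    "A": [("a1", 1.0), ("a2", 3.0)],
    "B": [("b1", 1.0)],
}
red = relative_evolutionary_divergence(toy, "root")
print({node: round(value, 3) for node, value in sorted(red.items())})
# → {'A': 0.333, 'B': 0.667, 'a1': 1.0, 'a2': 1.0, 'b1': 1.0, 'root': 0.0}
```

Note that clade B sits "deeper" (higher RED) than clade A even though both are direct children of the root, because its branch is long relative to the distance to its leaves. Comparing values like these across the tree is what lets rank assignments be normalized.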
The species reclassifications are the result of delineating species via ANI. Do you have a phylogenetic tree that spans all these genomes? I’d be interested to see if the tree is more congruent with the GTDB or NCBI classifications.
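To make the ANI-based delineation concrete, here is a minimal Python sketch of how a query genome might be assigned to a species cluster. It assumes a simplified rule: pick the closest representative whose ANI is at least a fixed radius (95% here) and whose alignment fraction (AF) is at least 0.5. GTDB uses species-specific circumscription radii (typically 95%), and the exact AF cutoff may differ by release, so treat both numbers as assumptions. The `AniHit` type, `assign_species`, and all representative and species names are hypothetical. When the species returned differs from the genome's NCBI species, it shows up as a reclassification like the ones in the screenshot.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AniHit:
    """One comparison of a query genome against a species representative."""
    representative: str  # hypothetical identifier of the representative genome
    species: str         # species name attached to that representative
    ani: float           # average nucleotide identity, percent
    af: float            # alignment fraction, 0..1


def assign_species(hits: List[AniHit], ani_radius: float = 95.0,
                   min_af: float = 0.5) -> Optional[str]:
    """Species of the closest representative meeting both cutoffs, or None.

    Assumption: one fixed ANI radius for every representative; GTDB sets
    circumscription radii per species, so this is a simplification.
    """
    passing = [h for h in hits if h.ani >= ani_radius and h.af >= min_af]
    if not passing:
        return None  # no representative close enough: candidate new species
    best = max(passing, key=lambda h: (h.ani, h.af))
    return best.species


# Hypothetical ANI results for one query genome.
hits = [
    AniHit("rep_A", "species_A", ani=96.8, af=0.82),
    AniHit("rep_B", "species_B", ani=97.5, af=0.40),  # fails the AF cutoff
    AniHit("rep_C", "species_C", ani=93.1, af=0.90),  # below the ANI radius
]
print(assign_species(hits))  # → species_A
```

Requiring both ANI and AF avoids assigning a genome on the strength of a high identity over only a small shared fraction of the genome, which is why `rep_B` is skipped despite having the highest ANI.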