I don't understand how the GTDB taxonomy lineage is constructed such that the same “order” name can appear under different “phylum” names.
Shouldn’t the higher-resolution ranks (order, family, genus, species, strain) stay separate once the lower-resolution ranks (kingdom, phylum) differ, since the reference genomes are already differentiated at the phylum level? Why does the lineage regroup at the “order” level?
PS: there are many more species that follow this pattern.
I have been using R214 for my project for a while now, so I never realized that R226 had been released (with the reclassification of Clostridium septicum).
I reckon the best way forward is to update my GTDB taxonomy to the latest release as well.
We update the GTDB every April and the website always reflects the latest release. Generally, there will be some conflicting classifications if one ends up mixing results from different GTDB releases.
Note that the suffix on higher taxon names (family and above) only indicates polyphyly of a group in a given reference FastTree, so for all intents and purposes Bacillota = Bacillota_A, etc. With a perfect reference tree these suffixes wouldn’t exist.
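If it helps when comparing lineages, here is a minimal Python sketch of that idea: it drops the alphabetic suffix from ranks at family level and above, so that Bacillota_A and Bacillota compare as equal. It assumes the usual semicolon-separated, rank-prefixed lineage string (`d__...;p__...;...`). The function name and the example lineage are hypothetical and purely illustrative; this is not part of GTDB or GTDB-Tk. Genus and species names are left untouched, since the note above only covers family and above.

```python
import re

# Rank prefixes for family and above. Per the reply above, an alphabetic
# suffix at these ranks only marks polyphyly in the reference tree.
HIGHER_RANKS = {"d", "p", "c", "o", "f"}

# Matches a trailing suffix such as "_A" or "_AB" (assumed suffix format).
SUFFIX = re.compile(r"_[A-Z]+$")


def strip_higher_rank_suffixes(lineage: str) -> str:
    """Return the lineage with polyphyly suffixes removed at family and above."""
    parts = []
    for taxon in lineage.split(";"):
        taxon = taxon.strip()
        prefix, sep, name = taxon.partition("__")
        if sep and prefix in HIGHER_RANKS:
            name = SUFFIX.sub("", name)
        parts.append(f"{prefix}{sep}{name}")
    return ";".join(parts)


# Illustrative lineage only; the order, family, genus and species are made up.
example = (
    "d__Bacteria;p__Bacillota_A;c__Clostridia;o__ExampleOrder;"
    "f__ExampleFamily_B;g__Examplegenus_A;s__Examplegenus_A sp1"
)
print(strip_higher_rank_suffixes(example))
```

Comparing the stripped strings (or just the stripped phylum field) avoids treating Bacillota and Bacillota_A as different groups, though it does not resolve differences that come from mixing GTDB releases.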