GTDB Forum

Tree differences between r89 and r95

Hi All,
I apologize in advance if I missed an answer somewhere (or if it is obvious!), but it appears that the trees have changed significantly between the two releases. I am particularly interested in the Elusimicrobiota. In the first release it was very near the root, while Firmicutes_A was nearly the furthest away from the root. In the current release, Firmicutes_A is pretty close to the root, while Elusimicrobiota is further away. Not only has the order changed (that’s possible with a different outgroup), but so has the distance between the two.
I’ve looked at the release notes for r95, but didn’t see anything relevant to this question. I would appreciate any leads.
Thanks!
Yana
PS: As a disclaimer, I am using AnnoTree to visualize and haven’t yet looked at the tree files myself.

Hi Yana. The GTDB reference trees should be treated as unrooted. For practical purposes, we do root the tree, but the rooting is arbitrary since there is no consensus on the correct root. We infer the GTDB reference tree de novo with each release. As such, the branching order between phyla may change and is a reflection of the uncertainty in the relative ordering of phyla.

Hi Donovan,
thank you for this answer! It explains a lot.
Just to explain the confusion: the AnnoTree display is thus misleading, as it consistently has a root and no sign of the fact that the root is arbitrarily chosen. Furthermore, as per the GTDB paper: “we took an operational approach and rooted trees at the midpoint of all branches leading to phyla with two or more classes.” As I mentioned before, I can see how that would change with the addition of more phyla, but did not expect the change to be so drastic.
Having said that, why would the distance between phyla decrease? That is not something I would expect from the addition of more genomes.
Thank you!
Yana

Hi Yana. Sorry for the slow reply. The GTDB reference trees are rooted at the midpoint of all branches leading to phyla for the purposes of calculating relative evolutionary divergence. The GTDB reference tree should be treated as arbitrarily rooted and we may select a different arbitrary root from release to release.
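
For anyone who wants to see what arbitrary rooting does to a viewer's "distance from the root", here is a minimal sketch in plain Python. It is only an illustration: the tree, the internal node names (n1, n2), the extra tips (PhylumX, PhylumY) and all branch lengths are made up and are not taken from any GTDB release. It shows that moving the root changes how far each tip sits from the root, while the path length between two tips stays the same on a fixed tree.

```python
from collections import defaultdict

# Toy unrooted tree as (node, node, branch_length) edges.
# Every name and length here is hypothetical, chosen only for illustration.
EDGES = [
    ("Elusimicrobiota", "n1", 0.4),
    ("PhylumX", "n1", 0.3),
    ("n1", "n2", 0.2),
    ("Firmicutes_A", "n2", 0.5),
    ("PhylumY", "n2", 0.6),
]


def build_adjacency(edges):
    """Store each undirected edge in both directions."""
    adj = defaultdict(dict)
    for a, b, length in edges:
        adj[a][b] = length
        adj[b][a] = length
    return adj


def distances_from(adj, start):
    """Path length from `start` to every node in the tree (iterative DFS)."""
    dist = {start: 0.0}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour, length in adj[node].items():
            if neighbour not in dist:
                dist[neighbour] = dist[node] + length
                stack.append(neighbour)
    return dist


adj = build_adjacency(EDGES)

# Root-to-tip depth depends on which node is (arbitrarily) picked as the root.
for root in ("n1", "n2"):
    depth = distances_from(adj, root)
    print(
        f"root={root}: "
        f"Elusimicrobiota={depth['Elusimicrobiota']:.1f}, "
        f"Firmicutes_A={depth['Firmicutes_A']:.1f}"
    )

# Tip-to-tip path length does not involve the root at all.
tip_to_tip = distances_from(adj, "Elusimicrobiota")["Firmicutes_A"]
print(f"Elusimicrobiota <-> Firmicutes_A = {tip_to_tip:.1f}")
```

In this toy tree, rooting at n1 puts Elusimicrobiota closer to the root than Firmicutes_A, and rooting at n2 reverses that, yet the path between the two tips is unchanged. Because each GTDB release is also inferred de novo, the topology and branch lengths themselves can differ between releases, which this sketch does not model.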