GTDB Forum

Difference between GTDB taxonomy and bac120.tree

Hi everyone. I’m a bit confused about some differences I noticed between the Newick tree that can be downloaded from the GTDB repository (currently bac120_r95.tree) and the taxonomy shown on the GTDB website's tree page (GTDB - Tree). It’s probably something trivial that I’m missing.

I am trying to make a tree that shows the relationships between selected genera by pruning the bac120_r95.tree tree. One genus that I want to show is g__Rubinisphaera. This genus (and others) is present on the website but not in the Newick tree file. Why is that? Is there another tree file that contains it?

Hi. Taxa represented by a single genome have no internal node, so their names will not appear in the Newick tree; only the genome itself appears, as a leaf. In your case, the only species in g__Rubinisphaera is s__Rubinisphaera brasiliensis, which is represented by the genome GCF_000165715.2. As such, there is no internal node on which to hang the label g__Rubinisphaera.
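To make this concrete, here is a minimal sketch (standard-library Python) that parses a Newick string and collects the taxon names attached to internal nodes. It assumes the GTDB tree layout is genome accessions on the leaves (real files may add a prefix such as RS_ to accessions) and quoted 'support:taxon' labels on internal nodes; please check this against the actual file. The helper names (parse_newick, internal_taxa, leaf_labels) are made up for this example, and the toy tree is hypothetical apart from GCF_000165715.2. The parser is recursive, so a full-size tree may need a higher recursion limit, or an established Newick library such as DendroPy or ete3.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    label: str = ""
    length: Optional[str] = None  # raw branch-length text
    children: list[Node] = field(default_factory=list)


def parse_newick(text: str) -> Node:
    """Minimal recursive Newick parser: nested clades, quoted labels, branch lengths."""
    text, pos = text.strip(), 0

    def peek() -> str:
        nonlocal pos
        while pos < len(text) and text[pos].isspace():  # skip insignificant blanks
            pos += 1
        return text[pos] if pos < len(text) else ""

    def read_label() -> str:
        nonlocal pos
        if peek() == "'":
            parts = []
            while pos < len(text) and text[pos] == "'":  # '' inside quotes = one quote
                end = text.index("'", pos + 1)
                parts.append(text[pos + 1:end])
                pos = end + 1
            return "'".join(parts)
        start = pos
        while pos < len(text) and text[pos] not in "(),:;":
            pos += 1
        return text[start:pos].strip()

    def read_node() -> Node:
        nonlocal pos
        node = Node()
        if peek() == "(":
            while True:
                pos += 1  # consume '(' or ','
                node.children.append(read_node())
                if peek() != ",":
                    break
            if peek() != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
        node.label = read_label()
        if peek() == ":":
            start = pos = pos + 1
            while pos < len(text) and text[pos] not in "(),;":
                pos += 1
            node.length = text[start:pos].strip()
        return node

    return read_node()


RANK_PREFIX = re.compile(r"[dpcofgs]__")


def taxa_in_label(label: str) -> list[str]:
    """'98.5:o__A; f__B' -> ['o__A', 'f__B']; support-only '100.0' -> []."""
    names = label.rpartition(":")[2]  # drop the assumed 'support:' prefix
    return [t.strip() for t in names.split(";") if RANK_PREFIX.match(t.strip())]


def walk(root: Node):
    """Yield every node using an explicit stack (no recursion)."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        stack.extend(node.children)


def internal_taxa(root: Node) -> set[str]:
    """Taxon names that hang on internal (non-leaf) nodes."""
    return {t for n in walk(root) if n.children for t in taxa_in_label(n.label)}


def leaf_labels(root: Node) -> list[str]:
    return [n.label for n in walk(root) if not n.children]


# Hypothetical toy tree in the assumed layout; only GCF_000165715.2 comes from
# the thread, every other name is made up.
TOY_TREE = ("((GCF_000165715.2:0.2,(GENOME_A:0.1,GENOME_B:0.1)'100.0:g__Hypothetical':0.3)"
            "'98.5:f__Hypothetical':0.05);")
tree = parse_newick(TOY_TREE)
taxa = internal_taxa(tree)
print("g__Hypothetical" in taxa)               # True: two genomes share an internal node
print("g__Rubinisphaera" in taxa)              # False: one genome, so no internal node
print("GCF_000165715.2" in leaf_labels(tree))  # True: the genome is still a leaf
```

In other words, a genus label is only findable on an internal node when the genus has at least two genomes; otherwise the genome accession on the leaf is the only trace of it.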

We also provide a tree in which the terminal (leaf) nodes are labelled with GTDB taxon names instead of genome IDs:
https://data.gtdb.ecogenomic.org/releases/release95/95.0/auxillary_files/bac120_r95.sp_labels.tree

g__Rubinisphaera will show up in this tree on a leaf node.

Cheers,
Donovan

Thank you for your reply!

I see what you mean. However, since the node g__Rubinisphaera is also not present in the tree you linked, would renaming s__Rubinisphaera brasiliensis to g__Rubinisphaera be a correct approach in my case?

Gianluca

Hi. Sorry for the slow reply. You are right: the leaf node labelled s__Rubinisphaera brasiliensis is also the node that represents the taxon g__Rubinisphaera, so renaming it is fine.
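A minimal sketch of that renaming in standard-library Python, under stated assumptions: leaves of the sp_labels tree are named like 's__Rubinisphaera brasiliensis', and a species' genus is the first word of its name. The hypothetical helper relabel_single_species_genera renames a leaf to its genus only when that leaf is the genus' sole leaf in the tree, so genera with several species (which already have an internal node) are left alone. The Newick parser and writer here are a minimal stand-in, not part of any GTDB tool, and the toy tree is made up.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    label: str = ""
    length: Optional[str] = None  # raw branch-length text, kept for exact round-trips
    children: list[Node] = field(default_factory=list)


def parse_newick(text: str) -> Node:
    """Minimal recursive Newick parser: nested clades, quoted labels, branch lengths."""
    text, pos = text.strip(), 0

    def peek() -> str:
        nonlocal pos
        while pos < len(text) and text[pos].isspace():  # skip insignificant blanks
            pos += 1
        return text[pos] if pos < len(text) else ""

    def read_label() -> str:
        nonlocal pos
        if peek() == "'":
            parts = []
            while pos < len(text) and text[pos] == "'":  # '' inside quotes = one quote
                end = text.index("'", pos + 1)
                parts.append(text[pos + 1:end])
                pos = end + 1
            return "'".join(parts)
        start = pos
        while pos < len(text) and text[pos] not in "(),:;":
            pos += 1
        return text[start:pos].strip()

    def read_node() -> Node:
        nonlocal pos
        node = Node()
        if peek() == "(":
            while True:
                pos += 1  # consume '(' or ','
                node.children.append(read_node())
                if peek() != ",":
                    break
            if peek() != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
        node.label = read_label()
        if peek() == ":":
            start = pos = pos + 1
            while pos < len(text) and text[pos] not in "(),;":
                pos += 1
            node.length = text[start:pos].strip()
        return node

    return read_node()


def genus_of(label: str) -> Optional[str]:
    """'s__Rubinisphaera brasiliensis' -> 'g__Rubinisphaera' (assumed leaf layout)."""
    return "g__" + label[3:].split(" ")[0] if label.startswith("s__") else None


def relabel_single_species_genera(root: Node, genera: set[str]) -> list[str]:
    """Rename a leaf to its genus when it is that genus' only leaf; return renamed genera."""
    leaves_by_genus: dict[str, list[Node]] = {}
    stack = [root]
    while stack:
        node = stack.pop()
        stack.extend(node.children)
        genus = None if node.children else genus_of(node.label)
        if genus in genera:
            leaves_by_genus.setdefault(genus, []).append(node)
    renamed = [g for g, leaves in leaves_by_genus.items() if len(leaves) == 1]
    for genus in renamed:
        leaves_by_genus[genus][0].label = genus
    return renamed


def to_newick(node: Node) -> str:
    """Serialise a clade (append ';' for a whole tree)."""
    out = "(" + ",".join(to_newick(c) for c in node.children) + ")" if node.children else ""
    if node.label:  # quote blanks/punctuation/underscores; unquoted '_' would read as ' '
        quote = any(ch in " \t()[],:;'_" for ch in node.label)
        out += "'" + node.label.replace("'", "''") + "'" if quote else node.label
    return out + (":" + node.length if node.length is not None else "")


# Hypothetical toy tree in the assumed sp_labels layout (species names on leaves).
SP_TREE = ("('s__Rubinisphaera brasiliensis':0.2,('s__Hypothetical alpha':0.1,"
           "'s__Hypothetical beta':0.1)'100.0:g__Hypothetical':0.3);")
tree = parse_newick(SP_TREE)
renamed = relabel_single_species_genera(tree, {"g__Rubinisphaera", "g__Hypothetical"})
print(renamed)  # ['g__Rubinisphaera']
print(to_newick(tree) + ";")
```

Labels containing underscores are written in quotes because strict Newick readers treat an underscore in an unquoted label as a space, which would turn g__Rubinisphaera into a different name.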