Kia ora koutou,
While running some code on the GTDB tree using the mrca-red values, we noticed that the root listed in gtdbtk_r226_ar53.tsv (GB_GCA_024860865.1|GB_GCA_036482035.1) is not the root of ar53_r226.tree.
That is, running the following code:
from Bio import Phylo
# Load the archaeal reference tree (Newick format)
tree = Phylo.read('/nesi/nobackup/uc04105/new_databases_May/GTDB_226/GTDBK/226_tree/ar53_r226.tree', 'newick')
# Most recent common ancestor of the two genomes listed as the root in gtdbtk_r226_ar53.tsv
tree.common_ancestor('GB_GCA_024860865.1', 'GB_GCA_036482035.1')
returns:
Clade(branch_length=0.04653, confidence=27.0)
For the bacterial tree this is not an issue: there, the same lookup on the mrca_red pair returns Clade(name='100:d__Bacteria').
Kind regards,
Mick
Hi Mike,
The rooting of the tree in the GTDB repository may differ from the one used for RED value calculations. The rooting in GTDB-Tk is essentially arbitrary and used purely for computational purposes. Later in the GTDB-Tk workflow, the tree is treated as unrooted.
The RED values provided in GTDB-Tk are determined by re-rooting the tree over all “plausible” rootings (i.e. above any phylum with >=2 classes) rather than from a single rooted tree. That said, the tree with the rooting you are looking for can be found in the pplacer folder of the GTDB-Tk package data.
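To illustrate why RED depends on where the tree is rooted, here is a minimal sketch of the RED calculation for a single rooting, following the definition in the GTDB paper as I understand it: the root has RED 0, leaves have RED 1, and an internal node gets p + (d / u) * (1 - p), where p is the parent's RED, d is the branch length to the parent, and u is the mean distance from the parent to the leaves below the node. The Node class, the compute_red function, and the toy tree are hypothetical names made up for this example. This is not the GTDB-Tk implementation, and it does not show how values from several rootings are combined.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """Hypothetical minimal tree node (not a GTDB-Tk or Biopython class)."""
    name: str
    length: float = 0.0  # branch length to the parent
    children: List["Node"] = field(default_factory=list)


def leaf_distances(node: Node) -> List[float]:
    """Distances from `node` to each of its descendant leaves."""
    if not node.children:
        return [0.0]
    return [c.length + d for c in node.children for d in leaf_distances(c)]


def compute_red(root: Node) -> Dict[str, float]:
    """RED for every node of a tree rooted at `root` (single rooting only)."""
    red = {root.name: 0.0}  # the root is fixed at 0
    stack = [root]
    while stack:
        parent = stack.pop()
        p = red[parent.name]
        for child in parent.children:
            if not child.children:
                red[child.name] = 1.0  # leaves are fixed at 1
            else:
                dists = leaf_distances(child)
                # mean distance from the parent to the leaves below `child`
                u = child.length + sum(dists) / len(dists)
                # guard against an all-zero-length subtree
                red[child.name] = p if u == 0 else p + (child.length / u) * (1.0 - p)
            stack.append(child)
    return red


# Toy tree: root -> X (length 1) -> A, B (length 1 each); root -> C (length 2)
toy = Node("root", children=[
    Node("X", 1.0, [Node("A", 1.0), Node("B", 1.0)]),
    Node("C", 2.0),
])
red = compute_red(toy)
print(red)
```

Placing the root on a different branch of the same toy tree changes p, d and u for the internal nodes, so a node's RED under one rooting need not match its RED under another.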
For more information you can look at our original GTDB paper, "A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life" (Nature Biotechnology).
Cheers,
Pierre