Discrepancy between the GTDB226 Archaea mrca_red root and the GTDB226 Archaea tree

Kia ora koutou,

While running some code on the GTDB tree using the mrca_red values, we noticed that the root listed in gtdbtk_r226_ar53.tsv (GB_GCA_024860865.1|GB_GCA_036482035.1) is not the root of ar53_r226.tree.

i.e., running the following code:

from Bio import Phylo

# Load the GTDB R226 archaeal reference tree (Newick format)
tree = Phylo.read('/nesi/nobackup/uc04105/new_databases_May/GTDB_226/GTDBK/226_tree/ar53_r226.tree', 'newick')
# MRCA of the two genomes listed as the root pair in gtdbtk_r226_ar53.tsv
tree.common_ancestor('GB_GCA_024860865.1', 'GB_GCA_036482035.1')

returns:
Clade(branch_length=0.04653, confidence=27.0)

In the bacterial tree this is not an issue: there, the common ancestor of the mrca_red root pair is returned as Clade(name='100:d__Bacteria').
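For anyone wanting to reproduce the check, here is a minimal sketch on a toy tree, assuming Biopython is installed. The Newick string, the tip names A to D, and the helper name mrca_is_root are made up for illustration; on the real trees you would pass the two genome accessions instead.

```python
from io import StringIO

from Bio import Phylo

# Toy rooted tree (hypothetical): two cherries, (A,B) and (C,D), joined at the root.
toy_newick = "((A:1,B:1):0.5,(C:1,D:1):0.5);"
tree = Phylo.read(StringIO(toy_newick), "newick")


def mrca_is_root(tree, tip_a, tip_b):
    """Return True if the MRCA of the two named tips is the root clade of `tree`."""
    return tree.common_ancestor(tip_a, tip_b) is tree.root


# A and B sit on the same side of the root, so their MRCA is an internal clade.
same_side = mrca_is_root(tree, "A", "B")
# A and C sit on opposite sides of the root, so their MRCA is the root itself.
across_root = mrca_is_root(tree, "A", "C")
print(same_side, across_root)
```

With the archaeal tree above, the same comparison against tree.root is what tells us the listed pair does not span the root.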

Kind regards,
Mick

Hi Mick,

The rooting of the tree in the GTDB repository may differ from the one used for RED value calculations. The rooting in GTDB-Tk is essentially arbitrary and used purely for computational purposes. Later in the GTDB-Tk workflow, the tree is treated as unrooted.

The RED values provided in GTDB-Tk are determined by re-rooting the tree over all “plausible” rootings (i.e. above any phylum with >=2 classes) rather than from a single rooted tree. That said, the tree with the rooting you are looking for can be found in the pplacer folder of the GTDB-Tk package data.
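To make the dependence on rooting concrete, here is a rough sketch of RED for one fixed rooting, in plain Python. It is not the GTDB-Tk code. It assumes the definition as I read it in the paper: the root has RED 0, leaves have RED 1, and an internal node gets p + (d / u) * (1 - p), where p is the parent's RED, d is the branch length to the parent, and u is the mean distance from the parent to the leaves below the node. The Node class, the function names, and the toy tree are all hypothetical, and branch lengths are assumed to be positive.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    length: float = 0.0                      # branch length to the parent
    children: list = field(default_factory=list)


def leaf_distances(node):
    """Distances from `node` down to each of its descendant leaves."""
    if not node.children:
        return [0.0]
    return [c.length + d for c in node.children for d in leaf_distances(c)]


def assign_red(node, parent_red=None, out=None):
    """Fill `out` with a RED value per node name for this single rooting."""
    out = {} if out is None else out
    if parent_red is None:
        red = 0.0                            # the root is fixed at 0
    elif not node.children:
        red = 1.0                            # leaves are fixed at 1
    else:
        d = node.length
        below = leaf_distances(node)
        u = d + sum(below) / len(below)      # mean distance from the parent to leaves under node
        red = parent_red + (d / u) * (1.0 - parent_red)
    out[node.name] = red
    for child in node.children:
        assign_red(child, red, out)
    return out


# Toy rooted tree (hypothetical): root -> X (length 1) -> A, B (length 1 each); root -> C (length 2)
toy = Node("root", 0.0, [
    Node("X", 1.0, [Node("A", 1.0), Node("B", 1.0)]),
    Node("C", 2.0),
])
red = assign_red(toy)
print(red)
```

Moving the root to a different branch changes p, d and u for the internal nodes, so the values from any single rooted tree need not match the reported ones, which combine multiple rootings.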

For more information you can look at our original GTDB paper: “A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life” (Nature Biotechnology).

Cheers,
Pierre