GTDB-Tk tree-related questions

Hello everyone,

I have three questions that I hope can be clarified here.

First, I’d like to construct a combined bacterial and archaeal tree of life using the 120 bacterial and 122 archaeal marker genes implemented in the tool, and place my own MAGs on the tree. I figured that the .tree file produced by de_novo_wf could serve this purpose. Is this the right approach for what I am trying to achieve?

In addition, it seems that the classify workflow is used for taxonomic classification, while the de novo workflow is better for tree construction. Both workflows produce a tree file, though, so what is the difference between the tree files produced by the two workflows?

Lastly, what does it mean to decorate a tree? Does it mean assigning taxonomic information to the tree? I would like to color the branches of the tree based on their corresponding phylum using ggtree, so is it possible to assign only phylum information to the tree branches?

I appreciate any help you could provide, thank you!

Rui

Hi Rui,

The 120 bacterial and 122 archaeal marker sets are each designed specifically for their respective domain. I wouldn’t recommend using them to infer a single prokaryotic tree spanning both domains.

The classify workflow places your genome into the GTDB reference tree using pplacer. In contrast, the de novo workflow infers a new tree using FastTree.

Yes - we use the term decorate to mean the act of assigning taxonomic information to a tree. You might be able to construct a classification file that only contains phyla, but this is outside the typical use case of GTDB-Tk.
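
As a minimal sketch of the phylum-only idea, assuming the taxonomy is available as tab-separated lines of genome ID and a GTDB-style taxonomy string (`d__...;p__...;c__...`), the phylum rank could be pulled out roughly like this. The function name `phylum_only` and the sample genome IDs are hypothetical and are not part of GTDB-Tk.

```python
def phylum_only(taxonomy_lines):
    """Map each genome ID to its phylum label (e.g. 'p__Proteobacteria').

    Assumes each line is '<genome_id><TAB><taxonomy string>', where the
    taxonomy string is a semicolon-separated list of rank-prefixed names.
    """
    phyla = {}
    for line in taxonomy_lines:
        line = line.strip()
        if not line:
            continue
        genome_id, taxonomy = line.split("\t", 1)
        ranks = [r.strip() for r in taxonomy.split(";")]
        # Pick the rank carrying the phylum prefix; fall back to a bare
        # 'p__' placeholder if the string has no phylum rank.
        phylum = next((r for r in ranks if r.startswith("p__")), "p__")
        phyla[genome_id] = phylum
    return phyla


# Hypothetical example input, not real GTDB-Tk output.
example = [
    "MAG_001\td__Bacteria;p__Proteobacteria;c__Gammaproteobacteria",
    "MAG_002\td__Archaea;p__Thermoproteota;c__Nitrososphaeria",
]
for genome_id, phylum in phylum_only(example).items():
    print(f"{genome_id}\t{phylum}")
```

A two-column table like this could then be joined to the tip labels in a plotting tool such as ggtree to color branches by phylum.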

Hello Dr. Parks,

Thanks for the clarifications! So I guess in my case, the de novo workflow would be preferred if I were to construct domain-specific trees (one for bacteria and one for archaea) using my own MAGs and the GTDB reference genomes. Once I have a tree, I could color the branches according to their phyla.

Also, I wonder if you could provide any insight into an optimal way to construct a single prokaryotic tree that spans both domains. My idea is to place the MAGs (which include both bacteria and archaea) onto a single tree and see how MAGs from different sampling sites are distributed across a prokaryotic tree of life. Thanks again!

Rui

Hi Rui,

I don’t believe there is consensus on the best way to infer a prokaryotic tree. However, a common approach is to consider ribosomal proteins:
https://www.nature.com/articles/nmicrobiol201648

Cheers,
Donovan
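
For readers unfamiliar with the concatenated-marker approach mentioned above, here is a minimal sketch of how per-protein alignments could be joined into one supermatrix before tree inference. It assumes each marker is already aligned and supplied as a dict of genome ID to aligned sequence. The function name and toy sequences are hypothetical, and padding missing markers with gaps is one possible choice rather than a prescribed method.

```python
def concatenate_alignments(alignments):
    """Join per-marker alignments into one sequence per genome.

    alignments: list of dicts mapping genome ID -> aligned sequence.
    Every sequence within one marker must have the same length.
    """
    genomes = sorted({g for aln in alignments for g in aln})
    concatenated = {g: [] for g in genomes}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within a marker must be equal length")
        width = lengths.pop()
        for g in genomes:
            # Pad with gap characters when a genome lacks this marker.
            concatenated[g].append(aln.get(g, "-" * width))
    return {g: "".join(parts) for g, parts in concatenated.items()}


# Toy alignments for two hypothetical markers.
marker_a = {"MAG_001": "MKV-L", "MAG_002": "MRVAL"}
marker_b = {"MAG_001": "GT-"}
supermatrix = concatenate_alignments([marker_a, marker_b])
for genome_id, seq in supermatrix.items():
    print(f">{genome_id}\n{seq}")
```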

Thanks for your advice, appreciate it!

Rui

@donovan.parks I am trying to estimate the phylogenetic gain from a set of MAGs relative to GTDB r202. Can the pplacer tree be used for this purpose or is it better to use the one estimated using the de novo wf?

Cheers, Adi

Hi Adi,

Personally, I’d use a de novo tree. I’m not confident that pplacer can accurately establish the proper topology and branch lengths between your MAGs. It very well might be able to do this reasonably enough, but I’ve never tested this.

Cheers,
Donovan
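
To make the phylogenetic gain question concrete, here is a minimal sketch under one assumed definition: the share of total branch length that sits on branches leading only to MAGs in a rooted tree. The nested-tuple tree format, the function name, and the toy tree are all hypothetical. A real analysis would load the de novo tree with a phylogenetics library, and the recursion here is only suitable for small trees.

```python
def phylogenetic_gain(node, mag_ids):
    """Return (gain, total) branch length for a rooted tree.

    node: (name, branch_length, children) tuple; leaves have no children.
    gain sums the branches whose descendant tips are all in mag_ids.
    """
    def walk(n):
        name, length, children = n
        if not children:
            only_mags = name in mag_ids
            return only_mags, (length if only_mags else 0.0), length
        results = [walk(c) for c in children]
        only_mags = all(r[0] for r in results)
        gain = sum(r[1] for r in results)
        total = sum(r[2] for r in results) + length
        if only_mags:
            # This internal branch exists only because of the MAGs below it.
            gain += length
        return only_mags, gain, total

    _, gain, total = walk(node)
    return gain, total


# Toy rooted tree with two reference genomes and two MAGs.
tree = ("root", 0.0, [
    ("ref_A", 0.1, []),
    ("n1", 0.2, [
        ("ref_B", 0.3, []),
        ("n2", 0.4, [
            ("MAG_001", 0.5, []),
            ("MAG_002", 0.6, []),
        ]),
    ]),
])
mags = {"MAG_001", "MAG_002"}
gain, total = phylogenetic_gain(tree, mags)
print(f"gain = {gain:.2f} of {total:.2f} total branch length")
```

Because the result depends directly on branch lengths among the MAGs, it is sensitive to how those branches were estimated, which is the concern raised above about placement versus de novo inference.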

Thanks, Donovan, much appreciated

Cheers, Adi