Multiple output tree files from gtdbtk classify_wf

Hi,
Sorry for the dumb question; this is the first time I have used GTDB-Tk. I ran the gtdbtk classify_wf command and got multiple tree files in the output. I could not figure out why, and in particular what "gtdbtk.high.bac120.classify.tree" is.

I got:
gtdbtk.bac120.classify.tree.11.tree
gtdbtk.bac120.classify.tree.6.tree
gtdbtk.high.bac120.classify.tree
gtdbtk.bac120.classify.tree.13.tree
gtdbtk.bac120.classify.tree.19.tree

Thanks in advance for the explanation.

Hello,
Starting with version 2.0.0, the gtdbtk classify_wf pipeline uses the divide-and-conquer approach by default.
With this approach, the placement of genomes is divided into two steps:

  • The first step places the user genomes onto a backbone tree representing one genome per family in GTDB (gtdbtk.high.bac120.classify.tree).
  • The genomes assigned a class (order for GTDB-Tk v2.0.0) on the backbone tree are then placed in a second tree, one of the class-level subtrees. Each subtree is built from a subset of classes, and the subtrees are indexed from 1 to n (gtdbtk.bac120.classify.tree..tree, e.g. gtdbtk.bac120.classify.tree.6.tree in your output).
  • This second placement finalises the taxonomy of the user genome down to species level (if possible).
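If you want to tell the tree files apart programmatically, the sketch below sorts file names into the backbone tree and the indexed class-level subtrees. It is only a minimal illustration: the regular expressions are inferred from the file names listed in this thread (including the later "backbone" name for the backbone tree), not from an official GTDB-Tk specification, and the helper names `classify_tree_file`, `summarise` and `find_tree_files` are hypothetical.

```python
import re
from pathlib import Path

# Assumed naming patterns, inferred from the file names in this thread.
# Backbone tree: "high" (v2.0.x) or "backbone" (v2.1.0 onward).
BACKBONE = re.compile(r"^gtdbtk\.(high|backbone)\.bac120\.classify\.tree$")
# Class-level subtree: numeric index between "tree." and ".tree".
SUBTREE = re.compile(r"^gtdbtk\.bac120\.classify\.tree\.(\d+)\.tree$")


def classify_tree_file(name):
    """Return ("backbone", None), ("subtree", index) or ("other", None)."""
    if BACKBONE.match(name):
        return ("backbone", None)
    m = SUBTREE.match(name)
    if m:
        return ("subtree", int(m.group(1)))
    return ("other", None)


def summarise(names):
    """Split file names into backbone trees and sorted subtree indices."""
    backbone, subtrees = [], []
    for name in names:
        kind, index = classify_tree_file(name)
        if kind == "backbone":
            backbone.append(name)
        elif kind == "subtree":
            subtrees.append(index)
    return backbone, sorted(subtrees)


def find_tree_files(outdir):
    """Collect the names of all *.tree files below a (hypothetical) output directory."""
    return [p.name for p in Path(outdir).rglob("*.tree")]


if __name__ == "__main__":
    files = [
        "gtdbtk.bac120.classify.tree.11.tree",
        "gtdbtk.bac120.classify.tree.6.tree",
        "gtdbtk.high.bac120.classify.tree",
        "gtdbtk.bac120.classify.tree.13.tree",
        "gtdbtk.bac120.classify.tree.19.tree",
    ]
    backbone, subtrees = summarise(files)
    print("backbone:", backbone)
    print("subtree indices:", subtrees)
```

With the five files from the question, this reports one backbone tree and subtrees 6, 11, 13 and 19. Only the subtrees that actually received user genomes would be expected to appear, which is why the indices are not contiguous.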

We are preparing a manuscript with the details.

Please note that from version 2.1.0 onward, gtdbtk.high.bac120.classify.tree has been renamed to gtdbtk.backbone.bac120.classify.tree.