Hi,
GTDB-Tk doesn’t have any direct functionality for replacing the GTDB identifiers with other information. Perhaps others have put together a script that does this sort of reformatting of the leaf labels. Sorry I can’t be of more direct help.
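For anyone who wants to write such a script, here is a minimal sketch rather than an official GTDB-Tk feature. It assumes the leaf labels in the Newick file are unquoted genome identifiers and that you supply your own two-column, tab-separated file mapping each identifier to the text you want appended. The function names (`quote_label`, `relabel_newick`, `load_mapping`) and the ` | ` separator are hypothetical choices for illustration.

```python
import csv
import re


def quote_label(label: str) -> str:
    """Single-quote a label if it contains characters reserved in Newick."""
    if re.search(r"[\s()\[\]':;,]", label):
        return "'" + label.replace("'", "''") + "'"
    return label


# An unquoted leaf label directly follows "(" or "," and ends at ":", "," or ")".
# Quoted internal node labels (which follow ")") are left untouched.
LEAF = re.compile(r"(?<=[(,])([^():,;'\s]+)(?=[:,)])")


def relabel_newick(newick: str, mapping: dict) -> str:
    """Append mapping[leaf] to every leaf label found in the mapping."""

    def swap(match):
        name = match.group(1)
        if name in mapping:
            return quote_label(f"{name} | {mapping[name]}")
        return name  # leaves without an entry keep their original label

    return LEAF.sub(swap, newick)


def load_mapping(path: str) -> dict:
    """Read a two-column TSV (identifier, new text); the layout is an assumption."""
    with open(path, newline="") as handle:
        return {row[0]: row[1] for row in csv.reader(handle, delimiter="\t") if len(row) >= 2}


# Tiny in-memory example; real trees would be read from a .tree file.
example = "((GCF_000001.1:0.1,my_mag:0.2)'100.0:g__Foo':0.3,GCA_000002.1:0.4);"
print(relabel_newick(example, {"GCF_000001.1": "s__Foo bar", "my_mag": "g__Foo"}))
```

The original identifier is kept in the new label so leaves can still be traced back to their accessions. For large or unusually formatted trees, a dedicated tree library is likely a safer choice than a regular expression.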
Cheers,
Donovan
Hi, Thank you very much for all your help.
Best
DP
This is still an element of GTDB-Tk that I don’t understand. After running classify, I get a dir called classify with a bunch of gtdb/bac120.classify.tree.1.tree. I have 8 different trees, each with a different number of GCF_### as the branch nodes.
It seems as though there is a community desire to use GTDB-Tk as a tree maker and take the taxonomically identified MAGs/genomes and have them incorporated into a tree with their other related reference genomes.
What is the purpose of the trees that are made and put into the classify dir?
Hi Katie,
In order to reduce computational requirements, GTDB-Tk first places genomes into a sparse background tree to determine the class-level classification for each genome. Genomes are then placed into class-level trees to determine their final classification. The purpose of these trees is to determine the taxonomic classification of each user/query genome. You can find details on this approach in: GTDB-Tk v2: memory friendly classification with the genome taxonomy database (Bioinformatics, Oxford Academic)
If you wish to infer a tree with all genomes, consider using the de novo workflow of GTDB-Tk. This infers a de novo phylogenetic tree using FastTree that will contain all your genomes along with the GTDB-Tk reference genomes. This is more appropriate if the goal is to use GTDB-Tk as a tree maker, but less direct and less accurate if the goal is simply to obtain a GTDB taxonomy string for each of your genomes.
https://ecogenomics.github.io/GTDBTk/commands/de_novo_wf.html
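As a rough illustration of launching the de novo workflow from a script, here is a sketch that builds the `gtdbtk de_novo_wf` command line and runs it. The directory names, the outgroup `p__Patescibacteria`, and the helper names `build_de_novo_cmd` and `run_de_novo` are placeholders I am assuming for the example. The flags reflect my understanding of the documented options, so please check `gtdbtk de_novo_wf -h` for your installed version.

```python
import subprocess


def build_de_novo_cmd(genome_dir, out_dir, outgroup_taxon, domain="bacteria", cpus=1):
    """Assemble the argument list for `gtdbtk de_novo_wf` (paths are placeholders)."""
    if domain not in ("bacteria", "archaea"):
        raise ValueError("domain must be 'bacteria' or 'archaea'")
    return [
        "gtdbtk", "de_novo_wf",
        "--genome_dir", genome_dir,          # directory of input genome FASTA files
        "--out_dir", out_dir,                # where results are written
        f"--{domain}",                       # infer the bacterial or the archaeal tree
        "--outgroup_taxon", outgroup_taxon,  # taxon used to root the tree
        "--cpus", str(cpus),
    ]


def run_de_novo(genome_dir, out_dir, outgroup_taxon, domain="bacteria", cpus=1):
    """Run the workflow; raises CalledProcessError if gtdbtk exits with an error."""
    cmd = build_de_novo_cmd(genome_dir, out_dir, outgroup_taxon, domain, cpus)
    subprocess.run(cmd, check=True)


# Show the command that would be executed, without running it.
print(" ".join(build_de_novo_cmd("genomes", "de_novo_out", "p__Patescibacteria", cpus=4)))
```

Calling `run_de_novo(...)` with the same arguments would start the actual run, which requires GTDB-Tk and its reference data to be installed.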
Cheers,
Donovan
Perfect. Thank you for the great response! I understand now!
Cheers!