Dear GTDB team,
I understand that different marker genes were used for bacteria and archaea and that two separate trees were provided. My question is: how can I get a single tree that includes both domains? This makes a lot of sense, because in the earliest 16S-based phylogeny research both domains were placed in the same tree, which provided many opportunities to study community-level differences while taking phylogeny into account. Is it possible to do that for the newest versions, v220 and v226 (similar to version 207 or before, in which a single tree was provided)?
-Jianshu
Hi,
As you indicated, the bacterial and archaeal trees are inferred with different marker sets, which necessitates having two trees. There are (at least) two options for creating a single tree:
i) Use a marker set that is suitable for both Bacteria and Archaea and infer a single tree using this marker set
ii) Create a pseudo-single tree by forcing the GTDB bacterial and archaeal trees into a single tree. To do this, you need to decide how you want each of these trees to be rooted. You can then take the Newick strings and combine them together.
Cheers,
Donovan
Here is an example of approach ii:
Tree A: (A,B);
Tree B: (C,D);
Result: ((A,B),(C,D));
This is simple string concatenation: drop the trailing semicolon from each tree, join the two strings with a comma, wrap the result in another set of parentheses, and end it with a semicolon.
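For anyone who wants to script this, here is a minimal Python sketch of the concatenation. The function name combine_newick is made up for illustration and is not part of any GTDB tool. It assumes both input trees are already rooted the way you want and that each Newick string ends with a semicolon. Note that the new root adds no branch lengths between the two subtrees, so the result is a pseudo-tree rather than an inferred phylogeny.

```python
def combine_newick(tree_a: str, tree_b: str) -> str:
    """Join two rooted Newick trees under a new, unlabeled root node.

    Hypothetical helper for illustration; assumes each input is a
    complete Newick string terminated by ';'.
    """
    subtrees = []
    for tree in (tree_a, tree_b):
        tree = tree.strip()
        if not tree.endswith(";"):
            raise ValueError("Newick string must end with ';'")
        # Remove only the terminating semicolon; the rest is left untouched.
        subtrees.append(tree[:-1].rstrip())
    # Wrap both subtrees in a new pair of parentheses and terminate.
    return "(" + ",".join(subtrees) + ");"


print(combine_newick("(A,B);", "(C,D);"))  # ((A,B),(C,D));
```

For the real trees you would read each Newick file into a string, pass both strings to the function, and write the result to a new file.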
Hi @donovan.parks, thanks for the suggestion. Will the 120 genes used by default for bacteria also work for archaea? Or, I guess, my question is: what is the logic behind using two different sets of genes?
-Jianshu
Hi. Some of the 120 bacterial genes are also found in Archaea, but not all of them. The reason we have two marker sets is that there are plenty of phylogenetically informative genes that are domain-specific.