Thank you for developing and maintaining this awesome software.
I am a beginner bioinformatician and I am tackling my first phylogenetic tree analysis task.
I wonder, what do the numbers in quoted internal node names mean?
Ex. ((RS_GCF_000199675.1:0.138452,RS_GCF_001050195.2:0.122911)'0.999:g__Anaerolinea':0.044629
Most software does not recognize them as branch lengths, only as labels. What do they mean, then?
Just to follow up on Donovan’s comment: one issue is that the Newick format allows different ways to store bootstrap values and internal node names.
For example, the ARB software environment puts both into a single quoted label, with the support value first and the node name after the colon:
'0.999:g__Anaerolinea':0.044629
whereas the online tree display tool iTOL has opted for this format:
A tree with internal node IDs:
```
(A:0.1,(B:0.1,C:0.1)INT1:0.1[90])INT2:0.3[98];
A, B, C : leaf names
INT1, INT2 : internal node IDs
0.1, 0.3 : branch lengths
90, 98 : bootstrap values
```
Cheers,
Chris
The next version of GTDB-Tk (should be released next week) will have a helper method to convert the default GTDB-Tk (ARB-style) Newick trees into the format required by iTOL.
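To illustrate what such a conversion involves, here is a minimal Python sketch. It is not GTDB-Tk's implementation; the function names `split_arb_label` and `arb_to_itol` and the small example tree are made up for illustration. It assumes every labelled internal node is written as `'support:name':length`, and it does not handle unquoted support-only labels or names containing spaces or other characters that iTOL may reject.

```python
import re

# Matches an ARB-style quoted internal label followed by a branch length,
# e.g.  )'0.999:g__Anaerolinea':0.044629
ARB_LABEL = re.compile(
    r"\)'(?P<support>[0-9.]+):(?P<name>[^']+)':(?P<length>[0-9.eE+-]+)"
)


def split_arb_label(label):
    """Split a quoted 'support:name' label into (support, name)."""
    support, name = label.strip("'").split(":", 1)
    return float(support), name


def arb_to_itol(newick):
    """Rewrite )'support:name':length as )name:length[support]."""
    return ARB_LABEL.sub(
        lambda m: f"){m['name']}:{m['length']}[{m['support']}]",
        newick,
    )


# Hypothetical three-leaf tree using the label from the question.
arb_tree = "(A:0.1,(B:0.1,C:0.1)'0.999:g__Anaerolinea':0.044629);"

print(split_arb_label("'0.999:g__Anaerolinea'"))
print(arb_to_itol(arb_tree))
```

Read this way, the `0.999` in the quoted label is the support value for the clade and `g__Anaerolinea` is the name assigned to that internal node; the number after the closing quote is the branch length.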