`gtdbtk` meaning of numbers in internal node names in the output tree

Hello,

Thank you for developing and maintaining this awesome software.
I am a beginner bioinformatician and I am tackling my first phylogenetic tree analysis task.
What do the numbers in the quoted internal node names mean?
For example: ((RS_GCF_000199675.1:0.138452,RS_GCF_001050195.2:0.122911)'0.999:g__Anaerolinea':0.044629
Most software does not recognize them as branch lengths, only as part of the label. What do they mean, then?

Yours sincerely,
Valentyn

Hi Valentyn,

The 0.999 is the non-parametric bootstrap support for the node. EBI has a nice explanation of these values: Confidence | Phylogenetics.

Cheers,
Donovan


Hi Valentyn,

Just to follow up on Donovan’s comment: one issue is that the Newick format allows different ways to store bootstrap values and internal node names.
For example, the ARB software environment does it like this:

'0.999:g__Anaerolinea':0.044629

whereas the online tree display tool iTOL has opted for this format:

```
A tree with internal node IDs:

(A:0.1,(B:0.1,C:0.1)INT1:0.1[90])INT2:0.3[98]);

A, B, C    : leaf names
INT1, INT2 : internal node IDs
0.1, 0.3   : branch lengths
90, 98     : bootstrap values
```
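
To make the difference between the two layouts concrete, here is a minimal Python sketch that rewrites ARB-style internal labels into the iTOL-style layout. It is not part of GTDB-Tk: the function name `arb_to_itol` is hypothetical, and the regex assumes internal labels look like `'support:name'`, `'support'`, or a bare number directly after a closing parenthesis. Quoted labels without a leading support value are left untouched. For real trees a proper Newick parser would be safer than a regex.

```python
import re

# Matches an internal node: ")" + label + ":" + branch length.
# Assumed label forms: 'support:name', 'support', or a bare support value.
_INTERNAL = re.compile(
    r"\)"
    r"(?:'(?P<qsup>[0-9.]+)(?::(?P<name>[^']*))?'|(?P<sup>[0-9.]+))"
    r":(?P<len>[0-9.eE+-]+)"
)


def _rewrite(match):
    """Build the iTOL-style form: )NAME:length[support]."""
    support = match.group("qsup") or match.group("sup")
    name = match.group("name") or ""
    # Re-quote names with characters that are special in Newick.
    if name and not re.fullmatch(r"[\w.\-]+", name):
        name = "'" + name + "'"
    return f"){name}:{match.group('len')}[{support}]"


def arb_to_itol(newick):
    """Hypothetical helper: move support values out of the node label."""
    return _INTERNAL.sub(_rewrite, newick)


if __name__ == "__main__":
    tree = "((A:0.1,B:0.2)'0.999:g__Anaerolinea':0.044629,C:0.3);"
    print(arb_to_itol(tree))
```

The support value moves from the front of the quoted label to square brackets after the branch length, and the taxon name is left as the node ID.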

Cheers, 
Chris

Hi,

The next version of GTDB-Tk (should be released next week) will have a helper method to convert the default GTDB-Tk (ARB-style) Newick trees into the format required by iTOL.

Cheers,
Donovan
