GTDB Forum

What format are the Newick trees in, and how do I extract a specific node?

I’m trying to read my tree from the classify/ directory, but I’m getting an error:

In [50]: tree = ete3.Tree("./classify/gtdbtk.bac120.classify.tree", format=0, quoted_node_names=True)
---------------------------------------------------------------------------
NewickError                               Traceback (most recent call last)
<ipython-input-50-913e9702b09b> in <module>
----> 1 tree = ete3.Tree("./classify/gtdbtk.bac120.classify.tree", format=0, quoted_node_names=True)

/usr/local/devel/ANNOTATION/jespinoz/anaconda3/envs/soothsayer_py3.8_env/lib/python3.8/site-packages/ete3/coretype/tree.py in __init__(self, newick, format, dist, support, name, quoted_node_names)
    208         if newick is not None:
    209             self._dist = 0.0
--> 210             read_newick(newick, root_node = self, format=format,
    211                         quoted_names=quoted_node_names)
    212

/usr/local/devel/ANNOTATION/jespinoz/anaconda3/envs/soothsayer_py3.8_env/lib/python3.8/site-packages/ete3/parser/newick.py in read_newick(newick, root_node, format, quoted_names)
    249             raise NewickError('Unexisting tree file or Malformed newick tree structure.')
    250         else:
--> 251             return _read_newick_from_string(nw, root_node, matcher, format, quoted_names)
    252
    253     else:

/usr/local/devel/ANNOTATION/jespinoz/anaconda3/envs/soothsayer_py3.8_env/lib/python3.8/site-packages/ete3/parser/newick.py in _read_newick_from_string(nw, root_node, matcher, formatcode, quoted_names)
    324                     closing_internal =  closing_internal.rstrip(";")
    325                     # read internal node data and go up one level
--> 326                     _read_node_data(closing_internal, current_parent, "internal", matcher, formatcode)
    327                     current_parent = current_parent.up
    328

/usr/local/devel/ANNOTATION/jespinoz/anaconda3/envs/soothsayer_py3.8_env/lib/python3.8/site-packages/ete3/parser/newick.py in _read_node_data(subnw, current_node, node_type, matcher, formatcode)
    428             _parse_extra_features(node, data[2])
    429     else:
--> 430         raise NewickError("Unexpected newick format '%s' " %subnw[0:50])
    431     return
    432

NewickError: Unexpected newick format 'ete3_quotref_1:0.09438'
You may want to check other newick loading flags like 'format' or 'quoted_node_names'.

I was able to read the tree successfully with format=1, but I’m getting nodes named 1.0, which I don’t think is correct:

In [49]: p_node.get_common_ancestor(["MAG_883.8", "MAG_883.14"])
Out[49]: Tree node '1.0' (0x7f810794d82)

Here are the available formats:
http://etetoolkit.org/docs/latest/tutorial/tutorial_trees.html#reading-and-writing-newick-trees
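For comparison, here is a minimal stdlib-only Newick reader. It is an illustrative sketch, not ete3’s or DendroPy’s parser. It keeps quoted labels (like the one ete3 replaced with ete3_quotref_1) as plain strings, so an internal node’s text label and its branch length stay separate, and it adds a simple common-ancestor lookup. ete3’s format=0 appears to expect each internal label to be a numeric support value, which would explain why a quoted text label fails there. The sketch assumes no whitespace between tokens and no [comment] blocks. The names parse_newick, find_leaf and common_ancestor are hypothetical, and the tree string and its '95.0:f__ExampleFamily' label are made up.

```python
# Minimal stdlib-only Newick reader (a sketch, not ete3's or DendroPy's parser).
# Assumes no whitespace between tokens and no [comment] blocks in the input.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: Optional[str] = None
    length: Optional[float] = None
    children: List["Node"] = field(default_factory=list)
    # Left out of repr/eq so printing or comparing nodes cannot recurse forever.
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)


def parse_newick(text: str) -> Node:
    pos = 0

    def read_label() -> Optional[str]:
        nonlocal pos
        if pos < len(text) and text[pos] == "'":
            # Quoted label: ':' and ';' inside it stay part of the name; '' is a literal quote.
            pos += 1
            chars = []
            while True:
                if text[pos] == "'":
                    if pos + 1 < len(text) and text[pos + 1] == "'":
                        chars.append("'")
                        pos += 2
                        continue
                    pos += 1
                    return "".join(chars)
                chars.append(text[pos])
                pos += 1
        # Unquoted label: runs until the next Newick delimiter.
        start = pos
        while pos < len(text) and text[pos] not in "(),:;":
            pos += 1
        return text[start:pos] or None

    def read_node() -> Node:
        nonlocal pos
        node = Node()
        if text[pos] == "(":
            pos += 1
            while True:
                child = read_node()
                child.parent = node
                node.children.append(child)
                pos += 1  # skip ',' (next sibling) or ')' (end of clade)
                if text[pos - 1] == ")":
                    break
        node.name = read_label()
        if pos < len(text) and text[pos] == ":":
            # Branch length: everything up to the next delimiter.
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in "(),;":
                pos += 1
            node.length = float(text[start:pos])
        return node

    return read_node()


def find_leaf(root: Node, name: str) -> Node:
    stack = [root]
    while stack:
        node = stack.pop()
        if not node.children and node.name == name:
            return node
        stack.extend(node.children)
    raise KeyError(name)


def common_ancestor(root: Node, names: List[str]) -> Node:
    # Build the root-to-leaf path for each requested leaf.
    paths = []
    for name in names:
        node = find_leaf(root, name)
        path = []
        while node is not None:
            path.append(node)
            node = node.parent
        paths.append(path[::-1])
    # The deepest node shared by every path is the common ancestor.
    mrca = root
    for level in zip(*paths):
        if all(n is level[0] for n in level):
            mrca = level[0]
        else:
            break
    return mrca


# Made-up tree whose quoted internal label mimics a 'support:taxon' string.
tree = parse_newick(
    "((MAG_883.8:0.1,MAG_883.14:0.2)'95.0:f__ExampleFamily':0.09438,other:0.3);"
)
anc = common_ancestor(tree, ["MAG_883.8", "MAG_883.14"])
print(anc.name, anc.length)  # 95.0:f__ExampleFamily 0.09438
```

Because the quoted label is read as a single string, whatever the tree stores on an internal node (support value, taxonomy, or both) comes back intact on node.name for you to inspect.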

Hi. I’m not that familiar with ETE3. We use DendroPy to do our processing.

Awesome, I’ve been wanting to try out that package anyway. Do you use the default read arguments to read the Newick tree, or are there specific arguments that need to be tweaked to read GTDB-Tk trees?

https://dendropy.org/schemas/newick.html

import dendropy

tree = dendropy.Tree.get(
    path="tree.tre",
    schema="newick",
    label=None,
    taxon_namespace=None,
    collection_offset=None,
    tree_offset=None,
    rooting="default-unrooted",
    edge_length_type=float,
    suppress_edge_lengths=False,
    extract_comment_metadata=True,
    store_tree_weights=False,
    encode_splits=False,
    finish_node_fn=None,
    case_sensitive_taxon_labels=False,
    preserve_underscores=False,  # False turns unquoted underscores into spaces (MAG_883.8 -> "MAG 883.8")
    suppress_internal_node_taxa=True,
    suppress_leaf_node_taxa=False,
    terminating_semicolon_required=True,
    ignore_unrecognized_keyword_arguments=False,
)
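With suppress_internal_node_taxa=True, DendroPy keeps internal node labels as plain strings on node.label instead of turning them into taxa. If those labels follow a 'support:taxonomy' pattern (an assumption; inspect your own tree to confirm), a small stdlib helper can separate the two parts. Under that assumption, a bare label such as 1.0 would be a support value with no taxonomy attached, which is one possible explanation for the 1.0 node names, though the thread does not confirm it. The function name split_support_label and the example labels are hypothetical.

```python
from typing import Optional, Tuple


def split_support_label(label: Optional[str]) -> Tuple[Optional[float], Optional[str]]:
    """Split an internal-node label ASSUMED to look like 'support:taxonomy'.

    '95.0:f__X' -> (95.0, 'f__X'); '1.0' -> (1.0, None); 'f__X' -> (None, 'f__X').
    """
    if not label:
        return None, None
    prefix, sep, rest = label.partition(":")
    try:
        support = float(prefix)
    except ValueError:
        # No numeric prefix: treat the whole label as taxonomy.
        return None, label
    taxon = rest.strip() if sep else ""
    return support, taxon or None


for lab in ["95.0:f__ExampleFamily", "1.0", "f__ExampleFamily", None]:
    print(repr(lab), "->", split_support_label(lab))
```

In DendroPy this could be applied to each internal node while iterating over tree.preorder_internal_node_iter(), and, with preserve_underscores=True, the ancestor of two MAGs can be looked up with tree.mrca(taxon_labels=["MAG_883.8", "MAG_883.14"]).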

Hi. You can look at the following GTDB projects to see how we read trees with DendroPy: