Due to the lack of reference eukaryotic genomes, GTDB-Tk tends to classify eukaryotic sequences as new prokaryotic phyla or as divergent high-rank taxa within Archaea or Bacteria. Several publications are already affected by this problem, which risks spreading these misclassifications further. Could GTDB consider including a few reference eukaryotic genomes to avoid this?
This is concerning. Do you have some examples of eukaryotic genomes that have been classified as bacteria or archaea by Tk? We’ll begin investigating and increase marker stringency and/or include euk refs as you suggest.
This might be helpful:
I just realized that my message below was never sent.
Yes, we do have several examples. The most recent one involved fungal sequences from soil, which were classified as a new clade of Korarchaeia.
We also have diatom sequences classified as new cyanobacterial taxa.
There are also a few other examples in which new bacterial phyla were proposed for protist sequences.
Please email the examples to me (email@example.com); they will help us build an extra euk filter into Tk.