What would it take to adapt GTDB-Tk for protista and single-celled fungi?

I’ve been trying to dive into the abyss of microeukaryotic organisms, but there’s no tool that can classify them at the same quality as GTDB-Tk. I understand that gene modeling for microeukaryotes can’t be done with Prodigal, which I believe runs in the back end of GTDB-Tk.

Given that we had precomputed gene models and quality-filtered genomes:

  1. What would it take to adapt GTDB-Tk to handle microeukaryotes?
  2. What type of database structure would we need to use for the taxonomy?
  3. Are there any caveats where this methodology wouldn’t work for microeukaryotes?

I know this isn’t in the current scope of GTDB-Tk, but I’m just wondering how these methods could be adapted.

Hi jolespin,

I came across your post and see it didn’t receive a lot of attention… I am facing a similar challenge and wondered if you have gained any wisdom in the last 6 months?
Any insight you have gained, tools you could recommend etc. would be greatly appreciated…

Kind of. I developed a eukaryotic classification module (and an accompanying microeukaryotic database) in my VEBA package that you might find useful. It’s nowhere near as robust as GTDB-Tk, but it would be cool if we could build a custom database via GTDB-Tk to repurpose it for eukaryotic gene modeling.

Here’s the publication: VEBA: a modular end-to-end suite for in silico recovery, clustering, and analysis of prokaryotic, microeukaryotic, and viral genomes from metagenomes (BMC Bioinformatics)

Here’s the GitHub: jolespin/veba

Here’s a walkthrough for the end-to-end workflow; the link goes to the eukaryotic classification part: veba/end-to-end_metagenomics.md in jolespin/veba on GitHub

The only caveat is that it assumes you used VEBA’s eukaryotic gene modeling. Next on the list is to adapt it to take any eukaryotic genome; I’ll add it to my to-do list.

If you have any questions, let me know on GitHub and I can help you out.