Forward migration tool for taxonomy classifications? (e.g., r226 -> r232)

I’m sure this is not possible since new genomes have been added to later databases, but are there any external tools (or functionality within GTDB-Tk) that can migrate classifications based on representative taxa in case they are reclassified in later versions?

Alternatively, what do you recommend for situations where there is a large genome catalog (e.g., 50k genomes) with outdated GTDB-Tk taxonomy classifications? Is rerunning classify_wf the only real option?

Hi. Unfortunately, re-running is the best (only?) option since new releases will have new taxa, including ~30% more species. I guess you could factor out the genomes that already have well-known species classifications that are unlikely to change between releases (e.g., E. coli, S. enterica), but this isn’t worth it since these are quickly classified using ANI. The expensive step is classifying novel genomes that need to be placed in the reference tree and the results for these are likely to change from release to release.
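If you just want a rough idea of how many genomes in the catalog fall into each group, something like the following could work. It is only a sketch: it assumes the existing summary table is tab-separated with `user_genome` and `classification` columns and that a genome without a named species has an empty `s__` rank. The function name and the example rows are made up for illustration, so check them against your own files.

```python
import csv
import io


def split_by_species_assignment(summary_tsv_text):
    """Split genomes into those with a named species and those without.

    Assumes a tab-separated table with 'user_genome' and 'classification'
    columns, where 'classification' is a semicolon-separated rank string
    such as 'd__...;p__...;...;s__Escherichia coli'.
    """
    assigned, unassigned = [], []
    reader = csv.DictReader(io.StringIO(summary_tsv_text), delimiter="\t")
    for row in reader:
        ranks = [rank.strip() for rank in row["classification"].split(";")]
        # Take the text after the 's__' prefix; empty means no species name.
        species = next((r[3:] for r in ranks if r.startswith("s__")), "")
        (assigned if species else unassigned).append(row["user_genome"])
    return assigned, unassigned


# Hypothetical rows, for illustration only.
EXAMPLE = (
    "user_genome\tclassification\n"
    "genome_A\td__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;"
    "o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;"
    "s__Escherichia coli\n"
    "genome_B\td__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;"
    "o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__\n"
)

assigned, unassigned = split_by_species_assignment(EXAMPLE)
print(len(assigned), "with a species name;", len(unassigned), "without")
```

The genomes without a species name are the ones most likely to move between releases, but this split is only for bookkeeping. It does not replace re-running classify_wf on the whole catalog.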

Cheers,
Donovan