NCBI Pathogen Detection Assembly genome question

Hi GTDB staff,
Quick question for you… I was searching for an environmental Patescibacteria genome in GTDB, and I received an alert that it was a known surveillance genome excluded from GTDB. However, it doesn’t show up on NCBI’s Pathogen Detection organism list, and there is nothing on the NCBI Assembly page to indicate it’s part of this group. Is there an alternate place where this information might be stored that I’m missing? Thanks so much!
Best,
Emily

Hi Emily,

This does seem odd. Can you send us the NCBI accession number of the genome in question?

Thanks,
Donovan

Hi Donovan,

Here are a few assembly accessions: GCA_903994715.1, GCA_903820155.1, GCA_903820115.1.
They’re part of a group of ~150 Moranbacterales MAGs associated with BioProject PRJEB38681 that are excluded from GTDB but don’t show up on the list of MAGs excluded due to QA issues. From my GTDB-Tk analysis, they’re very redundant, mostly confined to the same three species.

Thanks so much for your help with this!

Best,
Emily

Hi Emily,

All the genomes you listed were excluded from GTDB because they are annotated at NCBI as being “from large multi-isolate project”. We exclude all such genomes, as NCBI defines this as “The assembly is one of over 100 assemblies for multiple isolates of the same species generated by the same project. Typically, these are pathogen surveillance projects.” (Genome Notes).
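For anyone wanting to check this flag for a batch of genomes, a minimal sketch is below. It assumes the annotation appears in the `excluded_from_refseq` column of an NCBI `assembly_summary`-style tab-delimited table, with the column header on the last `#`-prefixed line and multiple reasons separated by `;`. The function name `multi_isolate_accessions` and the toy table are hypothetical and for illustration only; this is not how GTDB itself applies the filter.

```python
import csv
import io

# Annotation text quoted in the NCBI note above.
FLAG = "from large multi-isolate project"


def multi_isolate_accessions(summary_text):
    """Return accessions whose 'excluded_from_refseq' field contains FLAG.

    Hypothetical helper. Assumes a tab-delimited, assembly_summary-style table
    where the last '#'-prefixed line holds the column names.
    """
    header = None
    flagged = []
    reader = csv.reader(io.StringIO(summary_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue
        if row[0].startswith("#"):
            # Treat the latest comment line as the header; drop the leading '#'.
            header = [row[0].lstrip("#").strip()] + row[1:]
            continue
        record = dict(zip(header, row))
        # Several exclusion reasons may be listed, separated by ';'.
        reasons = [r.strip() for r in record.get("excluded_from_refseq", "").split(";")]
        if FLAG in reasons:
            flagged.append(record["assembly_accession"])
    return flagged


# Toy table (made-up rows, illustration only).
sample = (
    "# assembly_accession\tbioproject\texcluded_from_refseq\n"
    "GCA_903994715.1\tPRJEB38681\tfrom large multi-isolate project\n"
    "GCA_000000001.1\tPRJNA000000\t\n"
)
print(multi_isolate_accessions(sample))  # ['GCA_903994715.1']
```

Matching on the split reasons rather than a substring search avoids false hits if the flag text ever appears inside a longer, different annotation.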

I will endeavour to have the information box on the GTDB website updated so it indicates that the genome isn’t necessarily an NCBI surveillance genome, but may simply be flagged as being from a large multi-isolate project.

That said, I believe the NCBI annotation of these genomes is incorrect. These are not isolates, but MAGs. Can you bring this issue to the attention of NCBI? They are generally appreciative of such feedback.

Cheers,
Donovan

Hi Donovan,

Thanks so much for the info - that makes sense! I’ve sent a quick email to NCBI alerting them to the issue.

Best,
Emily