GTDB Forum

Integration of genomes not submitted to NCBI


Genome sequences of the type strains Faecalibacterium butyricigenerans DSM 103434 and Faecalibacterium longum DSM 103432 were submitted to the China National GeneBank DataBase (CNGBdb) but not to NCBI (doi: 10.1038/s41598-021-90786-3).
I am afraid these new species names will not be included in the next GTDB release. Could you add the CNGBdb as an alternative data source?

Similarly, sequencing data for the type strain Flintibacter butyricus DSM 27579 (ERS1032661) were submitted to NCBI, but no assembly is available. Could you add this genome to GTDB?


Hello Florian,

We do not currently have the resources to integrate the China National GeneBank DataBase into the GTDB. We selected the GenBank assembly database as it is part of the INSDC. Can these two Faecalibacterium genomes be submitted to an INSDC repository (DDBJ, EMBL-EBI, or NCBI)?

We also don’t have the resources to handle sequencing data outside of the INSDC genome databases. As such, someone in the community will need to assemble the F. butyricus sequence data and submit the genome to an INSDC genome repository before it will appear in the GTDB.

Sorry we can’t be of direct help, but resource limitations and the need for data management and provenance tracking make it challenging to integrate new genome repositories or to handle unassembled data.


Thanks, Donovan. I will ask the corresponding authors to submit the genomes to the INSDC.