Hi,
I would like to assess whether a genome is a MAG or is derived from a cultured isolate.
It seems that this information is not explicitly recorded in bac120_metadata_r95.tsv and ar122_metadata_r95.tar.gz. However, it appears on the GTDB website.
This information can be deduced from the field “ncbi_isolation_source” but this is not convenient.
The “ncbi_genome_category” field indicates if a genome is assembled from a single cell, metagenome, or environmental sample. This is taken directly from NCBI. Lack of information here implies the genome is an isolate. As you might imagine, this information isn’t always known so there may be some SAGs or MAGs that are missing the metadata field, but I have found it to be generally reliable.
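For anyone who wants to apply this rule to the metadata file, here is a minimal sketch in Python. It assumes the file is tab-separated with an "accession" column and an "ncbi_genome_category" column, that missing values are written as "none", and that the category text contains the words "metagenome", "single cell", or "environmental". The exact category strings are an assumption, so check them against your copy of the file. The function names and the EXAMPLE_* accessions are hypothetical.

```python
import csv
import io

# Values treated as "no category recorded" (an assumption; adjust to your file).
MISSING = {"", "none", "na", "n/a"}


def classify_genome(category):
    """Map an ncbi_genome_category value to a coarse label.

    A missing value is read as "isolate", following the rule described above.
    This is only an inference: some SAGs or MAGs may lack the field.
    """
    value = (category or "").strip().lower().replace("_", " ")
    if value in MISSING:
        return "isolate (assumed)"
    if "metagenome" in value:
        return "MAG"
    if "single cell" in value:
        return "SAG"
    if "environmental" in value:
        return "environmental sample"
    return "unrecognized"


def classify_metadata(handle):
    """Read a tab-separated metadata table and return {accession: label}."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        row["accession"]: classify_genome(row.get("ncbi_genome_category"))
        for row in reader
    }


if __name__ == "__main__":
    # Tiny in-memory stand-in for the metadata file; the EXAMPLE_* rows are made up.
    sample = io.StringIO(
        "accession\tncbi_genome_category\n"
        "RS_GCF_000404225.1\tnone\n"
        "EXAMPLE_MAG_1\tderived from metagenome\n"
        "EXAMPLE_SAG_1\tderived from single cell\n"
    )
    for accession, label in classify_metadata(sample).items():
        print(accession, label)
```

To run it on the real file, pass an open file handle instead of the in-memory sample, for example `open("bac120_metadata_r95.tsv", newline="")`.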
Thanks. Have you performed any manual corrections to the “ncbi_genome_category” field?
I have noticed that the field “ncbi_isolation_source” sometimes provides additional information.
For instance, it helps to distinguish genomes derived from enriched cultures or from pure cultures.
RS_GCF_000404225.1
ncbi_genome_category = none
ncbi_isolation_source = “enrichment culture from feces”
But sometimes it is not mentioned in any field, and one has to refer to the original paper.
RS_GCF_000404225.1
ncbi_genome_category = none
ncbi_isolation_source = none
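The two cases above could be told apart with a small check like the following sketch. It assumes missing values are written as "none" and that an enrichment culture can be spotted by the words "enrichment" or "enriched" in “ncbi_isolation_source”, which will not catch every phrasing. The function name and the returned labels are hypothetical.

```python
# Values treated as "nothing recorded" (an assumption; adjust to your file).
MISSING = {"", "none", "na", "n/a"}


def refine_isolate_call(category, isolation_source):
    """Refine an 'isolate' call using the free-text isolation source."""
    cat = (category or "").strip().lower()
    src = (isolation_source or "").strip().lower()
    if cat not in MISSING:
        # The category field already says MAG, SAG, or environmental sample.
        return "not an isolate per ncbi_genome_category"
    if "enrichment" in src or "enriched" in src:
        return "enrichment culture"
    if src in MISSING:
        # Neither field helps; the original paper is the only source left.
        return "presumed isolate, check publication"
    return "presumed isolate"


if __name__ == "__main__":
    print(refine_isolate_call("none", "enrichment culture from feces"))
    print(refine_isolate_call("none", "none"))
```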
We have not performed any corrections. If you can find clear errors, it would be appreciated if you could bring them to the attention of NCBI. I think this is the best way to help the community as a whole.