Assess if a genome is a MAG with *_metadata_r95.tsv files

Hi,
I would like to assess whether a genome is a MAG or whether it is derived from a cultured isolate.
It seems that this information is not explicitly given in bac120_metadata_r95.tsv and ar122_metadata_r95.tar.gz. However, it appears on the GTDB website.
This information can be deduced from the field “ncbi_isolation_source” but this is not convenient.

Best,
Florian

Hi,

The “ncbi_genome_category” field indicates if a genome is assembled from a single cell, metagenome, or environmental sample. This is taken directly from NCBI. Lack of information here implies the genome is an isolate. As you might imagine, this information isn’t always known so there may be some SAGs or MAGs that are missing the metadata field, but I have found it to be generally reliable.
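
As a rough illustration of the rule above, here is a minimal Python sketch that maps “ncbi_genome_category” values to a genome type. It makes several assumptions: the category strings other than “derived from environmental_sample” and “none” are not quoted in this thread, the genome identifier column is assumed to be named “accession”, and the helper names are made up for this example. Treat it as a starting point rather than as how GTDB assigns its labels.

```python
import csv

# Assumed category strings. Only "derived from environmental_sample" and
# "none" are quoted in this thread; the others are guesses to verify
# against your copy of the metadata file.
MAG_CATEGORIES = {"derived from metagenome", "derived from environmental_sample"}
SAG_CATEGORIES = {"derived from single cell"}
MISSING_VALUES = {"", "none"}


def classify_genome(category):
    """Map an ncbi_genome_category value to MAG, SAG, or presumed isolate."""
    value = (category or "").strip().lower()
    if value in MAG_CATEGORIES:
        return "MAG"
    if value in SAG_CATEGORIES:
        return "SAG"
    if value in MISSING_VALUES:
        # Lack of information implies an isolate, but this is not guaranteed.
        return "isolate (presumed)"
    return "unknown"


def classify_metadata(lines):
    """Classify every row of a tab-separated metadata table.

    `lines` is any iterable of text lines, e.g. an open file handle.
    The "accession" column name is an assumption.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    return {
        row["accession"]: classify_genome(row.get("ncbi_genome_category"))
        for row in reader
    }
```

It could be used as `with open("bac120_metadata_r95.tsv", newline="") as handle: labels = classify_metadata(handle)`. Unrecognised values are reported as “unknown” rather than silently treated as isolates.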

Cheers,
Donovan

Thanks. Have you performed any manual corrections to the “ncbi_genome_category” field?

I have noticed that the field “ncbi_isolation_source” sometimes provides additional information.
For instance, it helps to distinguish genomes derived from enrichment cultures from those derived from pure cultures.

RS_GCF_000404225.1
ncbi_genome_category = none
ncbi_isolation_source = “enrichment culture from feces”

But sometimes it is not mentioned in any field, and one has to refer to the original paper.
RS_GCF_000404225.1
ncbi_genome_category = none
ncbi_isolation_source = none
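
The check I do by hand could be sketched as follows. This is only a keyword heuristic under my own assumptions: the function name and the returned labels are made up for illustration, and matching on “enrich” will miss any wording that does not contain it.

```python
def culture_hint(category, isolation_source):
    """Give a rough hint about culture status from two metadata fields.

    Heuristic sketch only: relies on the free-text ncbi_isolation_source
    field, which is not standardised.
    """
    cat = (category or "").strip().lower()
    src = (isolation_source or "").strip().lower()

    # A filled-in genome category already says the genome is not an isolate.
    if cat not in ("", "none"):
        return "not an isolate per ncbi_genome_category"
    # Free-text match on "enrich" (assumed to cover "enrichment culture").
    if "enrich" in src:
        return "possible enrichment culture"
    # Neither field is informative: the original paper is the only source.
    if src in ("", "none"):
        return "no information; check the original paper"
    return "presumed isolate"
```

For the two cases above, `culture_hint("none", "enrichment culture from feces")` gives “possible enrichment culture”, and `culture_hint("none", "none")` gives “no information; check the original paper”.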

Florian

Hi,

We have not performed any corrections. If you can find clear errors, it would be appreciated if you could bring them to the attention of NCBI. I think this is the best way to help the community as a whole.

Cheers,
Donovan

Hi,

How are the genomes “derived from environmental_sample” indicated in the GTDB badge system?

Are they indicated as MAG?

Best,

Ilnam

Yes - they are indicated as MAGs.