I am reaching out with a question about the file
gtdb_genomes_reps_r95.tar.gz from GTDB release 95.
I would like to look up the taxonomy of some GTDB genomes from
gtdb_genomes_reps_r95.tar.gz in the associated bacterial and archaeal taxonomy files (e.g. bac120_taxonomy_r95.tsv). I noticed that every name in the taxonomy files carries a prefix (e.g. GB_* or RS_*) in front of the corresponding genome name in gtdb_genomes_reps_r95.tar.gz.
If I remove these prefixes, will I be able to match all of the genome names from gtdb_genomes_reps_r95.tar.gz to bac120_taxonomy_r95.tsv? I assume they do match, but I wanted to double-check. Also, what do the RS_* and GB_* prefixes indicate?
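To make the question concrete, here is a minimal Python sketch of the matching I have in mind. It assumes the taxonomy file is tab-separated with the prefixed name in the first column and the taxonomy string in the second, and that I already have the bare genome names from the archive as a list. The function names, accessions, and taxonomy strings below are made up for illustration and are not taken from the real files.

```python
import csv
import io

# Prefixes observed on names in the taxonomy files.
PREFIXES = ("GB_", "RS_")


def strip_prefix(taxonomy_id):
    """Drop a leading GB_ or RS_ from a taxonomy-file name, if present."""
    for prefix in PREFIXES:
        if taxonomy_id.startswith(prefix):
            return taxonomy_id[len(prefix):]
    return taxonomy_id


def load_taxonomy(handle):
    """Map prefix-stripped name -> taxonomy string.

    Assumes two tab-separated columns: prefixed name, taxonomy string.
    """
    taxonomy = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) >= 2:
            taxonomy[strip_prefix(row[0])] = row[1]
    return taxonomy


def split_matches(genome_names, taxonomy):
    """Return (matched name -> taxonomy, list of names with no match)."""
    matched = {n: taxonomy[n] for n in genome_names if n in taxonomy}
    unmatched = [n for n in genome_names if n not in taxonomy]
    return matched, unmatched


# Made-up stand-in for a few lines of bac120_taxonomy_r95.tsv.
example_tsv = io.StringIO(
    "GB_GCA_000000001.1\td__Bacteria;p__ExampleA\n"
    "RS_GCF_000000002.1\td__Bacteria;p__ExampleB\n"
)
# Made-up stand-in for genome names taken from the archive.
example_genomes = ["GCA_000000001.1", "GCF_000000002.1", "GCA_000000003.1"]

taxonomy = load_taxonomy(example_tsv)
matched, unmatched = split_matches(example_genomes, taxonomy)
print(len(matched), "matched;", "unmatched:", unmatched)
```

With the real files, an empty unmatched list would confirm that stripping the prefixes is enough to match every genome.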
Thank you very much for your time and this invaluable resource.