I want to match GTDB species to their corresponding NCBI species. Which metadata file should I use?
I found four files that contain GTDB-to-NCBI information: “bac120_metadata_r202.tsv”, “gtdb_vs_ncbi_r202_bacteria.xlsx”, “ncbi_vs_gtdb_r202_bacteria.xlsx”, and “synonyms_bac120_r202.tsv”. Which one should I use?
Hi,
You can map GTDB to NCBI taxa using the gtdb_vs_ncbi_r*.xlsx files. As you will see, the mapping is not always 1 to 1. That is, a GTDB taxon might contain genomes assigned to more than one NCBI taxon name. Note that the most recent version of GTDB is R207.
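To see why the mapping is not always 1 to 1, here is a minimal Python sketch that tallies, for each GTDB species, the NCBI species names carried by its genomes. It assumes a tab-separated table with `accession`, `gtdb_taxonomy` and `ncbi_taxonomy` columns holding semicolon-separated lineages with `s__` species prefixes, which is how I understand the bac120_metadata file to be laid out; please check the header of your copy. The toy rows, accessions and taxon names below are made up for illustration and are not real GTDB data.

```python
import csv
import io
from collections import Counter, defaultdict

NO_NAME = "(no NCBI species name)"


def species_of(lineage):
    """Return the species name from a lineage such as 'd__...;g__...;s__Name'."""
    for rank in lineage.split(";"):
        rank = rank.strip()
        if rank.startswith("s__"):
            return rank[3:]
    return ""


def gtdb_to_ncbi_species(tsv_handle):
    """Count the NCBI species names seen among the genomes of each GTDB species."""
    mapping = defaultdict(Counter)
    for row in csv.DictReader(tsv_handle, delimiter="\t"):
        gtdb = species_of(row["gtdb_taxonomy"])   # assumed column name
        ncbi = species_of(row["ncbi_taxonomy"])   # assumed column name
        mapping[gtdb][ncbi or NO_NAME] += 1
    return mapping


# Toy table (hypothetical accessions and names, not real GTDB records).
toy = (
    "accession\tgtdb_taxonomy\tncbi_taxonomy\n"
    "genome_1\td__Bacteria;g__Genus_A;s__Genus_A alpha\td__Bacteria;g__Genus_A;s__Genus_A alpha\n"
    "genome_2\td__Bacteria;g__Genus_A;s__Genus_A alpha\td__Bacteria;g__Genus_B;s__Genus_B beta\n"
    "genome_3\td__Bacteria;g__Genus_C;s__Genus_C gamma\td__Bacteria;g__Genus_C;s__\n"
)

mapping = gtdb_to_ncbi_species(io.StringIO(toy))

# GTDB species whose genomes carry more than one NCBI species label.
one_to_many = {g: names for g, names in mapping.items() if len(names) > 1}

for gtdb_name, ncbi_names in sorted(mapping.items()):
    print(gtdb_name, dict(ncbi_names))
```

To run this on a real file, replace the `io.StringIO(toy)` argument with a handle opened via `open("bac120_metadata_r202.tsv", newline="")`. Looking up a single species, for example `mapping["Phocaeicola salanitronis"]`, would then list the NCBI species names attached to its genomes, provided the column names match the assumption above.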
Cheers,
Donovan
I have found that Phocaeicola salanitronis and Prevotella corporis have no corresponding NCBI species name in “gtdb_vs_ncbi_r202_bacteria.xlsx”, but they do have matching NCBI species in “bac120_metadata_r202.tsv”. Both species can be cultured and purchased commercially. Because we are preparing a bacterial experiment, it is important that we use the correct strains. Which of the two results should I trust?
Hi. There does seem to be an issue with the gtdb_vs_ncbi_r202_bacteria.xlsx table. This has been corrected for GTDB R207. I can confirm that there are no reassignments of P. salanitronis and P. corporis in GTDB compared to NCBI. You can check this yourself using the GTDB Taxon History tool at:
https://gtdb.ecogenomic.org/taxon-history