I would be interested in building a query using a list of GenBank Assembly Accession ids
(e.g., GCA_000275825.1, GCA_000250635.1) to get the corresponding entries in GTDB.
I already know that all the ids I am using are included in GTDB.
Is there a way to build such a query? If not, could I use the genome summary API for multiple ids?
Thank you very much for all your help!
Haris
Hi Haris,
What information are you after from GTDB? GTDB is not a genome repository, so if you are trying to get sequencing data it is best to get this directly from NCBI. If you are after GTDB metadata, we provide TSV files with all this information:
https://data.gtdb.ecogenomic.org/releases/latest/
Specifically, the files ar122_metadata.tar.gz and bac120_metadata.tar.gz.
Cheers,
Donovan
Hi Donovan and thank you very much for your immediate response!
I am building a web platform that integrates information from various resources, and I was wondering whether I could somehow build a query like the ones a GTDB user can run with the advanced search.
Thank you very much again!
Haris
Hi Haris,
Unfortunately, the GTDB RESTful API isn’t that sophisticated at the moment so I think you’d have to place the GTDB metadata into a DB on your end to handle such queries. Or, if building RESTful APIs is your jam we’re always happy for people to contribute to the GTDB!
Cheers,
Donovan
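
For anyone landing on this thread later, here is a minimal sketch of the "place the GTDB metadata into a DB on your end" suggestion, using only the Python standard library and SQLite. It rests on a few assumptions that should be checked against the actual files: that the extracted metadata TSV has columns named accession and gtdb_taxonomy, and that GTDB accessions carry a "GB_" or "RS_" prefix in front of the NCBI assembly accession. The table name and the function names (strip_prefix, load_metadata, lookup) are made up for illustration and are not part of any GTDB tooling.

```python
import csv
import sqlite3


def strip_prefix(accession):
    # Assumption: GTDB prefixes accessions with "GB_" (GenBank) or "RS_" (RefSeq),
    # e.g. "GB_GCA_000275825.1". Anything else is returned unchanged.
    if accession.startswith(("GB_", "RS_")):
        return accession[3:]
    return accession


def load_metadata(conn, tsv_path):
    """Load two columns of a GTDB metadata TSV into a local SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS gtdb_metadata ("
        "assembly_accession TEXT PRIMARY KEY, "
        "gtdb_accession TEXT, "
        "gtdb_taxonomy TEXT)"
    )
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        # Assumed column names: "accession" and "gtdb_taxonomy".
        rows = (
            (strip_prefix(row["accession"]), row["accession"], row["gtdb_taxonomy"])
            for row in reader
        )
        conn.executemany(
            "INSERT OR REPLACE INTO gtdb_metadata VALUES (?, ?, ?)", rows
        )
    conn.commit()


def lookup(conn, assembly_accessions):
    """Return {assembly accession: GTDB taxonomy} for the ids found in the table."""
    ids = list(assembly_accessions)
    if not ids:
        return {}
    # One "?" placeholder per id keeps the query parameterised.
    placeholders = ",".join("?" * len(ids))
    cursor = conn.execute(
        "SELECT assembly_accession, gtdb_taxonomy FROM gtdb_metadata "
        f"WHERE assembly_accession IN ({placeholders})",
        ids,
    )
    return dict(cursor.fetchall())
```

Usage would look roughly like load_metadata(conn, "bac120_metadata.tsv") followed by lookup(conn, ["GCA_000275825.1", "GCA_000250635.1"]), where conn comes from sqlite3.connect(...) and the TSV file name is a guess at what the archive extracts to. SQLite caps the number of "?" parameters per statement, so very long id lists may need to be queried in batches.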