Downloading the genomes of all Staphylococcus saprophyticus available in GTDB


I would like to download all Staphylococcus saprophyticus genomes locally. Is there a way to do this in bulk, rather than clicking through and downloading them one by one, or downloading all genomes from the [gtdb_proteins_nt_reps_r214.tar.gz] file? (I have a list of all of the GenBank accessions.)

Thanks in advance

We do not store the NCBI genomes locally, so to download all the genomes of interest you will need to get the data from NCBI.
On the advanced search page, you can search for all Staphylococcus saprophyticus in GTDB and download them using a custom-made shell script.

To do so, just click on the genomes icon on the results page of the search.
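
Since the question mentions an existing list of GenBank accessions, here is a rough sketch of one other way to fetch them from NCBI directly. It assumes the NCBI `datasets` command-line tool is installed separately, and the file names (`accessions.txt`, `saprophyticus.zip`) and function names are only placeholders for illustration. The first helper strips the `GB_`/`RS_` prefix that GTDB puts in front of accessions and drops blank lines and duplicates; the second hands the cleaned list to `datasets`.

```shell
#!/usr/bin/env bash

# Normalise a list of accessions (one per line):
#  - strip surrounding whitespace and the GTDB "GB_" / "RS_" prefix
#  - keep only lines that look like GCA_/GCF_ assembly accessions
#  - sort and remove duplicates
clean_accessions() {
    sed -E 's/^[[:space:]]*(GB_|RS_)?//; s/[[:space:]]+$//' "$1" \
        | grep -E '^GC[AF]_[0-9]+\.[0-9]+$' \
        | sort -u
}

# Download genome FASTA files for the cleaned list.
# Assumption: the NCBI "datasets" CLI is on the PATH.
# $1 = accession list (placeholder name), $2 = output zip (placeholder name)
download_genomes() {
    local acc_file="$1" out_zip="$2"
    if ! command -v datasets >/dev/null 2>&1; then
        echo "NCBI datasets CLI not found on PATH" >&2
        return 1
    fi
    clean_accessions "$acc_file" > "${acc_file}.clean"
    datasets download genome accession \
        --inputfile "${acc_file}.clean" \
        --include genome \
        --filename "$out_zip"
}

# Example call (not run automatically):
#   download_genomes accessions.txt saprophyticus.zip
```

The resulting zip archive can then be unpacked with `unzip`. For a large number of genomes it may be worth checking the `datasets` documentation for its options aimed at bigger downloads.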



Hey there, Jake,

Before there was the handy method Pierre noted above, I wrote some scripts to do this sort of thing solely at the command line if it’d help in the future or if you’re interested. They’re packaged with my bit toolkit, so installing that and getting what you wanted above could be done like so:

## conda/mamba install and environment activation
mamba create -y -n bit -c astrobiomike -c conda-forge -c bioconda -c defaults bit

conda activate bit

## searching GTDB for a specific taxon and getting the corresponding NCBI accessions
bit-get-accessions-from-GTDB -t "Staphylococcus saprophyticus"

## and downloading those accessions from NCBI
bit-dl-ncbi-assemblies -w GTDB-Staphylococcus-saprophyticus-species-accs.txt -f fasta -j 4

Cheers :slight_smile:
