No FASTA file paths in metadata for release202 bac120 and ar122

Hi there,

I’m trying to use the Struo2 pipeline to generate a Kraken2 database that includes, in part, genomes from the GTDB. The pipeline requires a TSV file of genome metadata as input. The bac120 and ar122 metadata files listed under “GTDB Data - /releases/release202/202.0/” have all the fields needed for that TSV except one: the path to each genome’s FASTA file. Does anyone know where or how I can obtain the FASTA file path for each entry in these metadata files?

Kind regards,
Daniel Golden
Boston University
Department of Bioinformatics

Hi Daniel,

I believe you are after assembly_summary_genbank.txt, and potentially assembly_summary_genbank_historical.txt, which list an FTP path (the ftp_path column) for each GenBank assembly accession. Both are at:
https://ftp.ncbi.nlm.nih.gov/genomes/genbank/

You can find analogous files for RefSeq genome assemblies at:
https://ftp.ncbi.nlm.nih.gov/genomes/refseq/
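
In case it helps, here is a rough Python sketch of how the lookup could work. It rests on a few assumptions: that the accession column in the GTDB metadata carries a GB_ or RS_ prefix in front of the NCBI accession, that the assembly summary has a "#"-prefixed header line naming the assembly_accession and ftp_path columns, and that the genome FASTA sits in the assembly directory as <directory name>_genomic.fna.gz. The function names and the trimmed three-column SAMPLE are made up for illustration (the real file has many more columns), so please check the output against a few genomes before relying on it.

```python
import io


def ncbi_accession(gtdb_accession):
    """Drop the GB_/RS_ prefix assumed to precede the NCBI accession."""
    for prefix in ("GB_", "RS_"):
        if gtdb_accession.startswith(prefix):
            return gtdb_accession[len(prefix):]
    return gtdb_accession


def load_ftp_paths(handle):
    """Map assembly_accession -> ftp_path from an open assembly summary file."""
    header = None
    paths = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            # The column names are assumed to live on a commented header line.
            fields = [f.strip() for f in line.lstrip("#").split("\t")]
            if "assembly_accession" in fields and "ftp_path" in fields:
                header = fields
            continue
        if header is None:
            raise ValueError("no header line with assembly_accession/ftp_path")
        row = dict(zip(header, line.split("\t")))
        paths[row["assembly_accession"]] = row["ftp_path"]
    return paths


def fasta_url(ftp_path):
    """Build the genomic FASTA URL from an assembly directory, or None if absent."""
    if ftp_path in ("", "na"):
        return None
    base = ftp_path.rstrip("/")
    name = base.rsplit("/", 1)[-1]
    return f"{base}/{name}_genomic.fna.gz"


# Illustrative sample only: trimmed to three columns, not the real file layout.
SAMPLE = (
    "#   See README_assembly_summary.txt for a description of the columns\n"
    "# assembly_accession\tasm_name\tftp_path\n"
    "GCA_000005845.2\tASM584v2\t"
    "https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/005/845/GCA_000005845.2_ASM584v2\n"
)

if __name__ == "__main__":
    paths = load_ftp_paths(io.StringIO(SAMPLE))
    acc = ncbi_accession("GB_GCA_000005845.2")
    print(acc, fasta_url(paths[acc]))
```

With the real files you would pass open("assembly_summary_genbank.txt") (and the RefSeq equivalent) to load_ftp_paths instead of the sample. Note that this only gives you URLs; you would still need to download the genomes and record wherever they end up locally in your TSV.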

Cheers,
Donovan