GTDB Forum

FastANI database error in GTDB-Tk version 1.5.0

Hi there,

I tried to run gtdbtk classify_wf on about 300 genomes but got the error below:
[2021-05-28 08:04:40] INFO: gtdbtk classify_wf --genome_dir /disk/rdisk09/jin/dereplicated_genomes/ --extension fasta --out_dir gtdbtk_AD --cpus 32
[2021-05-28 08:04:40] INFO: Using GTDB-Tk reference data version r202: /disk/rdisk09/jinjinyu2/Conda/envs/gtdbtk/share/gtdbtk-1.5.0/db/
[2021-05-28 08:04:44] INFO: Identifying markers in 297 genomes with 32 threads.
[2021-05-28 08:04:47] TASK: Running Prodigal V2.6.3 to identify genes.
[2021-05-28 08:21:17] INFO: Completed 297 genomes in 16.39 minutes (18.12 genomes/minute).
[2021-05-28 08:22:10] TASK: Identifying TIGRFAM protein families.
[2021-05-28 08:32:00] INFO: Completed 297 genomes in 9.73 minutes (30.54 genomes/minute).
[2021-05-28 08:32:00] TASK: Identifying Pfam protein families.
[2021-05-28 08:33:22] INFO: Completed 297 genomes in 1.36 minutes (219.02 genomes/minute).
[2021-05-28 08:33:22] INFO: Annotations done using HMMER 3.1b2 (February 2015).
[2021-05-28 08:33:22] TASK: Summarising identified marker genes.
[2021-05-28 08:34:29] INFO: Completed 297 genomes in 1.12 minutes (265.70 genomes/minute).
[2021-05-28 08:34:30] INFO: Done.
[2021-05-28 08:34:53] INFO: Aligning markers in 297 genomes with 32 CPUs.
[2021-05-28 08:34:53] INFO: Processing 282 genomes identified as bacterial.
[2021-05-28 08:38:01] INFO: Read concatenated alignment for 45,555 GTDB genomes.
[2021-05-28 08:38:01] TASK: Generating concatenated alignment for each marker.
[2021-05-28 08:39:44] INFO: Completed 282 genomes in 1.66 minutes (169.75 genomes/minute).
[2021-05-28 08:39:45] TASK: Aligning 120 identified markers using hmmalign 3.1b2 (February 2015).
[2021-05-28 08:41:44] INFO: Completed 120 markers in 1.96 minutes (61.19 markers/minute).
[2021-05-28 08:41:44] TASK: Masking columns of bacterial multiple sequence alignment using canonical mask.
[2021-05-28 08:42:53] INFO: Completed 45,837 sequences in 1.13 minutes (40,482.54 sequences/minute).
[2021-05-28 08:42:53] INFO: Masked bacterial alignment from 41,084 to 5,037 AAs.
[2021-05-28 08:42:53] INFO: 0 bacterial user genomes have amino acids in <10.0% of columns in filtered MSA.
[2021-05-28 08:42:53] INFO: Creating concatenated alignment for 45,837 bacterial GTDB and user genomes.
[2021-05-28 08:43:20] INFO: Creating concatenated alignment for 282 bacterial user genomes.
[2021-05-28 08:43:20] INFO: Processing 15 genomes identified as archaeal.
[2021-05-28 08:43:28] INFO: Read concatenated alignment for 2,339 GTDB genomes.
[2021-05-28 08:43:28] TASK: Generating concatenated alignment for each marker.
[2021-05-28 08:47:07] INFO: Completed 15 genomes in 3.63 minutes (4.13 genomes/minute).
[2021-05-28 08:47:07] TASK: Aligning 122 identified markers using hmmalign 3.1b2 (February 2015).
[2021-05-28 08:48:57] INFO: Completed 122 markers in 1.81 minutes (67.46 markers/minute).
[2021-05-28 08:48:57] TASK: Masking columns of archaeal multiple sequence alignment using canonical mask.
[2021-05-28 08:49:00] INFO: Completed 2,354 sequences in 3.02 seconds (780.22 sequences/second).
[2021-05-28 08:49:00] INFO: Masked archaeal alignment from 32,754 to 5,124 AAs.
[2021-05-28 08:49:00] INFO: 0 archaeal user genomes have amino acids in <10.0% of columns in filtered MSA.
[2021-05-28 08:49:00] INFO: Creating concatenated alignment for 2,354 archaeal GTDB and user genomes.
[2021-05-28 08:49:02] INFO: Creating concatenated alignment for 15 archaeal user genomes.
[2021-05-28 08:49:02] INFO: Done.
[2021-05-28 08:49:04] TASK: Placing 15 archaeal genomes into reference tree with pplacer using 32 CPUs (be patient).
[2021-05-28 08:49:04] INFO: pplacer version: v1.1.alpha19-0-g807f6f3
[2021-05-28 08:50:30] INFO: Calculating RED values based on reference tree.
[2021-05-28 08:50:31] TASK: Traversing tree to determine classification method.
[2021-05-28 08:50:31] INFO: Completed 15 genomes in 0.00 seconds (4,966.02 genomes/second).
[2021-05-28 08:50:36] TASK: Calculating average nucleotide identity using FastANI (v1.32).
[2021-05-28 08:51:00] INFO: Completed 244 comparisons in 23.44 seconds (10.41 comparisons/second).
[2021-05-28 08:51:00] INFO: 7 genome(s) have been classified using FastANI and pplacer.
[2021-05-28 08:51:01] TASK: Placing 282 bacterial genomes into reference tree with pplacer using 32 CPUs (be patient).
[2021-05-28 08:51:01] INFO: pplacer version: v1.1.alpha19-0-g807f6f3
[2021-05-28 12:44:22] INFO: Calculating RED values based on reference tree.
[2021-05-28 12:44:33] TASK: Traversing tree to determine classification method.
[2021-05-28 12:44:34] INFO: Completed 282 genomes in 0.28 seconds (1,015.01 genomes/second).
[2021-05-28 12:46:30] ERROR: Reference genome missing from FastANI database: /disk/rdisk09/jin/Conda/envs/gtdbtk/share/gtdbtk-1.5.0/db/fastani/database/GCA/902/825/775/GCA_902825775.1_genomic.fna.gz
[2021-05-28 12:46:33] ERROR: Controlled exit resulting from an unrecoverable error or warning.

I also found that the folder /disk/rdisk09/jin/Conda/envs/gtdbtk/share/gtdbtk-1.5.0/db/fastani/database/GCA/902/825/ does not actually exist. Can anyone help me figure this out?
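To check whether other reference genomes are also missing, a minimal Python sketch could rebuild the expected file path for an accession and test whether it exists. It assumes the FastANI reference files follow the layout seen in the error message (the accession number split into three-digit folders under db/fastani/database/GCA/); fastani_ref_path and missing_refs are hypothetical helpers, not part of GTDB-Tk.

```python
from pathlib import Path


def fastani_ref_path(db_root: str, accession: str) -> Path:
    """Expected location of a reference genome in the FastANI database.

    Assumption: layout inferred from the error message, e.g.
    <db_root>/fastani/database/GCA/902/825/775/GCA_902825775.1_genomic.fna.gz
    """
    prefix, rest = accession.split("_", 1)   # "GCA", "902825775.1"
    number = rest.split(".", 1)[0]           # "902825775"
    # Split the 9-digit accession number into three 3-digit folder levels.
    folders = [number[i:i + 3] for i in range(0, 9, 3)]
    return Path(db_root, "fastani", "database", prefix, *folders,
                f"{accession}_genomic.fna.gz")


def missing_refs(db_root: str, accessions: list[str]) -> list[str]:
    """Return the accessions whose expected reference file is not on disk."""
    return [acc for acc in accessions
            if not fastani_ref_path(db_root, acc).is_file()]


if __name__ == "__main__":
    db = "/disk/rdisk09/jin/Conda/envs/gtdbtk/share/gtdbtk-1.5.0/db"
    print(missing_refs(db, ["GCA_902825775.1"]))
```

If this reports missing files, the local reference data is likely incomplete, which matches the resolution below of re-downloading the reference database.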

BTW, I can successfully run "gtdbtk classify_wf" on three genomes with GTDB-Tk v1.5.0 without any errors.

Thanks,
Jin

Resolved by updating the reference database.