Error in GTDB-Tk v2.4.0 (gtdbtk classify_wf): "ERROR: Error generating Mash sketch"

Hi, please help me resolve the following error:

[2024-06-25 15:03:32] INFO: GTDB-Tk v2.4.0
[2024-06-25 15:03:32] INFO: gtdbtk classify_wf --genome_dir /Volumes/Ext.HD-NRLC_Old/genomes --out_dir /Volumes/Ext.HD-NRLC_Old/gtdbtk_output_new --mash_db /Volumes/Ext.HD-NRLC_Old/gtdbtk_mash_sketch.msh --extension fasta --cpus 4 --tmpdir /Volumes/Ext.HD-NRLC_Old/tmp
[2024-06-25 15:03:32] INFO: Using GTDB-Tk reference data version r220: /Volumes/Ext.HD-NRLC_Old/gtdbtk_data/release220
[2024-06-25 15:03:32] INFO: Loading reference genomes.
[2024-06-25 15:03:32] INFO: Using Mash version 2.3
[2024-06-25 15:03:32] INFO: Loading data from existing Mash sketch file: /Volumes/Ext.HD-NRLC_Old/gtdbtk_output_new/classify/ani_screen/intermediate_results/mash/gtdbtk.user_query_sketch.msh
[2024-06-25 15:03:32] INFO: Creating Mash sketch file: /Volumes/Ext.HD-NRLC_Old/gtdbtk_mash_sketch.msh
[2024-06-25 15:53:29] INFO: Completed 113,104 genomes in 49.94 minutes (2,264.78 genomes/minute).
[2024-06-25 15:53:29] ERROR: Error generating Mash sketch:
Sketching /Volumes/Ext.HD-NRLC_Old/gtdbtk_data/release220/skani/database/GCA/000/008/085/GCA_000008085.1_genomic.fna.gz…
Sketching /Volumes/Ext.HD-NRLC_Old/gtdbtk_data/release220/skani/database/GCA/000/008/885/GCA_000008885.1_genomic.fna.gz…
Sketching /Volumes/Ext.HD-NRLC_Old/gtdbtk_data/release220/skani/database/GCA/000/009/845/GCA_000009845.1_genomic.fna.gz…
Sketching /Volumes/Ext.HD-NRLC_Old/gtdbtk_data/release220/skani/database/GCA/000/010/565/GCA_000010565.1_genomic.fna.gz…
Sketching /Volumes/Ext.HD-NRLC_Old/gtdbtk_data/release220/skani/database/GCA/000/011/445/GCA_000011445.1_genomic.fna.gz…



(intermediate output omitted)



Sketching /Volumes/Ext.HD-NRLC_Old/gtdbtk_data/release220/skani/database/GCF/963/378/075/GCF_963378075.1_genomic.fna.gz…
Sketching /Volumes/Ext.HD-NRLC_Old/gtdbtk_data/release220/skani/database/GCF/963/378/095/GCF_963378095.1_genomic.fna.gz…
Sketching /Volumes/Ext.HD-NRLC_Old/gtdbtk_data/release220/skani/database/GCF/963/378/105/GCF_963378105.1_genomic.fna.gz…
Writing to /Volumes/Ext.HD-NRLC_Old/gtdbtk_mash_sketch.msh…
libc++abi: terminating with uncaught exception of type kj::ExceptionImpl: kj/io.c++:405: failed: ::writev(fd, current, iov.end() - current): Invalid argument; fd = 3
stack: 104b2c679 104b2c97a 104b03884 104af375c 104aba5aa 104af4d14 104afa9c4 104ab07a3

[2024-06-25 15:53:29] ERROR: Controlled exit resulting from an unrecoverable error or warning.

An error seems to have occurred while the Mash sketch file (gtdbtk_mash_sketch.msh) was being generated (kj::ExceptionImpl). I am using an external HD (1.77 TB available of 2 TB) so that there is enough space to write to. Would it be better to use my home directory (only 396.4 GB available of 1 TB)? Or are there any other solutions?

Hi. The error looks to be with Mash itself, a third-party program we use internally in GTDB-Tk. I would try writing this to your local disk. Disk space shouldn’t be an issue here, but it might be an I/O issue if Mash is producing results far quicker than they can be written to an external disk.

Thank you very much, [donovan.parks]. I will try again on my local disk.

I tried running GTDB-Tk v2.4.0 on my local disk instead of the external HD.
However, the same error occurred again, although the program seemed to run.

[2024-07-03 11:39:08] INFO: GTDB-Tk v2.4.0
[2024-07-03 11:39:08] INFO: gtdbtk classify_wf --genome_dir /Users/us009/gtdbtk_data/genomes/batch_1 --out_dir /Users/us009/gtdbtk_data/gtdbtk_output_new/batch_1 --mash_db /Users/us009/gtdbtk_data/gtdbtk_mash_sketch.msh --extension fasta --cpus 8 --tmpdir /Users/us009/gtdbtk_data/tmp
[2024-07-03 11:39:08] INFO: Using GTDB-Tk reference data version r220: /Users/us009/gtdbtk_data/release220
[2024-07-03 11:39:08] INFO: Loading reference genomes.
[2024-07-03 11:39:08] INFO: Using Mash version 2.3
[2024-07-03 11:39:08] INFO: Creating Mash sketch file: /Users/us009/gtdbtk_data/gtdbtk_output_new/batch_1/classify/ani_screen/intermediate_results/mash/gtdbtk.user_query_sketch.msh
[2024-07-03 11:39:08] INFO: Completed 1 genome in 0.06 seconds (16.72 genomes/second).
[2024-07-03 11:39:08] INFO: Creating Mash sketch file: /Users/us009/gtdbtk_data/gtdbtk_mash_sketch.msh
[2024-07-03 12:01:23] INFO: Completed 113,104 genomes in 22.26 minutes (5,082.09 genomes/minute).
[2024-07-03 12:01:23] ERROR: Error generating Mash sketch:
Sketching /Users/us009/gtdbtk_data/release220/skani/database/GCA/000/008/085/GCA_000008085.1_genomic.fna.gz…
Sketching /Users/us009/gtdbtk_data/release220/skani/database/GCA/000/008/885/GCA_000008885.1_genomic.fna.gz…
Sketching /Users/us009/gtdbtk_data/release220/skani/database/GCA/000/009/845/GCA_000009845.1_genomic.fna.gz…




Sketching /Users/us009/gtdbtk_data/release220/skani/database/GCF/963/378/095/GCF_963378095.1_genomic.fna.gz…
Sketching /Users/us009/gtdbtk_data/release220/skani/database/GCF/963/378/105/GCF_963378105.1_genomic.fna.gz…
Writing to /Users/us009/gtdbtk_data/gtdbtk_mash_sketch.msh…
libc++abi: terminating due to uncaught exception of type kj::ExceptionImpl: kj/io.c++:405: failed: ::writev(fd, current, iov.end() - current): Invalid argument; fd = 3
stack: 1007a1679 1007a197a 100778884 10076875c 10072f5aa 100769d14 10076f9c4 1007257a3
[2024-07-03 12:01:24] ERROR: Controlled exit resulting from an unrecoverable error or warning.

==> Processed 0/1 genomes (0%) | | [?genome/s, ETA ?]

==> Processed 1/1 genomes (100%) |███████████████| [16.67genome/s, ETA 00:00]

==> Processed 0/113104 genomes (0%) | | [?genome/s, ETA ?]
==> Processed 12/113104 genomes (0%) | | [116.40genome/s, ETA 16:11]
==> Processed 24/113104 genomes (0%) | | [87.77genome/s, ETA 21:28]
==> Processed 34/113104 genomes (0%) | | [91.23genome/s, ETA 20:39]
==> Processed 44/113104 genomes (0%) | | [84.14genome/s, ETA 22:23]




==> Processed 113094/113104 genomes (100%) |██████████████▉| [72.50genome/s, ETA 00:00]
==> Processed 113103/113104 genomes (100%) |██████████████▉| [72.97genome/s, ETA 00:00]

==> Processed 113104/113104 genomes (100%) |███████████████| [72.97genome/s, ETA 00:00]

I am now trying to figure out what is going on, but I have no clues as to how to solve this problem.

Hi. The issue isn’t obvious to me. I’d confirm that you aren’t running out of disk space, including checking that your tmp directory is not reaching 100% capacity.
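
In case it helps with that check, here is a minimal Python sketch that reports free space for the locations used in the command above (the --out_dir, --mash_db and --tmpdir paths). It is not part of GTDB-Tk. The function name `free_space_report` and the 50 GB threshold are my own assumptions for illustration, so adjust them to your setup. If a path does not exist yet, the nearest existing parent directory is checked instead.

```python
import shutil
from pathlib import Path


def free_space_report(paths, min_free_gb=50.0):
    """Return a list of (path, free_gb, ok) tuples.

    min_free_gb is an arbitrary, hypothetical threshold, not a GTDB-Tk
    requirement. Free space is measured on the filesystem that holds the
    nearest existing parent of each path.
    """
    report = []
    for raw in paths:
        target = Path(raw).expanduser()
        probe = target
        # Walk upwards until an existing directory or file is found.
        while not probe.exists() and probe != probe.parent:
            probe = probe.parent
        usage = shutil.disk_usage(probe)
        free_gb = usage.free / 1e9
        report.append((str(target), round(free_gb, 1), free_gb >= min_free_gb))
    return report


if __name__ == "__main__":
    # Paths copied from the classify_wf command in this thread.
    paths = [
        "/Users/us009/gtdbtk_data/gtdbtk_output_new/batch_1",  # --out_dir
        "/Users/us009/gtdbtk_data/gtdbtk_mash_sketch.msh",     # --mash_db
        "/Users/us009/gtdbtk_data/tmp",                        # --tmpdir
    ]
    for path, free_gb, ok in free_space_report(paths):
        status = "OK" if ok else "LOW"
        print(f"{status:>3}  {free_gb:>10.1f} GB free  {path}")
```

Running it once before the job and once again while Mash is at the "Writing to ..." step should show whether any of these locations is filling up. If free space stays high throughout, disk capacity is probably not the cause.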