ValueError: 'gtdb_taxonomy' is not in list when metadata file lacks header

Hi, GTDB team!
I’m encountering an error while running the gtdb_to_ncbi_majority_vote.py script.

Environment

CentOS Linux 7 (Core)
gtdbtk-2.4.1

Description:

When running
gtdb_to_ncbi_majority_vote.py --gtdbtk_output_dir ./GTDBTK_dRep_2.4.1/ --output_file ./GTDBTK_2.4.1_NCBI.txt --bac120_metadata_file ./database/bac120_taxonomy.tsv --ar53_metadata_file ./database/ar53_taxonomy.tsv
I get ValueError: 'gtdb_taxonomy' is not in list, as shown in the screenshot below.

My guess
The script assumes that the metadata files (e.g., bac120_metadata.tsv, ar53_metadata.tsv) contain a header row with specific column names, including 'gtdb_taxonomy'. However, some GTDB-provided files (or user-generated ones) do not include a header, so the script treats the first data row as the header. As a result, header.index('gtdb_taxonomy') raises a ValueError.
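To illustrate the guess above, here is a minimal sketch of the failure mode. It assumes the script reads the first line of the file, splits it on tabs, and looks up the column by name. I have not checked this against the actual source. The function name find_taxonomy_column and the sample rows are hypothetical and only for illustration.

```python
import io


def find_taxonomy_column(handle, column="gtdb_taxonomy"):
    """Return the index of `column` in the tab-separated first line.

    Hypothetical helper, not part of GTDB-Tk. It raises a ValueError with a
    more descriptive message than the bare list.index() error.
    """
    header = handle.readline().rstrip("\n").split("\t")
    if column not in header:
        raise ValueError(
            f"'{column}' not found in the first line; "
            "is this a metadata file with a header row?"
        )
    return header.index(column)


# Made-up headerless file: the first line is already a data row.
taxonomy_like = io.StringIO("GENOME_A\td__Bacteria;p__Example\n")

# Made-up file with a header row naming the columns.
metadata_like = io.StringIO(
    "accession\tgtdb_taxonomy\nGENOME_A\td__Bacteria;p__Example\n"
)

print(find_taxonomy_column(metadata_like))

try:
    find_taxonomy_column(taxonomy_like)
except ValueError as err:
    print(err)
```

With the headerless input, the first data row is taken as the header, so the column lookup fails in the same way as the error I am seeing.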

Hello,

The input files must be the GTDB archaeal and bacterial metadata files (not the taxonomy files). These have the required header. You can find these files here for GTDB r226:

Cheers,
Donovan

@donovan.parks You’re absolutely right — that was my oversight. The truth turned out to be so simple.
Thank you for your help, and many thanks to the GTDB team for developing such a useful tool!
Cheers,
Rianjie