I found some genomes in which the plasmids are bigger than the chromosomes:
# how to create name.map and gtdb.files.txt
# https://github.com/shenwei356/kmcp/blob/main/docs/database.md#gtdb
$ grep plasmid name.map
GCF_000163055.2 NC_022111.1 Prevotella sp. oral taxon 299 str. F0039 plasmid unnamed, complete sequence
GCF_000292525.1 NZ_AEYF01000045.1 Rhizobium sp. CCGE 510 plasmid pRspCCGE510d Contig45, whole genome shotgun sequence
GCF_000298315.2 NZ_AEYE02000035.1 Rhizobium grahamii CCGE 502 map unlocalized plasmid pRg502b contig0035, whole genome shotgun sequence
GCF_008274585.1 NZ_VTRT01000001.1 Pedobacter sp. BS3 map unlocalized plasmid unnamed1 BS3-1_scaffold1, whole genome shotgun sequence
GCF_008274625.1 NZ_VTRU01000001.1 Chryseobacterium sp. Gsoil 183 map unlocalized plasmid unnamed1 Gsoil183-2_scaffold1, whole genome shotgun sequence
GCF_016839145.1 NZ_CP069303.1 Shinella sp. PSBB067 plasmid unnamed1, complete sequence
GCF_900637865.1 NZ_LR134418.1 Legionella adelaidensis strain NCTC12735 plasmid 9, complete sequence
GCF_900660545.1 NZ_LR214986.1 Mycoplasma cynos strain NCTC10142 plasmid 13
GCA_001190015.1 LGTG01000643.1 Candidatus Burkholderia crenata strain UZHbot9 plasmid pBCRE02, whole genome shotgun sequence
# r207
$ seqkit stats --infile-list <(grep -f <(cat name.map | grep plasmid | cut -f 1) gtdb.files.txt)
file format type num_seqs sum_len min_len avg_len max_len
gtdb/gtdb_genomes_reps_r207/GCF/000/163/055/GCF_000163055.2.fna.gz FASTA DNA 2 2,480,269 709,850 1,240,134.5 1,770,419
gtdb/gtdb_genomes_reps_r207/GCF/000/292/525/GCF_000292525.1.fna.gz FASTA DNA 142 6,916,614 507 48,708.5 923,843
gtdb/gtdb_genomes_reps_r207/GCF/000/298/315/GCF_000298315.2.fna.gz FASTA DNA 80 7,146,037 257 89,325.5 607,513
gtdb/gtdb_genomes_reps_r207/GCF/008/274/585/GCF_008274585.1.fna.gz FASTA DNA 27 4,811,172 527 178,191.6 1,310,315
gtdb/gtdb_genomes_reps_r207/GCF/008/274/625/GCF_008274625.1.fna.gz FASTA DNA 17 4,983,760 676 293,162.4 2,346,872
gtdb/gtdb_genomes_reps_r207/GCF/016/839/145/GCF_016839145.1.fna.gz FASTA DNA 4 5,774,137 108,567 1,443,534.3 4,605,385
gtdb/gtdb_genomes_reps_r207/GCF/900/637/865/GCF_900637865.1.fna.gz FASTA DNA 29 2,094,483 1,153 72,223.6 451,287
gtdb/gtdb_genomes_reps_r207/GCF/900/660/545/GCF_900660545.1.fna.gz FASTA DNA 18 1,093,147 4,380 60,730.4 986,659
gtdb/gtdb_genomes_reps_r207/GCA/001/190/015/GCA_001190015.1.fna.gz FASTA DNA 643 2,843,741 500 4,422.6 103,016
# r202
$ seqkit stats --infile-list <(grep -f <(cat name.map | grep plasmid | cut -f 1) gtdb.files.txt)
file format type num_seqs sum_len min_len avg_len max_len
gtdb/GCF/000/163/055/GCF_000163055.2.fna.gz FASTA DNA 2 2,480,269 709,850 1,240,134.5 1,770,419
gtdb/GCF/000/292/525/GCF_000292525.1.fna.gz FASTA DNA 142 6,916,614 507 48,708.5 923,843
gtdb/GCF/000/298/315/GCF_000298315.2.fna.gz FASTA DNA 80 7,146,037 257 89,325.5 607,513
gtdb/GCF/001/266/905/GCF_001266905.1.fna.gz FASTA DNA 101 3,174,715 1,109 31,432.8 170,794
gtdb/GCF/008/274/585/GCF_008274585.1.fna.gz FASTA DNA 27 4,811,172 527 178,191.6 1,310,315
gtdb/GCF/008/274/625/GCF_008274625.1.fna.gz FASTA DNA 17 4,983,760 676 293,162.4 2,346,872
gtdb/GCF/010/731/815/GCF_010731815.1.fna.gz FASTA DNA 2 6,451,210 434,050 3,225,605 6,017,160
gtdb/GCF/900/637/865/GCF_900637865.1.fna.gz FASTA DNA 29 2,094,483 1,153 72,223.6 451,287
gtdb/GCF/900/660/545/GCF_900660545.1.fna.gz FASTA DNA 18 1,093,147 4,380 60,730.4 986,659
gtdb/GCA/001/190/015/GCA_001190015.1.fna.gz FASTA DNA 643 2,843,741 500 4,422.6 103,016
Examples with only one chromosome:
$ seqkit fx2tab -l -n gtdb/gtdb_genomes_reps_r207/GCF/000/163/055/GCF_000163055.2.fna.gz
NC_022111.1 Prevotella sp. oral taxon 299 str. F0039 plasmid unnamed, complete sequence 1770419
NC_022124.1 Prevotella sp. oral taxon 299 str. F0039, complete sequence 709850
$ seqkit fx2tab -l -n gtdb/gtdb_genomes_reps_r207/GCF/016/839/145/GCF_016839145.1.fna.gz
NZ_CP069302.1 Shinella sp. PSBB067 chromosome, complete genome 375573
NZ_CP069303.1 Shinella sp. PSBB067 plasmid unnamed1, complete sequence 4605385
NZ_CP069304.1 Shinella sp. PSBB067 plasmid unnamed2, complete sequence 684612
NZ_CP069305.1 Shinella sp. PSBB067 plasmid unnamed3, complete sequence 108567
I guess this is due to incorrect annotation by the submitters.
It would be better to choose another, well-annotated representative genome.
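To screen other genomes for the same pattern, here is a minimal sketch. It assumes tab-separated `seqkit fx2tab -l -n` output with the sequence name in the first field and the length in the last one. The helper name `flag_big_plasmid` is hypothetical, not part of seqkit.

```shell
# Hypothetical helper: reads "name<TAB>...<TAB>length" lines on stdin and
# prints "length<TAB>name" of the longest record, but only if that record's
# name contains "plasmid" (case-insensitive).
flag_big_plasmid() {
    awk -F '\t' '
        $NF + 0 > max { max = $NF + 0; name = $1 }   # track the longest record
        END { if (tolower(name) ~ /plasmid/) print max "\t" name }
    '
}

# Usage (path taken from the example above):
# seqkit fx2tab -l -n gtdb/gtdb_genomes_reps_r207/GCF/016/839/145/GCF_016839145.1.fna.gz | flag_big_plasmid
```

This only looks at the single longest record, so it will also flag fragmented assemblies where the largest contig happens to be a plasmid contig.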
For GCF_010731815, the new annotation version fixed this problem.
# r202
$ seqkit fx2tab -l -n gtdb/GCF/010/731/815/GCF_010731815.1.fna.gz
NZ_AP022592.1 Mycolicibacterium arabiense strain JCM 18538 434050
NZ_AP022593.1 Mycolicibacterium arabiense strain JCM 18538 plasmid pJCM18538, complete sequence 6017160
# r207
$ seqkit fx2tab -l -n gtdb/gtdb_genomes_reps_r207/GCF/010/731/815/GCF_010731815.2.fna.gz
NZ_AP022593.1 Mycolicibacterium arabiense strain JCM 18538 chromosome, complete genome 6017160
NZ_AP022592.1 Mycolicibacterium arabiense strain JCM 18538 plasmid pJCM18538, complete sequence 434050
PS: I filter out plasmids according to the sequence name.
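For illustration, name-based filtering can be sketched like this. It is a plain awk stand-in, not necessarily the exact command I use, and the helper name `drop_plasmids` is hypothetical. It assumes uncompressed FASTA on stdin.

```shell
# Hypothetical helper: drop FASTA records whose header line contains
# "plasmid" (case-insensitive); all other records pass through unchanged.
drop_plasmids() {
    awk '
        /^>/ { keep = (tolower($0) !~ /plasmid/) }   # decide once per record
        keep                                         # print header and sequence lines of kept records
    '
}

# Usage (file names are placeholders):
# gzip -dc genome.fna.gz | drop_plasmids > genome.noplasmid.fna
#
# Roughly equivalent with seqkit (name match, regex, ignore case, inverted):
# seqkit grep -n -r -i -v -p plasmid genome.fna.gz
```

With swapped labels like the ones shown above, a filter of this kind would probably remove the real chromosome and keep the real plasmid, which is why the annotation matters here.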