Standard operating procedure for calculating genome-to-genome distances based on high-scoring segment pairs

Alexander F. Auch, Hans-Peter Klenk, Markus Göker

Abstract


DNA-DNA hybridization (DDH) is a widely applied wet-lab technique for estimating the overall similarity between the genomes of two organisms. Microbiologists chose to base the species concept for prokaryotes ultimately on DDH as a pragmatic approach for deciding on the recognition of novel species; this choice also allowed a relatively high degree of standardization compared to other areas of taxonomy. However, DDH is tedious and error-prone and, first and foremost, cannot be used to incrementally establish a comparative database. Recent studies have shown that in-silico methods for the comparison of genome sequences can be used to replace DDH. Considering the ongoing rapid technological progress of sequencing methods, genome-based prokaryote taxonomy is coming into reach. However, calculating distances between genomes depends on multiple choices of software and program settings. Here we provide an overview of the modifications that can be applied to distance methods based on high-scoring segment pairs (HSPs) or maximally unique matches (MUMs) and that need to be documented. General recommendations on determining HSPs using BLAST or other algorithms are also provided. As a reference implementation, we introduce the GGDC web server (http://ggdc.gbdp.org/species).

 

DOI: 10.4056/sigs.541628


Keywords


BLAST, GBDP, genomics, MUMmer, phylogeny, species delineation, microbial taxonomy.


Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.





