QualiBact Results¶
What is QualiBact?¶
QualiBact is a set of thresholds for assessing the quality of bacterial genome assemblies. We have evaluated genomes against various metrics to help researchers identify high-quality genomes for downstream analysis. The thresholds described here are implemented in SpecCheck. Source code for this process is available in the QualiBact repository (https://github.com/happykhan/qualibact).
Quick Links¶
- 📋 Methods - Detailed methodology and criteria
- 🦠 All Species - Complete list of analyzed species
- 📊 Summary Data - Main summary and criteria tables
Navigation¶
Use the navigation menu above to explore:
- Methods - Technical details about the analysis pipeline
- All species - List of all species included here, with links to species-specific overviews
- Summary page - The QC criteria and summary tables for all genera and species
Considerations for QualiBact¶
✅ General Strengths¶
- The pipeline is fully automated and generic, and can be applied to any set of genomes, including arbitrary subsets such as species, clonal complexes, or lineages.
- Quality assessment is based on multiple standard metrics (e.g. N50, number of contigs, genome size, GC%), allowing reproducible filtering.
- Species-specific thresholds can be derived from available reference genomes, and thresholds can be updated as more genomes are added.
- Variation between species, even within a genus, supports the need for species-level cutoffs, which this approach accommodates.
- Variation between SRA and RefSeq: We have observed that genome size and assembly length distributions differ significantly between RefSeq and SRA (i.e. ATB). The cause is unclear, but relying on RefSeq-derived thresholds alone may unfairly exclude valid genomes. This approach combines both datasets to produce a more inclusive and representative set of thresholds.
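As an illustration of the metric-based filtering described above, the following is a minimal sketch that computes genome size, number of contigs, N50, and GC% from a list of contig sequences and compares them against a set of ranges. The function names `assembly_metrics` and `check_thresholds` and the threshold values are hypothetical, chosen only for this example; they are not the SpecCheck API and not QualiBact's published cutoffs.

```python
from typing import Dict, List, Tuple


def assembly_metrics(contigs: List[str]) -> Dict[str, float]:
    """Compute basic assembly metrics from contig sequences."""
    lengths = sorted((len(c) for c in contigs), reverse=True)
    total = sum(lengths)

    # N50: length of the contig at which the cumulative length,
    # counted from the longest contig down, reaches half the total.
    running = 0
    n50 = 0
    for length in lengths:
        running += length
        if running * 2 >= total:
            n50 = length
            break

    gc = sum(c.upper().count("G") + c.upper().count("C") for c in contigs)
    return {
        "genome_size": total,
        "n_contigs": len(lengths),
        "n50": n50,
        "gc_percent": 100.0 * gc / total if total else 0.0,
    }


def check_thresholds(
    metrics: Dict[str, float],
    thresholds: Dict[str, Tuple[float, float]],
) -> Dict[str, bool]:
    """Return, per metric, whether the value lies within [low, high]."""
    return {
        name: low <= metrics[name] <= high
        for name, (low, high) in thresholds.items()
    }


# Toy contigs and hypothetical ranges, for illustration only.
toy_contigs = ["ACGT" * 25, "GGCC" * 10, "AT" * 5]
toy_thresholds = {
    "genome_size": (100, 200),
    "n_contigs": (1, 2),
    "n50": (50, float("inf")),
    "gc_percent": (55.0, 65.0),
}
toy_result = check_thresholds(assembly_metrics(toy_contigs), toy_thresholds)
print(toy_result)
```

In practice the ranges would come from the species-specific criteria tables rather than being hard-coded, and an assembly would be flagged if any single metric falls outside its range.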
⚠️ Caveats¶
- Species Definitions Depend on GTDB: This tool uses Sylph for species designation, so all GTDB-related quirks apply. E.g., Shigella spp. are included in E. coli, and there are issues separating Burkholderia mallei from Burkholderia pseudomallei and Bordetella pertussis/Bordetella parapertussis from Bordetella bronchiseptica.
- No Ground Truth Claims: This evaluation reflects what has been previously observed in available datasets. It does not attempt to define a universal "ground truth" for any species.
- Assembly-Method Specific: The metrics (e.g. N50, number of contigs) are meaningful primarily for assemblies generated with Shovill (or similar SPAdes-based pipelines). Exact thresholds will vary for long-read assemblies and for alternative assemblers such as SKESA. However, not using Shovill implies rejection of the Torstyverse, which is heresy.
- Long-Read Assemblies Not Explicitly Handled: These cutoffs are not designed for long-read assemblies. That said, genome size and GC content thresholds should still apply, and it's reasonable to expect long-read assemblies to meet or exceed the quality thresholds derived from short-read data rather than fall below them.
- Generic vs. Specific Tradeoff: While the generic approach is broadly applicable, it may miss species-specific quality nuances or lineage-level exceptions.
Citation¶
If you use QualiBact, please cite the following:
Alikhan, NF. Species specific quality control of bacterial de novo genome assemblies using QualiBact. Available at: https://github.com/happykhan/qualibact (Accessed: [insert date]).