Species detection & contig stats
-m assembly
This module attempts to identify the species of each input assembly and generates basic assembly statistics.
Species identification
Species identification is performed using Mash (Ondov et al., 2016) to compare the input assembly against a database of reference genomes. The database includes the following high-quality genomes:
| Species | Genome |
|---|---|
| Staphylococcus argenteus | GCF_000236925.1 |
| Staphylococcus aureus | GCF_000013425.1 |
| Staphylococcus capitis | GCF_040739365.1 |
| Staphylococcus epidermidis | GCF_006094375.1 |
| Staphylococcus haemolyticus | GCF_006094395.1 |
| Staphylococcus lugdunensis | GCF_001558775.1 |
| Staphylococcus schweitzeri | GCF_900636685.1 |
The module currently uses a Mash distance threshold of 0.04.
To report the species, the following criteria are used:
- The species with the lowest distance is reported, if lower than the threshold.
- If the lowest Mash distance is > 0.04, the result is reported as “No match found”.
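The reporting rule above can be sketched in a few lines of Python. This is a minimal illustration, not the module's actual code: the function name `call_species` and the input dictionary (species name mapped to Mash distance) are hypothetical. The text does not say what happens when the distance is exactly 0.04, so the sketch assumes that case counts as no match.

```python
THRESHOLD = 0.04  # Mash distance threshold stated in the text


def call_species(distances, threshold=THRESHOLD):
    """Return the closest reference species, or "No match found".

    `distances` is a hypothetical mapping of species name -> Mash distance.
    """
    if not distances:
        return "No match found"
    # Pick the reference genome with the lowest Mash distance.
    species, best = min(distances.items(), key=lambda item: item[1])
    # Assumption: a distance exactly equal to the threshold is not a match.
    return species if best < threshold else "No match found"


if __name__ == "__main__":
    example = {
        "Staphylococcus aureus": 0.012,
        "Staphylococcus argenteus": 0.085,
    }
    print(call_species(example))
```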
Note
Genomes that are not identified as Staphylococcus aureus are skipped in downstream analysis.
Assembly stats
Assembly quality is assessed using the following parameters:
- Total assembly size (compared to expected size for Staphylococcus aureus of 2.6 - 3.1 Mbp)
- N50 (>=10 kbp)
- Presence of ambiguous bases (Ns)
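As a rough illustration of these three parameters, the sketch below computes them from a list of contig sequences. The function name `assembly_stats` and the returned keys are hypothetical, and the size range is assumed to be inclusive at both ends. How the individual checks combine into the overall QC status is not described here, so the sketch reports each check separately.

```python
MIN_SIZE = 2_600_000  # 2.6 Mbp, lower bound of the expected size
MAX_SIZE = 3_100_000  # 3.1 Mbp, upper bound of the expected size
MIN_N50 = 10_000      # 10 kbp


def assembly_stats(contigs):
    """Compute total size, N50 and N count for a list of contig strings."""
    lengths = sorted((len(seq) for seq in contigs), reverse=True)
    total = sum(lengths)

    # N50: length of the contig at which the cumulative length,
    # summed from the longest contig down, reaches half the total.
    n50 = 0
    running = 0
    for length in lengths:
        running += length
        if running * 2 >= total:
            n50 = length
            break

    # Count ambiguous bases (N), ignoring case.
    n_count = sum(seq.upper().count("N") for seq in contigs)

    return {
        "total_size": total,
        "n50": n50,
        "n_count": n_count,
        # Assumption: the expected size range is inclusive at both ends.
        "size_ok": MIN_SIZE <= total <= MAX_SIZE,
        "n50_ok": n50 >= MIN_N50,
    }


if __name__ == "__main__":
    print(assembly_stats(["ACGTN" * 2, "AC"]))
```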
Outputs
The assembly module generates the following output columns in the report:
| Field | Description |
|---|---|
| Species | Detected species |
| Total_size | Assembly size (bp) |
| QC | Overall QC status (PASS or FAILED) |
All results from failed QC checks should be treated with caution.