Typing
StaphSCAN includes several typing methods, each of which can be run as a stand-alone module.
Multi Locus Sequence Typing
-m mlst
All genomes identified as Staphylococcus aureus are subject to MLST using the seven-locus typing scheme described here. When you use this module, please remember to cite Jolley et al. 2018.
A copy of the MLST alleles and ST definitions is stored in the /data directory of this module.
Warning
Due to new PubMLST policies (read this), it is no longer possible to redistribute data published after 31/12/2024. For this reason, all the MLST data bundled with StaphSCAN are current only up to that date. To update the database, follow the guide in the installation section.
Parameters
Locus detection is filtered by minimum alignment identity and coverage:
--min_id_mlst : Minimum alignment percentage identity (default: 95)
--min_cov_mlst : Minimum alignment percent coverage (default: 95)
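The filter can be pictured as a simple predicate over alignment hits. The sketch below is illustrative only: the `Hit` record, its field names, and the inclusive comparison (`>=`) are assumptions, not StaphSCAN's internal data model. Only the two thresholds and their defaults come from the options above.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical alignment hit; field names are illustrative."""
    locus: str
    allele: str
    identity: float  # percent identity of the alignment
    coverage: float  # percent of the reference allele covered


def filter_hits(hits, min_id=95.0, min_cov=95.0):
    """Keep hits meeting both thresholds (defaults mirror --min_id_mlst / --min_cov_mlst)."""
    return [h for h in hits if h.identity >= min_id and h.coverage >= min_cov]


hits = [
    Hit("arcC", "3", 100.0, 100.0),
    Hit("aroE", "3", 93.2, 100.0),  # fails the identity threshold
    Hit("glpF", "1", 99.1, 80.0),   # fails the coverage threshold
]
print([h.locus for h in filter_hits(hits)])
```

Lowering either threshold admits more divergent or shorter hits, at the cost of more ambiguous locus calls.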
Output
Both the ST and the allelic profile are reported.
For each locus, the following annotations may be reported:
- Exact matches: allele sequences that exactly match a known MLST allele are reported using the corresponding allele number.
- Putative novel alleles: loci with full-length sequences that do not exactly match any known allele are reported as the closest known allele followed by a *.
- Partial loci: loci detected but not covering the full reference length are reported as Partial.
- Missing loci: loci not detected in the assembly are reported as -.
Imprecise or incomplete allelic profiles result in approximate ST assignments. In these cases, StaphSCAN reports the closest matching ST followed by the number of differing loci (n-locus variants, up to two). For example, ST1-1LV means that the closest match is ST1, with one differing allele.
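The closest-ST logic can be sketched as a count of mismatching loci against each known profile. This is a minimal illustration, not the module's actual code. The three-entry `ST_PROFILES` table is a toy stand-in for the bundled definitions, the function name is hypothetical, and treating any non-exact call (`*`, Partial, `-`) as a differing locus and returning `-` when no ST is within two loci are assumptions.

```python
LOCI = ["arcC", "aroE", "glpF", "gmk", "pta", "tpi", "yqiL"]

# Toy stand-in for the bundled ST definitions; do not use as reference data.
ST_PROFILES = {
    "ST1": ["1", "1", "1", "1", "1", "1", "1"],
    "ST5": ["1", "4", "1", "4", "12", "1", "10"],
    "ST8": ["3", "3", "1", "1", "4", "4", "3"],
}


def assign_st(profile, max_variants=2):
    """Return an exact ST, 'STx-nLV' for a near match, or '-' if nothing is close enough."""
    best_st, best_diff = None, len(LOCI) + 1
    for st, ref in ST_PROFILES.items():
        # Any call that is not an exact allele number counts as a difference.
        diff = sum(1 for obs, exp in zip(profile, ref) if obs != exp)
        if diff < best_diff:
            best_st, best_diff = st, diff
    if best_diff == 0:
        return best_st
    if best_diff <= max_variants:
        return f"{best_st}-{best_diff}LV"
    return "-"


print(assign_st(["1", "1", "1", "1", "1", "1", "2*"]))
```

Ties between equally distant STs are resolved here by table order, which a real implementation would likely handle more carefully.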
spa typing
-m spa
spa typing is a method based on characterizing the repeat regions of the Staphylococcus protein A gene (spa). It is widely used for rapid typing of MRSA, particularly in hospital and surveillance settings.
For more information visit here.
A local copy of the Ridom database is distributed with this module and stored in the module's /data directory.
Output
- If the identified repeat pattern matches a known spa type, the corresponding type is reported.
- Patterns not present in the reference database are classified as novel and reported together with their repeat composition.
- Assemblies in which an X-region was amplified but no known repeat units were detected are reported as "Unknown".
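The three outcomes above amount to a lookup with two fallbacks. The sketch below assumes the repeat units have already been extracted as an ordered list of IDs. The function name, the dictionary layout, and the exact string used for novel patterns are hypothetical, and the toy table contains made-up placeholder entries rather than real Ridom data.

```python
def call_spa_type(repeats, known_types):
    """Map an ordered list of repeat IDs to a spa call.

    known_types maps repeat tuples to type names (hypothetical structure).
    """
    if not repeats:
        return "Unknown"  # X-region found, but no known repeat units in it
    pattern = tuple(repeats)
    if pattern in known_types:
        return known_types[pattern]
    # Novel pattern: report it together with its repeat composition.
    return "Novel (" + "-".join(repeats) + ")"


# Placeholder table: repeat IDs and type name are made up for illustration.
toy_db = {("r01", "r02", "r03"): "t-example"}
print(call_spa_type(["r01", "r02", "r03"], toy_db))
print(call_spa_type(["r01", "r03"], toy_db))
```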
Limitation
Spa typing is dependent on genome assembly quality, and fragmentation or sequencing errors within the spa X-region may result in spa-negative or Unknown calls. Novel or divergent repeat patterns not present in the reference database are reported as Novel. As with all in silico typing approaches, results may differ from laboratory-based spa typing in cases of mixed populations or incomplete assemblies.
Accessory Gene Regulator (agr) Typing
-m agr
The agr module identifies the Staphylococcus aureus accessory gene regulator (agr) type, a quorum-sensing system involved in virulence regulation and commonly classified into four major groups (I–IV) (Raghuram V et al. 2022).
It is an adaptation of the tool agrVATE.
A curated set of reference sequences is bundled with the module and stored in the /data directory.
This module performs two steps:
1) Identify agr type:
The assembly is queried against group-specific probes in the targets.fasta file. The agr type is assigned to the group with the highest count of unique matching probes.
2) Evaluate operon functionality:
The full agr operon is extracted from the assembly using the reference sequence (.gbk) specific to the identified group.
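Step 1 can be sketched as counting distinct probes per group and picking the group with the most. This is an illustration under stated assumptions: the input is taken to be a list of `(group_id, probe_id)` pairs for probes that matched the assembly, the function name is hypothetical, and the group-to-type mapping is the one listed in the Output section.

```python
from collections import defaultdict

GROUP_TO_TYPE = {"gp1": "agr I", "gp2": "agr II", "gp3": "agr III", "gp4": "agr IV"}


def assign_agr_type(probe_hits):
    """Return (agr type, number of unique matching probes) for the best group."""
    unique = defaultdict(set)
    for group, probe in probe_hits:
        unique[group].add(probe)  # a set ignores repeated hits of the same probe
    if not unique:
        return None, 0
    best = max(unique, key=lambda g: len(unique[g]))
    return GROUP_TO_TYPE.get(best, best), len(unique[best])


hits = [("gp1", "p1"), ("gp1", "p1"), ("gp2", "p7"), ("gp2", "p8"), ("gp2", "p9")]
print(assign_agr_type(hits))
```

Counting unique probes rather than raw hits prevents one repeated or multi-copy match from inflating a group's score. Ties are not handled in this sketch.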
Output
Agr group identifiers are mapped to standard agr types as follows:
| Internal ID | Reported agr type |
|---|---|
| gp1 | agr I |
| gp2 | agr II |
| gp3 | agr III |
| gp4 | agr IV |
The agr module reports:
| Field | Description |
|---|---|
| agr_type | Assigned agr group (agr I–IV) |
| agr_confidence | Number of probes matching the assigned group |
| agr_frameshifts | Frameshifts or other gene defects detected, if any |
| agr_operon_status | Operon functionality assessment |
The following criteria are used to report the agr_operon_status:
- Intact: agrC and agrA coding sequences are complete and functional.
- Pseudogene: Frameshifts or premature stop codons detected in agrC or agrA.
- Assembly Gap: The gene appears truncated but ends precisely at the edge of a contig. This suggests the gene might be intact but wasn't fully assembled, rather than carrying a biological mutation.
- Missing/Fragmented: The operon could not be extracted or is too fragmented to analyze.
- Ref Missing: Reference data for the identified group is unavailable.
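These criteria can be read as a small decision cascade. The sketch below is one plausible ordering, not the module's confirmed logic: the function name, the boolean inputs, and the precedence of the checks (in particular, testing for a contig-edge truncation before calling a pseudogene) are assumptions.

```python
def operon_status(reference_available, operon_extracted,
                  has_frameshift_or_stop, truncated_at_contig_edge):
    """Return one of the five agr_operon_status labels (check order is assumed)."""
    if not reference_available:
        return "Ref Missing"
    if not operon_extracted:
        return "Missing/Fragmented"
    if truncated_at_contig_edge:
        # Truncation at a contig boundary points to assembly, not biology.
        return "Assembly Gap"
    if has_frameshift_or_stop:
        return "Pseudogene"
    return "Intact"


print(operon_status(True, True, False, False))
print(operon_status(True, True, True, False))
```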
Notes and limitations
- While the module attempts to distinguish assembly breaks (Assembly Gap) from true mutations (Pseudogene), highly fragmented assemblies may still result in ambiguous functionality calls. For this reason, Pseudogene results should be treated with caution and confirmed with a read-based method (e.g. Snippy).
- Typing uses relaxed BLAST parameters (90% identity) to correctly classify divergent lineages, but relies on a strict count of unique probes to ensure specificity.
- The module is tuned to ignore natural allelic variation (SNPs) while catching structural defects (indels/stops) that destroy protein function.
Capsule Typing
-m capsule
The capsule module identifies the Staphylococcus aureus capsular polysaccharide operon and assigns the predominant capsule serotype (Type 5 or Type 8). Capsular polysaccharides are major virulence determinants involved in immune evasion and are encoded by the cap operon (capA–P), with serotype specificity driven by the H–K loci (Cocchiaro et al. 2006).
A curated FASTA file containing representative capsule gene sequences is bundled with the module and stored in the /data directory.
Parameters
Hits are filtered by minimum alignment identity and coverage:
--min_id_capsule : Minimum alignment percentage identity (default: 90)
--min_cov_capsule : Minimum alignment percentage coverage (default: 80)
Output
Capsule serotype is inferred based on the presence of serotype-specific loci:
- Type 5: cap5H, cap5I, cap5J, cap5K
- Type 8: cap8H, cap8I, cap8J, cap8K
Once a serotype is assigned, operon completeness is evaluated by checking for the presence of all expected genes.
The operon is classified as:
- Complete : all genes detected
- Incomplete : at least one gene missing
| Field | Description |
|---|---|
| cap_type | Assigned capsule serotype (Type 5, Type 8, or -) |
| cap_completeness | Capsule operon status (Complete, Incomplete, or -) |
| cap_genes | Semicolon-separated list of detected capsule genes |
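The serotype and completeness rules can be sketched as set operations on the detected gene names. Several details here are assumptions made for illustration: the function names, the convention that all sixteen operon genes are named `cap5A` to `cap5P` (or `cap8A` to `cap8P`) in the reference FASTA, and the choice of the serotype with the most specific loci detected.

```python
import string

SPECIFIC = {
    "Type 5": {"cap5H", "cap5I", "cap5J", "cap5K"},
    "Type 8": {"cap8H", "cap8I", "cap8J", "cap8K"},
}


def expected_genes(cap_type):
    """Assumed naming: cap<N>A .. cap<N>P for the sixteen operon genes."""
    n = cap_type.split()[-1]
    return {f"cap{n}{letter}" for letter in string.ascii_uppercase[:16]}


def type_capsule(detected):
    """Return the three output fields for a collection of detected gene names."""
    detected = set(detected)
    genes = ";".join(sorted(detected))
    counts = {t: len(loci & detected) for t, loci in SPECIFIC.items()}
    best = max(counts, key=counts.get)
    if counts[best] == 0:
        return {"cap_type": "-", "cap_completeness": "-", "cap_genes": genes}
    missing = expected_genes(best) - detected
    status = "Incomplete" if missing else "Complete"
    return {"cap_type": best, "cap_completeness": status, "cap_genes": genes}


print(type_capsule(expected_genes("Type 8") - {"cap8A"}))
```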
SCCmec Typing
-m sccmec
The sccmec module detects and classifies Staphylococcus aureus SCCmec elements, which carry methicillin resistance determinants and are defined by combinations of the mec gene complex and ccr recombinase genes.
It is adapted from the tool sccmec.
A curated FASTA file containing representative SCCmec-associated target genes is bundled with the module and stored in the /data directory.
The module supports the classification of the following types:
| Type | Reference |
|---|---|
| I | Katayama et al. 2000 |
| II | Katayama et al. 2000, Ito et al. 2001 |
| III | Katayama et al. 2000 |
| IV | Ma et al. 2002 |
| V | Ito et al. 2004 |
| VI | Oliveira et al. 2006 |
| VII | Berglund et al. 2008 |
| VIII | Zhang et al. 2009 |
| IX | Li et al. 2011 |
| X | Li et al. 2011 |
| XI | García-Álvarez et al. 2011 |
| XII | Wu et al. 2015 |
| XIII | Baig et al. 2018 |
| XIV | Urushibara et al. 2020 |
| XV | Wang et al. 2022 |
It also supports the classification of the following subtypes:
| SubType | Reference |
|---|---|
| Ia | Ito et al. 2001 |
| Ib | Han et al. 2009, Oliveira et al. 2006 |
| IIa | Katayama et al. 2000, Ito et al. 2001 |
| IIb | Hisata et al. 2005 |
| IIc | Shore et al. 2005 |
| IId | Kondo et al. 2007 |
| IIe | Han et al. 2009 |
| IVa | Ma et al. 2002 |
| IVb | Ma et al. 2002 |
| IVc | Ma et al. 2006 |
| IVd | Ma et al. 2006 |
| IVg | Kwon et al. 2005 |
| IVh | Milheirico et al. 2007 |
| IVi | Berglund et al. 2009 |
| IVj | Berglund et al. 2009 |
| IVk | - |
| IVl | Iwao et al. 2012 |
| IVm | Hosoya et al. 2014 |
| IVn | - |
| Va | Ito et al. 2004 |
| Vb | Hisata et al. 2011 |
| Vc | Li et al. 2011 |
Parameters
Hits are filtered based on the following parameters:
- Minimum alignment percentage identity of 90
- Minimum alignment percentage coverage of 80
Output
The mec gene complex is classified as follows:
| mec class | Required components |
|---|---|
| A | mecI + mecR1 + mecA |
| B | IS1272 + mecA |
| C | IS431 + mecA |
| Unknown | mecA or mecC present but incomplete |
| None | No mec genes detected |
Both mecA and mecC are supported.
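The mec class table can be sketched as a subset test over the detected gene names. This is an illustration only: the function name and the evaluation order (class A first, then B, then C) are assumptions, and the order matters when an assembly carries more than one of the insertion sequences.

```python
# Required components per class, checked in this (assumed) order.
MEC_CLASSES = [
    ("A", {"mecI", "mecR1", "mecA"}),
    ("B", {"IS1272", "mecA"}),
    ("C", {"IS431", "mecA"}),
]


def classify_mec(genes):
    """Return the mec class label for a collection of detected gene names."""
    genes = set(genes)
    for mec_class, required in MEC_CLASSES:
        if required <= genes:  # all required components present
            return mec_class
    if genes & {"mecA", "mecC"}:
        return "Unknown"  # a mec gene is present, but the complex is incomplete
    return "None"


print(classify_mec(["mecA", "mecR1", "mecI", "IS431"]))
```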
Detected recombinase complexes include:
| ccr complex | Required genes |
|---|---|
| 1 | ccrA1 + ccrB1 |
| 2 | ccrA2 + ccrB2 |
| 3 | ccrA3 + ccrB3 |
| 4 | ccrA4 + ccrB4 |
| C1 | ccrC1 |
| C2 | ccrC2 |
| A1B6 | ccrA1 + ccrB6 |
| A1B3 | ccrA1 + ccrB3 |
SCCmec types are inferred by combining the detected mec class and ccr complex(es), following established nomenclature where possible:
| mec class | ccr complex | Assigned type |
|---|---|---|
| B | 1 | Type I (1B) |
| A | 2 | Type II (2A) |
| B | 2 | Type IV (2B) |
| A | 3 | Type III (3A) |
| B | 4 | Type VI (4B) |
| A | 4 | Type VIII (4A) |
| C | C1 (5) | Type V (5C) |
| C + IS12960D | C1 (5) | Type VII (5C + IS12960D) |
| C | 1 | Type IX (1C) |
| C/B | A1B6 (7) | Type X (A1B6) |
| A/E | A1B3 (8) | Type XI (mecC-associated) |
| C | C2 (9) | Type XII (9C) |
| A | C2 (9) | Type XIII (9A) |
| A | C1 (5) | Type XIV (5A) |
| A | A1B6 (7) | Type XV (A1B6) |
If multiple compatible SCCmec types are detected, a Composite SCCmec assignment is reported.
If mec genes are detected but no ccr genes are found, the element is reported as an "orphan" cassette.
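Combining the two calls can be sketched as a lookup keyed on (mec class, ccr complex). The example below is a simplified illustration, not the module's implementation. It encodes only the unambiguous rows of the table above (types VII, X and XI are left out because their rows involve extra conditions), and the function names, the output strings for composite and orphan elements, and the `None` return for unassigned combinations are assumptions.

```python
CCR_COMPLEXES = {
    "1": {"ccrA1", "ccrB1"}, "2": {"ccrA2", "ccrB2"},
    "3": {"ccrA3", "ccrB3"}, "4": {"ccrA4", "ccrB4"},
    "C1": {"ccrC1"}, "C2": {"ccrC2"},
    "A1B6": {"ccrA1", "ccrB6"}, "A1B3": {"ccrA1", "ccrB3"},
}

# Unambiguous rows only; keys are (mec class, ccr complex).
TYPE_TABLE = {
    ("B", "1"): "I", ("A", "2"): "II", ("A", "3"): "III", ("B", "2"): "IV",
    ("C", "C1"): "V", ("B", "4"): "VI", ("A", "4"): "VIII", ("C", "1"): "IX",
    ("C", "C2"): "XII", ("A", "C2"): "XIII", ("A", "C1"): "XIV",
    ("A", "A1B6"): "XV",
}


def detect_ccr(genes):
    """List every ccr complex whose required genes are all present."""
    genes = set(genes)
    return [name for name, req in CCR_COMPLEXES.items() if req <= genes]


def assign_sccmec(mec_class, genes):
    """Return a type label, a composite label, 'orphan', or None (labels assumed)."""
    if mec_class == "None":
        return None
    ccr = detect_ccr(genes)
    if not ccr:
        return "orphan"  # mec genes without any ccr complex
    types = [TYPE_TABLE[(mec_class, c)] for c in ccr if (mec_class, c) in TYPE_TABLE]
    if len(types) > 1:
        return "Composite (" + "/".join(types) + ")"
    return types[0] if types else None


print(assign_sccmec("B", ["mecA", "IS1272", "ccrA2", "ccrB2"]))
```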
The module reports:
| Field | Description |
|---|---|
| sccmec_type | Assigned SCCmec type |
| sccmec_subtype | Assigned subtype |
| sccmec_genes | Semicolon-separated list of detected SCCmec-associated genes |