Typing

StaphSCAN includes several typing methods, each of which can be run as a stand-alone module.

Multi Locus Sequence Typing

-m mlst

All genomes identified as Staphylococcus aureus are subject to MLST using the seven-locus typing scheme described here. When you use this module, please remember to cite Jolley et al. 2018.

A copy of the MLST alleles and ST definitions is stored in the /data directory of this module.

Warning

Due to new PubMLST policies (read this), it is no longer possible to redistribute data published after 31/12/2024. For this reason, the MLST data bundled with StaphSCAN are current only up to that date. To update the database, follow the guide in the installation section.

Parameters

Locus detection is filtered by minimum alignment identity and coverage:

--min_id_mlst : Minimum alignment percentage identity (default: 95)

--min_cov_mlst : Minimum alignment percentage coverage (default: 95)

Output

Both the ST and the allelic profile are reported.

For each locus, the following annotations may be reported:

  • Exact matches: allele sequences that exactly match a known MLST allele are reported using the corresponding allele number.
  • Putative novel alleles: loci with full-length sequences that do not exactly match any known allele are reported as the closest known allele followed by an asterisk (*).
  • Partial loci: loci detected but not covering the full reference length are reported as Partial.
  • Missing loci: loci not detected in the assembly are reported as -.
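
The four annotations above can be read as a small decision rule. The following is a minimal sketch, not StaphSCAN's actual code: the function name call_locus and the hit dictionary keys are hypothetical, and it assumes that a hit below either threshold counts as "not detected" and that any coverage below 100% counts as partial.

```python
def call_locus(hit, min_id=95.0, min_cov=95.0):
    """Return the reported annotation for one MLST locus.

    `hit` is None when nothing aligned; otherwise a dict with the
    hypothetical keys 'allele' (closest known allele number),
    'identity' and 'coverage' (both percentages).
    """
    # Assumption: hits below either threshold are treated as missing.
    if hit is None or hit["identity"] < min_id or hit["coverage"] < min_cov:
        return "-"
    # Detected, but not covering the full reference length.
    if hit["coverage"] < 100.0:
        return "Partial"
    # Full-length exact match to a known allele.
    if hit["identity"] == 100.0:
        return str(hit["allele"])
    # Full-length but inexact: closest known allele followed by '*'.
    return f"{hit['allele']}*"


print(call_locus({"allele": 3, "identity": 99.8, "coverage": 100.0}))  # 3*
```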

Imprecise or incomplete allelic profiles result in approximate ST assignments. In these cases, StaphSCAN reports the closest matching ST followed by the number of differing loci (n-locus variant, up to two loci). Example: ST1-1LV (the closest match is ST1, with one differing allele).
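
The closest-ST logic can be sketched as follows. This is an illustration under stated assumptions rather than the module's implementation: the function name assign_st is hypothetical, the definitions table is a three-entry toy subset in the locus order arcC, aroE, glpF, gmk, pta, tpi, yqiL, any locus without an exact allele call is passed as None, ties go to the first ST encountered, and "-" is returned when more than two loci differ (the source does not state what is reported in that case).

```python
# Toy subset of ST definitions, for illustration only.
ST_DEFS = {
    1: (1, 1, 1, 1, 1, 1, 1),
    5: (1, 4, 1, 4, 12, 1, 10),
    8: (3, 3, 1, 1, 4, 4, 3),
}


def assign_st(profile, st_defs=ST_DEFS, max_variants=2):
    """Return 'STn', 'STn-kLV' (k <= max_variants) or '-'.

    `profile` is a 7-tuple of allele numbers; None marks a locus that is
    novel, partial or missing and therefore never matches a definition.
    """
    best_st, best_diff = None, None
    for st, alleles in st_defs.items():
        # Count loci whose allele differs from this ST definition.
        diff = sum(1 for a, b in zip(profile, alleles) if a != b)
        if best_diff is None or diff < best_diff:
            best_st, best_diff = st, diff
    if best_st is None or best_diff > max_variants:
        return "-"  # assumption: no call beyond two differing loci
    if best_diff == 0:
        return f"ST{best_st}"
    return f"ST{best_st}-{best_diff}LV"


print(assign_st((1, 1, 1, 1, 1, 1, None)))  # ST1-1LV
```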

spa typing

-m spa

spa typing is a method based on characterizing the repeat region of the Staphylococcus protein A gene (spa). It is widely used for rapid typing of MRSA, particularly in hospital and surveillance settings.

For more information visit here.

A local copy of the Ridom database is distributed with this module and stored in the module's /data directory.

Output

  • If the identified repeat patterns match a known spa type, the corresponding type is reported.

  • Patterns not present in the reference database are classified as novel and reported together with their repeat composition.

  • Assemblies in which an X-region is amplified but no known repeat units are detected are reported as "Unknown".
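
The three outcomes above amount to a lookup of the repeat succession in the reference database. The sketch below is illustrative only: the function name report_spa, the "Novel (...)" output format and the toy database entries are hypothetical and are not taken from the Ridom database.

```python
# Made-up repeat patterns and type names, for illustration only.
TOY_SPA_DB = {
    "11-19-12": "t_demo1",
    "26-23-17-34": "t_demo2",
}


def report_spa(repeats, known_types=TOY_SPA_DB):
    """Map a list of repeat identifiers from the X-region to a reported value.

    `repeats` is empty when an X-region was found but no known repeat
    unit could be recognised in it.
    """
    if not repeats:
        return "Unknown"
    pattern = "-".join(str(r) for r in repeats)
    spa_type = known_types.get(pattern)
    if spa_type is not None:
        return spa_type
    # Pattern absent from the database: report it with its composition.
    return f"Novel ({pattern})"


print(report_spa([11, 19, 12]))  # t_demo1
print(report_spa([11, 19, 99]))  # Novel (11-19-99)
```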

Limitation

Spa typing is dependent on genome assembly quality, and fragmentation or sequencing errors within the spa X-region may result in spa-negative or Unknown calls. Novel or divergent repeat patterns not present in the reference database are reported as Novel. As with all in silico typing approaches, results may differ from laboratory-based spa typing in cases of mixed populations or incomplete assemblies.

Accessory Gene Regulator (agr) Typing

-m agr

The agr module identifies the Staphylococcus aureus accessory gene regulator (agr) type, a quorum-sensing system involved in virulence regulation and commonly classified into four major groups (I–IV) (Raghuram V et al. 2022).

It is an adaptation of the tool AgrVATE.

A curated set of reference sequences is bundled with the module and stored in the /data directory.

This module performs two steps:

1) Identify the agr type:

The assembly is queried against the group-specific probes in the targets.fasta file. The agr type is assigned to the group with the highest count of unique matching probes.

2) Evaluate operon functionality:

The full agr operon is extracted from the assembly using the reference sequence (.gbk) specific to the identified group.
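
Step 1 can be illustrated with a short sketch that counts unique matching probes per group. It assumes the probe search has already been run and reduced to (group, probe) pairs; the function name, the probe identifiers and the tie-breaking rule (first group seen wins) are hypothetical.

```python
def assign_agr_group(probe_hits):
    """Pick the group with the most unique matching probes.

    `probe_hits` is an iterable of (group_id, probe_id) pairs; a probe
    that matches several times is counted once.
    Returns (group_id, n_unique_probes), or (None, 0) without hits.
    """
    unique = {}
    for group, probe in probe_hits:
        unique.setdefault(group, set()).add(probe)
    if not unique:
        return None, 0
    # max() keeps the first group encountered when counts are tied.
    best = max(unique, key=lambda g: len(unique[g]))
    return best, len(unique[best])


hits = [("gp1", "p1"), ("gp1", "p1"), ("gp1", "p2"), ("gp2", "p9")]
print(assign_agr_group(hits))  # ('gp1', 2)
```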

Output

Agr group identifiers are mapped to standard agr types as follows:

Internal ID    Reported agr type
gp1            agr I
gp2            agr II
gp3            agr III
gp4            agr IV

The agr module reports:

Field               Description
agr_type            Assigned agr group (agr I–IV)
agr_confidence      Number of probes matching the assigned group
agr_frameshifts     Gene defects (e.g. frameshifts) detected, if any
agr_operon_status   Operon functionality assessment

The following criteria are used to report the agr_operon_status:

  • Intact: agrC and agrA coding sequences are complete and functional.

  • Pseudogene: Frameshifts or premature stop codons are detected in agrC or agrA.

  • Assembly Gap: The gene appears truncated but ends precisely at the edge of a contig. This suggests the gene might be intact but wasn't fully assembled, rather than a biological mutation.

  • Missing/Fragmented: The operon could not be extracted or is too fragmented to analyze.

  • Ref Missing: Reference data for the identified group is unavailable.
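
One way to picture how these statuses relate to each other is as an ordered series of checks. The sketch below is an assumption about precedence, not the module's documented logic: the function name and the four boolean inputs are hypothetical, and the order in which the checks are applied is a guess.

```python
def operon_status(ref_available, operon_extracted,
                  truncated_at_contig_edge, has_frameshift_or_stop):
    """Return one agr_operon_status label from four boolean observations."""
    if not ref_available:
        return "Ref Missing"          # no reference for the identified group
    if not operon_extracted:
        return "Missing/Fragmented"   # operon absent or too fragmented
    if truncated_at_contig_edge:
        return "Assembly Gap"         # truncation coincides with a contig end
    if has_frameshift_or_stop:
        return "Pseudogene"           # defect inside agrC or agrA
    return "Intact"


print(operon_status(True, True, False, True))  # Pseudogene
```

Checking the contig edge before the frameshift flag reflects the idea that a truncation at a contig end is more likely an assembly artefact than a mutation.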

Notes and limitations

  • While the module attempts to distinguish assembly breaks (Assembly Gap) from true mutations (Pseudogene), highly fragmented assemblies may still result in ambiguous functionality calls. For this reason, Pseudogene calls should be treated with caution and confirmed with a read-based method (e.g. Snippy).

  • Typing uses relaxed BLAST parameters (90% identity) to correctly classify divergent lineages, but relies on a strict count of unique probes to ensure specificity.

  • The module is tuned to ignore natural allelic variation (SNPs) while catching structural defects (indels/stops) that destroy protein function.

Capsule Typing

-m capsule

The capsule module identifies the Staphylococcus aureus capsular polysaccharide operon and assigns the predominant capsule serotype (Type 5 or Type 8). Capsular polysaccharides are major virulence determinants involved in immune evasion and are encoded by the cap operon (capA–P), with serotype specificity driven by the H–K loci (Cocchiaro et al. 2006).

A curated FASTA file containing representative capsule gene sequences is bundled with the module and stored in the /data directory.

Parameters

Hits are filtered by minimum alignment identity and coverage:

--min_id_capsule : Minimum alignment percentage identity (default: 90)

--min_cov_capsule : Minimum alignment percentage coverage (default: 80)

Output

Capsule serotype is inferred based on the presence of serotype-specific loci:

  • Type 5: cap5H, cap5I, cap5J, cap5K
  • Type 8: cap8H, cap8I, cap8J, cap8K

Once a serotype is assigned, operon completeness is evaluated by checking for the presence of all expected genes.

The operon is classified as:

  • Complete: all genes detected
  • Incomplete: at least one gene missing

The capsule module reports:

Field              Description
cap_type           Assigned capsule serotype (Type 5, Type 8, or -)
cap_completeness   Capsule operon status (Complete, Incomplete, or -)
cap_genes          Semicolon-separated list of detected capsule genes
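
The serotype and completeness rules can be sketched as below. Several details are assumptions made for illustration: gene names follow the pattern cap5A ... cap5P / cap8A ... cap8P, the serotype with more detected H–K loci wins, a tie (including no specific locus at all) yields "-", and completeness only checks that each of the 16 operon positions A–P is represented by some detected gene.

```python
OPERON_LETTERS = "ABCDEFGHIJKLMNOP"  # capA-P
SPECIFIC_LETTERS = "HIJK"            # serotype-specific loci


def type_capsule(detected_genes):
    """Return the three reported fields for a list of detected gene names."""
    genes = sorted(set(detected_genes))
    cap_genes = ";".join(genes) if genes else "-"
    # Count serotype-specific loci detected for each serotype.
    counts = {
        s: sum(f"cap{s}{letter}" in genes for letter in SPECIFIC_LETTERS)
        for s in ("5", "8")
    }
    if counts["5"] == counts["8"]:
        # Assumption: no call when neither serotype predominates.
        return {"cap_type": "-", "cap_completeness": "-", "cap_genes": cap_genes}
    serotype = "5" if counts["5"] > counts["8"] else "8"
    letters = {g[-1] for g in genes}
    complete = all(letter in letters for letter in OPERON_LETTERS)
    return {
        "cap_type": f"Type {serotype}",
        "cap_completeness": "Complete" if complete else "Incomplete",
        "cap_genes": cap_genes,
    }


full_type5 = [f"cap5{letter}" for letter in OPERON_LETTERS]
print(type_capsule(full_type5)["cap_type"])  # Type 5
```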

SCCmec Typing

-m sccmec

The sccmec module detects and classifies Staphylococcus aureus SCCmec elements, which carry methicillin resistance determinants and are defined by combinations of the mec gene complex and ccr recombinase genes.

It is adapted from the tool sccmec.

A curated FASTA file containing representative SCCmec-associated target genes is bundled with the module and stored in the /data directory.

The module supports the classification of the following types:

Type Reference
I Katayama et al. 2000
II Katayama et al. 2000, Ito et al. 2001
III Katayama et al. 2000
IV Ma et al. 2002
V Ito et al. 2004
VI Oliveira et al. 2006
VII Berglund et al. 2008
VIII Zhang et al. 2009
IX Li et al. 2011
X Li et al. 2011
XI García-Álvarez et al. 2011
XII Wu et al. 2015
XIII Baig et al. 2018
XIV Urushibara et al. 2020
XV Wang et al. 2022

and the following subtypes:

SubType Reference
Ia Ito et al. 2001
Ib Han et al. 2009, Oliveira et al. 2006
IIa Katayama et al. 2000, Ito et al. 2001
IIb Hisata et al. 2005
IIc Shore et al. 2005
IId Kondo et al. 2007
IIe Han et al. 2009
IVa Ma et al. 2002
IVb Ma et al. 2002
IVc Ma et al. 2006
IVd Ma et al. 2006
IVg Kwon et al. 2005
IVh Milheirico et al. 2007
IVi Berglund et al. 2009
IVj Berglund et al. 2009
IVk -
IVl Iwao et al. 2012
IVm Hosoya et al. 2014
IVn -
Va Ito et al. 2004
Vb Hisata et al. 2011
Vc Li et al. 2011

Parameters

Hits are filtered based on the following parameters:

  • Minimum alignment identity of 90%
  • Minimum alignment coverage of 80%

Output

The mec gene complex is classified as follows:

mec class   Required components
A           mecI + mecR1 + mecA
B           IS1272 + mecA
C           IS431 + mecA
Unknown     mecA or mecC present but incomplete
None        No mec genes detected

Both mecA and mecC are supported.
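
The mec class table can be read as a set-membership test on the detected genes. The following sketch makes assumptions that the source does not state: the function name classify_mec is hypothetical, classes are tested in the order A, B, C, and a mecC-only cassette falls through to Unknown because the table lists mecA in every class definition.

```python
def classify_mec(detected_genes):
    """Return the mec class ('A', 'B', 'C', 'Unknown' or 'None')."""
    genes = set(detected_genes)
    if "mecA" not in genes and "mecC" not in genes:
        return "None"
    if "mecA" in genes:
        # Assumed precedence: A before B before C.
        if {"mecI", "mecR1"} <= genes:
            return "A"
        if "IS1272" in genes:
            return "B"
        if "IS431" in genes:
            return "C"
    # mecA or mecC present, but no complete class definition matched.
    return "Unknown"


print(classify_mec(["mecA", "IS1272", "IS431"]))  # B
```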

Detected recombinase complexes include:

ccr complex   Required genes
1             ccrA1 + ccrB1
2             ccrA2 + ccrB2
3             ccrA3 + ccrB3
4             ccrA4 + ccrB4
C1            ccrC1
C2            ccrC2
A1B6          ccrA1 + ccrB6
A1B3          ccrA1 + ccrB3
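
Detecting ccr complexes from the table above reduces to checking which required gene sets are fully present. This is a minimal sketch with a hypothetical function name; it reports every complex whose genes are all detected and does not try to resolve overlaps (for example, ccrA1 shared between complexes 1, A1B6 and A1B3).

```python
# Required genes per ccr complex, transcribed from the table above.
CCR_COMPLEXES = {
    "1": {"ccrA1", "ccrB1"},
    "2": {"ccrA2", "ccrB2"},
    "3": {"ccrA3", "ccrB3"},
    "4": {"ccrA4", "ccrB4"},
    "C1": {"ccrC1"},
    "C2": {"ccrC2"},
    "A1B6": {"ccrA1", "ccrB6"},
    "A1B3": {"ccrA1", "ccrB3"},
}


def detect_ccr(detected_genes):
    """Return the names of all ccr complexes whose genes are all present."""
    genes = set(detected_genes)
    return [name for name, required in CCR_COMPLEXES.items() if required <= genes]


print(detect_ccr(["ccrA2", "ccrB2", "ccrC1"]))  # ['2', 'C1']
```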

SCCmec types are inferred by combining the detected mec class and ccr complex(es), following established nomenclature where possible:

mec class      ccr complex   Assigned type
B              1             Type I (1B)
A              2             Type II (2A)
B              2             Type IV (2B)
A              3             Type III (3A)
B              4             Type VI (4B)
A              4             Type VIII (4A)
C              C1 (5)        Type V (5C)
C + IS12960D   C1 (5)        Type VII (5C + IS12960D)
C              1             Type IX (1C)
C/B            A1B6 (7)      Type X (A1B6)
A/E            A1B3 (8)      Type XI (mecC-associated)
C              C2 (9)        Type XII (9C)
A              C2 (9)        Type XIII (9A)
A              C1 (5)        Type XIV (5A)
A              A1B6 (7)      Type XV (A1B6)

If multiple compatible SCCmec types are detected, a Composite SCCmec assignment is reported.

If mec genes are detected but no ccr genes are found, the element is reported as an "orphan" cassette.
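
The combination step, including the Composite and orphan cases, can be sketched as a table lookup. This is an illustration under assumptions, not the module's code: the function name and the exact output strings are hypothetical, and the lookup table only includes rows with a single unambiguous mec class, so Types VII, X and XI are deliberately left out.

```python
# (mec class, ccr complex) -> type, unambiguous rows only.
TYPE_TABLE = {
    ("B", "1"): "I", ("A", "2"): "II", ("A", "3"): "III", ("B", "2"): "IV",
    ("C", "C1"): "V", ("B", "4"): "VI", ("A", "4"): "VIII", ("C", "1"): "IX",
    ("C", "C2"): "XII", ("A", "C2"): "XIII", ("A", "C1"): "XIV",
    ("A", "A1B6"): "XV",
}


def assign_sccmec(mec_class, ccr_complexes):
    """Combine a mec class with the detected ccr complexes."""
    if mec_class == "None":
        return "-"                     # no mec genes, nothing to type
    if not ccr_complexes:
        return "orphan"                # mec genes without any ccr genes
    types = [TYPE_TABLE[(mec_class, c)] for c in ccr_complexes
             if (mec_class, c) in TYPE_TABLE]
    if not types:
        return "-"                     # assumption: untypeable combination
    if len(types) > 1:
        return "Composite (" + "/".join(types) + ")"
    return f"Type {types[0]}"


print(assign_sccmec("B", ["2"]))        # Type IV
print(assign_sccmec("A", ["2", "C1"]))  # Composite (II/XIV)
```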

The module reports:

Field            Description
sccmec_type      Assigned SCCmec type
sccmec_subtype   Assigned subtype
sccmec_genes     Semicolon-separated list of detected SCCmec-associated genes