MSclassifR: an R Package for Supervised Classification of Mass Spectra with Machine Learning Methods
This package provides R functions to classify mass spectra into known categories and to determine discriminant mass-to-charge (m/z) values. It was developed to identify very similar species or phenotypes of bacteria from mass spectra obtained by Matrix-Assisted Laser Desorption/Ionization Time-of-Flight Mass Spectrometry (MALDI-TOF MS). However, its functions can also be used to classify other categories associated with mass spectra, or mass spectra obtained with other mass spectrometry techniques. It includes easy-to-use functions for pre-processing mass spectra, functions to determine discriminant m/z values from a library of mass spectra corresponding to different categories, and functions to predict the category (species, phenotype, etc.) associated with a mass spectrum from a list of selected m/z values. If you use this package in your research, please cite the associated publication (reference at the end of this document).
Installing MSclassifR requires packages from Bioconductor, so you may first need to install the latest version of the BiocManager package. MSclassifR imports its other dependencies from CRAN. Using the latest version of R is also recommended.
## Install BiocManager if it is not already installed
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
## Install the required dependencies with BiocManager
BiocManager::install(c("multtest", "mixOmics", "limma", "qvalue", "cp4p"))
## Install the MSclassifR package
install.packages("MSclassifR")
## Load MSclassifR to check the installation
require(MSclassifR) ## For easy spectral signal processing and machine learning
- Understanding the basics: a brief introduction concerning machine learning
- Exploring MSclassifR: a comprehensive overview of a user-friendly tool for mass spectrometry classification
- Hands-on examples: three vignettes illustrating how to use the functions of this package on real data sets are also available online to help users.
Users typically begin by importing their MALDI-TOF mass spectra into R using the function "MALDIquantForeign::importBrukerFlex(YourPathway)". They then create a data frame that assigns each mass spectrum to a strain and/or species (see the 'Ecrobia' and 'Klebsiella' vignettes for examples). These categorical assignments are generally determined through Whole Genome Sequencing (WGS) or by evaluating phenotypic antimicrobial sensitivity profiles, depending on the question being studied.
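As a minimal sketch of this first step, the snippet below builds a small annotation data frame and, if a spectra folder is available, imports the spectra and checks that both have the same length. The folder path, the column names (`spectrum_id`, `strain`, `species`) and the toy labels are hypothetical placeholders, not names used by the package; only `MALDIquantForeign::importBrukerFlex()` comes from the text above.

```r
# Hypothetical annotation table: one row per mass spectrum, in import order.
# Column names and labels are illustrative assumptions.
metadata <- data.frame(
  spectrum_id = c("s01", "s02", "s03", "s04"),
  strain      = c("strainA", "strainA", "strainB", "strainB"),
  species     = c("species1", "species1", "species2", "species2"),
  stringsAsFactors = FALSE
)
# Categories are stored as a factor, the usual type for class labels in R
metadata$species <- factor(metadata$species)

# Hypothetical folder containing Bruker flex files; replace with your own path
spectra_dir <- "path/to/your/bruker/spectra"

# The import only runs if the package is installed and the folder exists
if (requireNamespace("MALDIquantForeign", quietly = TRUE) &&
    dir.exists(spectra_dir)) {
  spectra <- MALDIquantForeign::importBrukerFlex(spectra_dir)
  # Each imported spectrum must have exactly one annotation row
  stopifnot(length(spectra) == nrow(metadata))
}

# Number of spectra per category
print(table(metadata$species))
```

Keeping the annotation in a separate data frame, with one row per spectrum, makes it easy to verify the labels before any model is trained.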
Once users have prepared a dataset with these categorical assignments, they can train classification models using their dataset. These models can then be saved and subsequently used to predict categories for new MALDI-TOF mass spectra (avoiding the use of WGS or phenotypic antimicrobial sensitivity profiles).
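The train, save, reload and predict cycle can be sketched with base R alone. The example below uses a plain logistic regression (`glm`) on simulated intensities as a stand-in for the classifiers provided by MSclassifR; it is not the package's own training or prediction code, and the column names (`mz_2000`, `mz_4500`, `mz_7000`) and category labels are hypothetical. For real analyses, refer to the package vignettes.

```r
set.seed(1)

# Simulated intensity table: 40 spectra x 3 hypothetical m/z columns
n <- 40
Y <- factor(rep(c("catA", "catB"), each = n / 2))
X <- data.frame(
  mz_2000 = rnorm(n, mean = ifelse(Y == "catA", 0, 1)),
  mz_4500 = rnorm(n, mean = ifelse(Y == "catA", 0, 0.5)),
  mz_7000 = rnorm(n)  # uninformative column
)

# Stand-in model: logistic regression (NOT an MSclassifR function)
train <- cbind(X, Y = Y)
model <- glm(Y ~ ., data = train, family = binomial())

# Save the fitted model to disk, then reload it as a later session would
model_file <- tempfile(fileext = ".rds")
saveRDS(model, model_file)
reloaded <- readRDS(model_file)

# Predict categories for new (here simulated) spectra
new_X <- data.frame(
  mz_2000 = c(-1, 2.5),
  mz_4500 = c(-0.5, 1.5),
  mz_7000 = c(0, 0)
)
# With a two-level factor response, glm models the probability of the
# second level ("catB")
prob_B <- predict(reloaded, newdata = new_X, type = "response")
predicted <- ifelse(prob_B > 0.5, "catB", "catA")
print(predicted)
```

Saving the fitted object with `saveRDS()` keeps the model separate from the training data, so new spectra can be classified without repeating the training step.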
- Delve deeper into MSclassifR: article
- Code and reproducibility resources:
  - codes used for the numerical experiments related to the MSclassifR article are publicly available and versioned at this link. Users can notably adapt the gs2() and eG() functions of the Run_experiments.R script to their own problem.
  - the computing environment is specified using a sessionInfo() output to ensure reproducibility.
  - input datasets, when not proprietary, are included or clearly referenced in the repository.
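For users who adapt these scripts, a comparable record of the computing environment can be written next to their own results. This is a minimal sketch, assuming a plain-text file is sufficient; the file name `sessionInfo.txt` is a hypothetical choice, not a file from the repository.

```r
# Write the current R session description (R version, platform,
# attached packages) to a text file; the file name is an assumption
env_file <- file.path(tempdir(), "sessionInfo.txt")
writeLines(capture.output(sessionInfo()), env_file)

# Show the first lines of the recorded environment
print(head(readLines(env_file), 3))
```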
- If you use MSclassifR or any of the code/workflows from this repository, please cite the following article:
Godmer, A., Benzerara, Y., Varon, E., Veziris, N., Druart, K., Mozet, R., Matondo, M., Aubry, A., & Giai Gianetto, Q. (2025). MSclassifR: An R package for supervised classification of mass spectra with machine learning methods. Expert Systems with Applications, 294, 128796. https://doi.org/10.1016/j.eswa.2025.128796


