Carnegie Mellon University
February 12, 2024

Mohimani Lab Publishes Paper in Nature Biotechnology

By Adam Kohlhaas

Scientists using mass spectrometry often find themselves with immense quantities of data but no means to rapidly process it. A team of researchers, including the Mohimani Lab in Carnegie Mellon University's Ray and Stephanie Lane Center for Computational Biology, hopes to change that. Their paper, “Fast Mass Spectrometry Search and Clustering of Untargeted Metabolomics Data,” published in Nature Biotechnology, outlines a fundamental algorithm that can process large quantities of data in metabolomics, the study of small molecules, in a fraction of the time needed by more traditional methods.

Mass spectrometry is a powerful analytical technique that offers insight into the world of molecular identification and quantification. It involves ionizing and subsequently separating molecules based on their mass-to-charge ratios, providing scientists with a precise means to discern the composition and structure of diverse substances. Applied to a wide variety of disciplines — such as chemistry, biochemistry and environmental science — mass spectrometry enables researchers to gain insight into unknown compounds, understand the intricacies of biological molecules and monitor environmental pollutants.

Metabolomics research produces databases containing billions of mass spectra with information about where the samples were collected, the humans or animals they came from, their location in the world, and more. That is far more information than current software can handle. The new algorithm created by the Mohimani Lab and its partners makes data processing routines 100 to 1,000 times faster. For example, searching one of these databases on a single CPU could take up to five days with current software, while the new algorithm cuts the search time to 30 minutes. In some ways, this algorithm can be seen as a small-molecule search engine that allows researchers to quickly find information such as where particular spectra can be found globally and where to find the molecules represented by particular spectra.

“The throughput of mass spectrometers is quadrupling every two to three years, and at the same time the study of small molecules is beginning to accelerate. Thus, the computational routines we are developing will have a significant impact in shaping the field and, in a figurative sense, permeate the veins of the world,” said Mihir Mongia, a co-first author of the paper who recently earned his Ph.D. in computational biology from CMU.

Although the paper focuses on processing metabolomics mass spectra, the algorithm is generalized and can be applied across a wide variety of domains and databases. It marks an important step forward in mass spectrometry data processing.

The Mohimani Lab collaborated on its research with peers in the Dorrestein Lab of the University of California San Diego and the Wang Bioinformatics Lab at the University of California Riverside.