Challenges and applications of small databases in mass spectrometry-based proteomics data analysis
The most common statistic for assigning statistical significance to peptide detections that result from a proteomics
experiment is the false discovery rate (FDR). The FDR of a set of peptide-spectrum matches (PSMs) is typically estimated through a process called target-decoy competition, where spectra generated from a proteomics experiment are searched against a database
of target and decoy sequences. This methodology is well suited to most proteomics experiments, where the database is large or many proteins in the database are present in the sample. However, it is challenged when the database is small or when few peptides
in the database are present in the sample. Recent advances, such as subset-neighbor search, aim to address this challenge. Unfortunately, we show that this approach fails when the database is extremely small. In addition to discussing the uncertain outlook
for handling this scenario, we give several example applications that could make use of future solutions.
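
As a rough illustration of the target-decoy competition estimate mentioned above, the following minimal Python sketch counts target and decoy PSMs above a score threshold. It is not the procedure used in this work: the function name `tdc_fdr`, the `(score, is_decoy)` input layout, the toy scores, and the use of the common "+1" correction in the numerator are all assumptions made for illustration.

```python
from typing import List, Tuple


def tdc_fdr(psms: List[Tuple[float, bool]], threshold: float) -> float:
    """Estimate the FDR among target PSMs scoring at or above `threshold`.

    `psms` holds one (score, is_decoy) pair per spectrum, i.e. the best
    match after targets and decoys have competed (assumed input layout).
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    if targets == 0:
        # No accepted targets: report the most conservative estimate.
        return 1.0
    # Assumed estimator: (decoys + 1) / targets, capped at 1.
    return min(1.0, (decoys + 1) / targets)


# Hypothetical toy data: six spectra, each with its winning match.
psms = [(9.1, False), (8.7, False), (8.2, False),
        (7.9, True), (7.5, False), (6.0, True)]

print(tdc_fdr(psms, 8.0))  # 3 targets, 0 decoys
print(tdc_fdr(psms, 7.0))  # 4 targets, 1 decoy
```

Under this assumed estimator, a set of only three accepted targets cannot receive an estimate below 1/3 even when no decoy passes the threshold, which gives one intuition for why very small result sets are hard to control at conventional FDR levels.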