Sanitize PSM
Description
Analyze PSM lists and remove hits to scans for which another hit with a better score exists. In addition, hits to a scan with the same score but with contradicting peptides are discarded.
Input files
- at least 1 OMSSA results file (.csv)
Output files
- Sanitized PSM list (sanitized.csv)
- Discarded PSM list (discarded.csv)
Context
Synopsis
This script addresses the problem of ambiguous MS/MS scan identifications, where multiple different peptides are assigned to one MS/MS scan. As a result, individual peptide-spectrum matches (PSMs) can be discarded for the following reasons.
1. A better-scoring peptide has been identified
In the following example, two different peptides have been identified in the same MS/MS scan:
Row | Scan | Peptide | E-value |
---|---|---|---|
1 | example.1000.1000.2 | GDDLGGNAAMSVYTK | 2.6e-9 |
2 | example.1000.1000.2 | GDDLGGNAVCSVYTK | 7.2e-3 |
Here, peptide 2 gets discarded because peptide 1 has a better score and is therefore more likely to be the correct explanation for the MS/MS scan.
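The rule above can be sketched in a few lines of Python. This is a minimal illustration, not the script's actual implementation: the function name `best_hits_per_scan` and the dictionary keys `scan`, `peptide`, and `evalue` are assumptions made for this example, and hits that tie on the best E-value are simply kept here.

```python
from collections import defaultdict

def best_hits_per_scan(psms):
    """Split PSMs into (kept, discarded): per scan, keep only the hits
    with the lowest E-value. Names and keys are hypothetical."""
    by_scan = defaultdict(list)
    for psm in psms:
        by_scan[psm["scan"]].append(psm)

    kept, discarded = [], []
    for hits in by_scan.values():
        best = min(hit["evalue"] for hit in hits)  # lower E-value = better score
        for hit in hits:
            (kept if hit["evalue"] == best else discarded).append(hit)
    return kept, discarded

# The two rows from the table above.
psms = [
    {"scan": "example.1000.1000.2", "peptide": "GDDLGGNAAMSVYTK", "evalue": 2.6e-9},
    {"scan": "example.1000.1000.2", "peptide": "GDDLGGNAVCSVYTK", "evalue": 7.2e-3},
]
kept, discarded = best_hits_per_scan(psms)
print([p["peptide"] for p in kept])
print([p["peptide"] for p in discarded])
```

With the example rows, `kept` contains only GDDLGGNAAMSVYTK and `discarded` contains only GDDLGGNAVCSVYTK.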
2. Low hit distinctiveness
In the following example, two different peptides have been identified in the same MS/MS scan. The score of peptide 2 is worse but almost as good as the score of peptide 1:
Row | Scan | Peptide | E-value |
---|---|---|---|
1 | example.1000.1000.2 | GDDLGGNAAMSVYTK | 1.30e-9 |
2 | example.1000.1000.2 | GDDLGGNAVCSVYTK | 1.31e-9 |
The hit distinctiveness threshold specifies what is considered "almost as good", expressed in orders of magnitude. A value of 2 corresponds to a factor of 10², thus requiring the score ratio P2/P1 to be ≥ 100. In this example, the score ratio is only 1.0077; therefore, peptide 1 would be discarded because its identification is not distinctive enough. In addition, peptide 2 would be discarded according to rule #1. Consequently, no identifications for this scan would remain.
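The distinctiveness check can be written as a single comparison. The sketch below assumes positive E-values and uses a hypothetical helper name, `is_distinctive`; it only illustrates the arithmetic described above and is not taken from the script itself.

```python
def is_distinctive(best_evalue, second_evalue, threshold=2.0):
    """Return True if the second-best E-value is at least 10**threshold
    times the best E-value (assumes both E-values are positive)."""
    return second_evalue / best_evalue >= 10 ** threshold

# Example from rule 2: the ratio P2/P1 is far below 100.
ratio = 1.31e-9 / 1.30e-9
print(round(ratio, 4))
print(is_distinctive(1.30e-9, 1.31e-9))

# Example from rule 1: the two E-values differ by several orders of magnitude.
print(is_distinctive(2.6e-9, 7.2e-3))
```

With the default threshold of 2.0, the best hit from rule 2 fails the check, while the best hit from rule 1 passes it.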
Parameters
- Hit distinctiveness threshold
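- Minimum factor, in orders of magnitude, by which the score of the second-best hit for a scan must be worse than the score of the best hit for the best hit to be kept.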
Default: 2.0
- Treat modified residues unmodified
- If this flag is activated, peptides will be converted to upper case, thereby removing PTM information while filtering.
Default: true
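As an illustration of the second parameter, the sketch below compares two peptide strings with and without upper-case conversion. It assumes that modified residues are written in lower case, as the flag description implies; the helper name `peptide_key` and the modified example peptide are hypothetical.

```python
def peptide_key(peptide, treat_modified_as_unmodified=True):
    """Return the string used to compare peptides while filtering.
    Upper-casing drops the lower-case markers of modified residues."""
    return peptide.upper() if treat_modified_as_unmodified else peptide

unmodified = "GDDLGGNAAMSVYTK"
modified = "GDDLGGNAAmSVYTK"  # hypothetical: lower-case m marks a modified residue

# Flag active (default): both strings count as the same peptide.
print(peptide_key(unmodified) == peptide_key(modified))

# Flag inactive: the two strings count as different peptides.
print(peptide_key(unmodified, False) == peptide_key(modified, False))
```

With the flag active, hits that differ only in their modifications no longer count as contradicting peptides for the same scan.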