Despite many attempts, researchers have still not devised a scoring function that reliably assigns the highest score to the biologically correct peptide, i.e., the peptide that generated the spectrum. Fortunately, although the correct peptide often does not achieve the highest score among all peptides, it typically does score highest among the peptides that occur in the species' proteome. As a result, we can move from peptide sequencing to peptide identification by limiting our search to peptides present in the proteome, which we concatenate into a single amino acid string Proteome.
Given a file containing a spectral vector S and an amino acid string Proteome.
Return a peptide in Proteome with maximum score against S.
When a mass corresponds to more than one amino acid, a single letter is used: Q for K/Q and L for I/L.
>>> peptide_identification('data01.txt', 'QGQ')
'G'
>>> peptide_identification('data02.txt', 'TGRDVH')
'TG'
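The search described above can be sketched as follows. This is a minimal illustration, not the reference implementation. It assumes that the spectral vector has already been parsed into a list of integers s_1..s_m, and that a peptide's score is the sum of the entries of S at its prefix masses, counted only when the total peptide mass equals m. The names identify_peptide and INTEGER_MASS are hypothetical, and the file-reading step is omitted because the file format is not specified here.

```python
# Standard integer masses of the 20 amino acids (K/Q and I/L share a mass).
INTEGER_MASS = {
    'G': 57, 'A': 71, 'S': 87, 'P': 97, 'V': 99, 'T': 101, 'C': 103,
    'I': 113, 'L': 113, 'N': 114, 'D': 115, 'K': 128, 'Q': 128,
    'E': 129, 'M': 131, 'H': 137, 'F': 147, 'R': 156, 'Y': 163, 'W': 186,
}


def identify_peptide(spectrum, proteome, mass=INTEGER_MASS):
    """Return (peptide, score) for the best-scoring substring of proteome.

    spectrum is the spectral vector as a list of ints (s_1 is spectrum[0]).
    Only substrings whose total mass equals len(spectrum) are candidates.
    Returns (None, -inf) if no substring has the required mass.
    """
    m = len(spectrum)
    best_peptide, best_score = None, float('-inf')
    for start in range(len(proteome)):
        total = 0   # mass of proteome[start:end + 1]
        score = 0   # sum of spectrum entries at the prefix masses so far
        for end in range(start, len(proteome)):
            total += mass[proteome[end]]
            if total > m:
                break  # too heavy; longer substrings only get heavier
            score += spectrum[total - 1]
            if total == m and score > best_score:
                best_peptide = proteome[start:end + 1]
                best_score = score
    return best_peptide, best_score


# Toy example with an assumed two-letter alphabet (X = 4, Z = 5).
toy_spectrum = [0, 0, 0, 4, -2, -3, -1, -7, 6, 5, 3,
                2, 1, 9, 3, -8, 0, 3, 1, 2, 1, 8]
toy_result = identify_peptide(toy_spectrum, 'XZZXZXXXZXZZXZXXZ',
                              mass={'X': 4, 'Z': 5})
print(toy_result)  # ('ZXZXX', 24)
```

Because every amino acid mass is positive, the inner loop can stop as soon as the running mass exceeds m, so each start position is scanned only as far as needed. Ties are resolved in favor of the first substring found.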