Peptides are molecules consisting of a short chain of amino acids linked by peptide bonds. The Unique Peptide Finder application from the Unipept platform allows you to determine the set of unique peptides for a selected set of complete genomes. The application considers peptides to be unique if they occur in each of the selected genomes and in no other known genomes (except those belonging to the same species as the selected genomes).
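
The uniqueness criterion above can be illustrated with plain set operations. This is only a minimal sketch on toy data, not how Unipept itself computes the result: the function name unique_peptides, the genome labels and the three-letter peptides are all made up for illustration.

```python
def unique_peptides(selected, others):
    """Return peptides found in every selected genome and in no other genome.

    Both arguments are dicts mapping a (hypothetical) genome name to the set
    of peptides found in that genome. Genomes of the same species as the
    selected ones are assumed to have been left out of `others` already.
    """
    if not selected:
        return set()
    # Peptides shared by all selected genomes.
    shared = set.intersection(*selected.values())
    # Peptides seen in any genome outside the selection.
    elsewhere = set().union(*others.values())
    return shared - elsewhere


# Toy data, invented for this example.
selected = {"genome_a": {"AAK", "LLR", "GGK"}, "genome_b": {"AAK", "LLR"}}
others = {"genome_c": {"LLR", "PPK"}}
result = unique_peptides(selected, others)
```

Here only AAK ends up in result: LLR also occurs in genome_c, and GGK is missing from genome_b.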

Unique Peptide Finder
The Unique Peptide Finder from the Unipept suite finds almost 4000 peptides that are unique to the bacterial species Acinetobacter baumannii. These peptides can be used as biomarkers to check whether or not the species appears in an environmental sample.

These unique peptides can then be used to determine whether an environmental sample contains traces of a given species, for example to ensure that there is no horse meat in the meatballs from a Swedish furniture giant. Indeed, mass spectrometers can be tuned to target specific peptides (targeted proteomics). To do so, additional selection criteria are usually imposed on the peptides to increase the probability that they are picked up by the mass spectrometer. For example, selected reaction monitoring (SRM), a particular technique for targeted proteomics, only uses peptides with a length between 10 and 15 amino acids and excludes peptides containing cysteine (C), methionine (M), histidine (H) or tryptophan (W). After all, during proteomics experiments cysteines are often reduced and alkylated to break disulfide bridges, and methionine, histidine and tryptophan have a tendency to become oxidized.
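
As a small illustration, the SRM criteria described above could be checked as follows. This is a sketch under assumptions: the name is_srm_candidate is hypothetical, the length bounds are taken to be inclusive, and the test peptide is invented.

```python
# Amino acids excluded for SRM according to the text: C, M, H and W.
SRM_EXCLUDED = set("CMHW")


def is_srm_candidate(peptide, minlen=10, maxlen=15, excluded=SRM_EXCLUDED):
    """Check the length window (assumed inclusive) and the excluded residues."""
    residues = set(peptide.upper())
    return minlen <= len(peptide) <= maxlen and residues.isdisjoint(excluded)


# 'AGLTDEVKLSR' is a made-up peptide of length 11 without C, M, H or W.
ok = is_srm_candidate("AGLTDEVKLSR")
too_short = is_srm_candidate("ISIK")
has_tryptophan = is_srm_candidate("AGLTDEVKLSW")
```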

Assignment

In this assignment, peptides are represented as strings that only contain letters (in practice only 20 letters are used, corresponding to the 20 different amino acids). Your task is to implement a function filterPeptides that takes the locations of two text files as string arguments. The first text file must contain a list of peptides, with each peptide on a separate line. This is the file format used by the Unique Peptide Finder to export lists of unique peptides. The function must write all peptides from this text file that meet certain criteria to a new text file (each peptide must be written on a separate line), whose location is passed to the function as its second argument. The criteria that are applied to filter peptides can be set using four optional parameters of the function filterPeptides. The examples below name these parameters minlen, maxlen, contains and lacks.

A filter is only applied if a value is passed to the corresponding parameter of the function. All filters that compare letters should be case-insensitive. The peptides that meet all criteria should be written to the new text file without any modification.

As a helper function for the implementation of the function filterPeptides, you must first write a function filterPeptide. This function takes a peptide as an argument. In addition, the function has the same optional parameters as the function filterPeptides, with the same semantics. The function must return a Boolean value that indicates whether or not the given peptide meets all criteria.
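
One possible shape for these two functions is sketched below. It is not a reference solution and rests on assumptions the text does not settle: minlen and maxlen are treated as inclusive bounds, contains is read as "at least one of these letters occurs in the peptide", and lacks as "none of these letters occurs". Both readings agree with the examples that follow, but a stricter reading of contains (all letters must occur) would agree with them as well.

```python
def filterPeptide(peptide, minlen=None, maxlen=None, contains=None, lacks=None):
    """Return True if the peptide meets every criterion that was passed."""
    residues = set(peptide.upper())  # upper-cased for case-insensitive tests
    if minlen is not None and len(peptide) < minlen:
        return False
    if maxlen is not None and len(peptide) > maxlen:
        return False
    # Assumption: at least one letter of `contains` must be present.
    if contains is not None and residues.isdisjoint(contains.upper()):
        return False
    # Assumption: no letter of `lacks` may be present.
    if lacks is not None and not residues.isdisjoint(lacks.upper()):
        return False
    return True


def filterPeptides(infile, outfile, minlen=None, maxlen=None,
                   contains=None, lacks=None):
    """Copy the peptides from infile that pass the filters to outfile."""
    with open(infile, "r") as source, open(outfile, "w") as target:
        for line in source:
            peptide = line.strip()
            # Skip empty lines; write matching peptides unchanged.
            if peptide and filterPeptide(peptide, minlen=minlen, maxlen=maxlen,
                                         contains=contains, lacks=lacks):
                target.write(peptide + "\n")
```

Using None as the default value makes it possible to tell "no filter requested" apart from a value such as 0 or an empty string.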

Example

In the following interactive session we assume that the text file peptides.txt is located in the current directory.

>>> filterPeptide('QEWLEMPWDNWPVYVLR', minlen=10, maxlen=20)
True
>>> filterPeptide('LICLSYGCHMMSYQWAHIVTDDCVDEGCGMYHMSHEILK', maxlen=20)
False
>>> filterPeptide('EQEETISFADLGPNGTFISK', contains='DEQ', lacks='CMHW')
True
>>> filterPeptide('QEWLEMPWDNWPVYVLR', contains='DEQ', minlen=10, maxlen=20, lacks='CMHW')
False

>>> filterPeptides('peptides.txt', 'filtered.txt', minlen=10, maxlen=20)
>>> print(open('filtered.txt', 'r').read().rstrip())
EQEETISFADLGPNGTFISK
QEWLEMPWDNWPVYVLR

>>> filterPeptides('peptides.txt', 'filtered.txt', maxlen=20)
>>> print(open('filtered.txt', 'r').read().rstrip())
ISIK
GLIR
EQEETISFADLGPNGTFISK
QEWLEMPWDNWPVYVLR

>>> filterPeptides('peptides.txt', 'filtered.txt', contains='DEQ', lacks='CMHW')
>>> print(open('filtered.txt', 'r').read().rstrip())
EQEETISFADLGPNGTFISK
SATIDLGIYTIADLAISGGTTDNVDGTGDAPGLGDIQEVPR

>>> filterPeptides('peptides.txt', 'filtered.txt', lacks='CMHW')
>>> print(open('filtered.txt', 'r').read().rstrip())
ISIK
GLIR
EQEETISFADLGPNGTFISK
SATIDLGIYTIADLAISGGTTDNVDGTGDAPGLGDIQEVPR

>>> filterPeptides('peptides.txt', 'filtered.txt', contains='DEQ', minlen=10, maxlen=20, lacks='CMHW')
>>> print(open('filtered.txt', 'r').read().rstrip())
EQEETISFADLGPNGTFISK

Resources