The human body is composed of tens of trillions of cells, and the nucleus of nearly every one of these cells holds a complex set of instructions, which we call the human genome. The instructions are stored in DNA, which is divided over 23 pairs of chromosomes. A DNA chain can be represented by a string containing only the letters (bases) A, C, G and T, plus the hyphen (-). The hyphen represents an unknown base. The human genome contains a total of more than 3 billion bases.

double helix

Because the human genome contains many repetitions, a DNA chain is usually stored in compressed form. During compression, each series of four or more consecutive identical characters is replaced by a code consisting of three characters: i) a hyphen (-), ii) a capital letter (A to Z) indicating 1 to 26 repetitions, and iii) the repeated character itself. A series longer than 26 characters is replaced by multiple codes, where every code except the last one represents a repeated sequence of length 26. Each occurrence of a hyphen in the original DNA chain is encoded as a series of length 1, that is, as -A-. For example, the DNA chain ACCC-GTTTTTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA is encoded as ACCC-A-G-ET-ZA-DA: the hyphen becomes -A-, the five Ts become -ET, and the 30 As become -ZA (26 As) followed by -DA (the remaining 4 As).
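
The compression rules above can be sketched in Python as follows. This is only one possible approach, not a reference solution. It makes two assumptions that the text leaves open: every hyphen is encoded separately as -A-, even when several hyphens follow each other, and the remainder of a series longer than 26 is always written as a code, even when fewer than four characters remain.

```python
from itertools import groupby


def DNAcompression(chain):
    """Return the compressed form of a DNA chain (sketch under the stated assumptions)."""
    parts = []
    # groupby yields each run of consecutive identical characters
    for base, group in groupby(chain):
        count = len(list(group))
        if base == '-':
            # Assumption: each hyphen is encoded on its own as a series of length 1
            parts.append('-A-' * count)
        elif count < 4:
            # Series shorter than four characters are copied unchanged
            parts.append(base * count)
        else:
            # Emit codes of length 26 (letter Z) while more than 26 remain
            while count > 26:
                parts.append('-Z' + base)
                count -= 26
            # Assumption: the remainder (1 to 26) is always written as a code
            parts.append('-' + chr(ord('A') + count - 1) + base)
    return ''.join(parts)
```

The count letter is derived with chr and ord, so A stands for 1 repetition and Z for 26.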

Assignment

  1. Write a function DNAcompression that returns the compressed form of a given DNA chain. The DNA chain is passed to the function as a parameter. For example, DNAcompression should turn the string ACCC-GTTTTTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA into ACCC-A-G-ET-ZA-DA.

  2. Write a function DNAdecompression that returns the original DNA chain for a given compressed DNA chain. The compressed DNA chain is passed to the function as a parameter. For example, DNAdecompression should turn the string ACCC-A-G-ET-ZA-DA back into ACCC-GTTTTTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA.
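
Decompression can be sketched as a single left-to-right scan. The following is a minimal illustration, not a reference solution. It assumes the input is well formed, meaning that every hyphen is followed by a capital letter and the repeated character.

```python
def DNAdecompression(compressed):
    """Return the original DNA chain for a compressed chain (assumes well-formed input)."""
    parts = []
    i = 0
    while i < len(compressed):
        if compressed[i] == '-':
            # A code has three characters: '-', a count letter, the repeated character
            count = ord(compressed[i + 1]) - ord('A') + 1
            parts.append(compressed[i + 2] * count)
            i += 3
        else:
            # Any other character stands for itself
            parts.append(compressed[i])
            i += 1
    return ''.join(parts)
```

Because a hyphen in the compressed chain always starts a three-character code, the scan never has to look back, and an encoded hyphen such as -A- is handled like any other code.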

Example

>>> DNAcompression('ACCC-GTTTTTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA')
'ACCC-A-G-ET-ZA-DA'
>>> DNAdecompression('ACCC-A-G-ET-ZA-DA')
'ACCC-GTTTTTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'