DNA sequencing.

★★★

In DNA, three-letter sequences encode amino acids. For example, TTT encodes phenylalanine and TTA encodes leucine. This program reads a DNA sequence stored in a file and outputs how many times an amino acid requested by the user occurs in it. For example, reading the sequence ACGTTTGTATTT three letters at a time gives ACG, TTT, GTA and TTT, so TTT appears twice.
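
The example above can be checked with a short Python sketch. The helper name `split_into_codons` is hypothetical and not part of the assignment; it only illustrates reading a sequence three letters at a time.

```python
def split_into_codons(sequence):
    """Return the consecutive three-letter groups in a DNA sequence."""
    codons = []
    # Step through the sequence three letters at a time.
    # If the length is not a multiple of 3, the last group is shorter.
    for index in range(0, len(sequence), 3):
        codons.append(sequence[index:index + 3])
    return codons


codons = split_into_codons("ACGTTTGTATTT")
print(codons)               # ['ACG', 'TTT', 'GTA', 'TTT']
print(codons.count("TTT"))  # 2
```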

Amino acids

Make

Write a program that asks the user to enter three characters and outputs how many times that sequence of characters appears in a file.

Success Criteria

Remember to add a comment before a subprogram, selection or iteration statement to explain its purpose.

Complete the subprogram called get_amino_acid that:

  1. Asks the user to input an amino acid.
  2. Validates the input to ensure it is exactly three letters long and contains only the letters A, C, G or T.
  3. Returns the valid choice.
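
The validation rule in step 2 could be sketched as a small check like the one below. The name `is_valid_amino_acid` is hypothetical, and the sketch assumes that only upper-case letters count as valid; the input loop itself is left for get_amino_acid.

```python
VALID_LETTERS = "ACGT"  # assumption: upper-case letters only


def is_valid_amino_acid(text):
    """Return True if text is exactly three letters, each one of A, C, G or T."""
    # Amino acid must be 3 letters
    if len(text) != 3:
        return False
    # Every letter must be one of the four DNA letters
    for letter in text:
        if letter not in VALID_LETTERS:
            return False
    return True


print(is_valid_amino_acid("TTT"))   # True
print(is_valid_amino_acid("TTTT"))  # False
print(is_valid_amino_acid("TTX"))   # False
```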

Complete the subprogram called check_sequence that:

  1. Takes the amino acid as a parameter.
  2. Opens the file called dna.txt for reading. Note that this file is included in the Trinket above for you to use as source data.
  3. If the file cannot be found it returns -1.
  4. Reads the file a line at a time.
  5. Counts how many of the consecutive three-letter groups on each line match the amino acid.
  6. Closes the file.
  7. Returns the number of times the given amino acid was found.
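
One possible shape for these steps is sketched below. It is not the required solution: the function name `count_in_file` and its `filename` parameter are hypothetical, added so the sketch can be tried on a temporary file instead of dna.txt.

```python
import os
import tempfile


def count_in_file(filename, amino_acid):
    """Count amino_acid among the three-letter groups of each line, or return -1."""
    # Check the file exists
    try:
        file = open(filename, "r")
    except FileNotFoundError:
        return -1
    count = 0
    # Read the file a line at a time
    for line in file:
        line = line.strip()
        # Consider the letters in threes
        for index in range(0, len(line), 3):
            if line[index:index + 3] == amino_acid:
                count = count + 1
    file.close()
    return count


# Try the sketch on a small temporary file (not the real dna.txt)
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sample.txt")
    with open(path, "w") as sample:
        sample.write("ACGTTTGTATTT\nTTTACG\n")
    print(count_in_file(path, "TTT"))                                 # 3
    print(count_in_file(os.path.join(folder, "missing.txt"), "TTT"))  # -1
```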

Complete the main program so that it:

  1. Calls get_amino_acid to input a valid amino acid.
  2. Calls check_sequence to count how many times that amino acid appears in the file.
  3. If the number is -1, outputs the message “DNA file not found.”
  4. If the number is 0 or more, outputs the number of amino acids using the format shown below.
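
The choice between the two messages could be sketched as follows. The helper `build_message` is hypothetical; the main program may simply print the text directly.

```python
def build_message(number, amino_acid):
    """Return the text to output for a count returned by check_sequence."""
    # -1 signals that the file could not be found
    if number == -1:
        return "DNA file not found."
    return f"There are {number} {amino_acid} amino acids in the DNA sequence."


print(build_message(4, "CCC"))   # There are 4 CCC amino acids in the DNA sequence.
print(build_message(-1, "CCC"))  # DNA file not found.
```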

Typical inputs and outputs from the program would be:

dna.txt file:

ACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCACCGCTGCCCTGC
CCTGGAGGGTGGCCCCACCGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAATAAGGAAAAGCGG
CTCCTGACTTTCCTCGCTTGGTGGTTTGAGTGGACCTCCCAGGCCAGTGCCGGGCCCCTCATAGGAGAG

Example runs:

Enter the amino acid to find: CCC
There are 4 CCC amino acids in the DNA sequence.
Enter the amino acid to find: GGT
There are 0 GGT amino acids in the DNA sequence.
Enter the amino acid to find: GGG
There are 3 GGG amino acids in the DNA sequence.
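
These results come from reading each line three letters at a time, not from searching for the letters anywhere in the text. The sketch below reproduces the counts from the sample data; the names `DNA_LINES` and `count_in_frame` are hypothetical and used only for illustration.

```python
# The three lines of the sample dna.txt file
DNA_LINES = [
    "ACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCACCGCTGCCCTGC",
    "CCTGGAGGGTGGCCCCACCGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAATAAGGAAAAGCGG",
    "CTCCTGACTTTCCTCGCTTGGTGGTTTGAGTGGACCTCCCAGGCCAGTGCCGGGCCCCTCATAGGAGAG",
]


def count_in_frame(lines, amino_acid):
    """Count matches among the consecutive three-letter groups of each line."""
    count = 0
    for line in lines:
        for index in range(0, len(line), 3):
            if line[index:index + 3] == amino_acid:
                count = count + 1
    return count


for amino_acid in ["CCC", "GGT", "GGG"]:
    print(amino_acid, count_in_frame(DNA_LINES, amino_acid))
# CCC 4
# GGT 0
# GGG 3

# GGT does occur as plain text (for example inside GGGTGG on the second line),
# but it never lines up with a three-letter group, so it is not counted.
print(any("GGT" in line for line in DNA_LINES))  # True
```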

Restricted automated feedback

Automated feedback for this assignment is still under construction. Submitted programs are checked for syntax errors and their source code is checked for potential errors, bugs, stylistic issues, and suspicious constructs. However, no checks are performed yet to see if the program correctly implements the behaviour specified in the assignment.

🆘 If you're really stuck, use this Parsons code sorting exercise
get_amino_acid
# Input the amino acid
def get_amino_acid():
---
    choice = ""
    valid = False
---
    # Validation
    while not valid:
---
        choice = input("Enter the amino acid to find: ")
        valid = True
---
        # Amino acid must be 3 letters
        if len(choice) != 3:
---
            valid = False
---
        else:
---
            # Check each letter of the choice
            for letter in range(len(choice)):
---
                # Amino acid must contain only the letters ACGT
                if choice[letter] not in "ACGT":
---
                    valid = False
---
    return choice
check_sequence
# Read the DNA sequence file
def check_sequence(amino_acid):
---
    # Check file exists
    try:
---
        file = open("dna.txt", "r")
---
    except FileNotFoundError:
---
        return -1
---
    else:
---
        count = 0
---
        # Read in each line
        for line in file:
---
            line = line.strip()
---
            # Consider data letters in threes
            for index in range(0, len(line), 3):
---
                sequence = line[index:index + 3]
---
                # Add to the count if amino acid found
                if sequence == amino_acid:
---
                    count = count + 1
---
        file.close()
---
    return count
Main program
# -------------------------
# Main program
# -------------------------
---
amino_acid = get_amino_acid()
---
number = check_sequence(amino_acid)
---
# If -1 is returned the file does not exist
if number == -1:
---
    print("DNA file not found.")
---
else:
---
    print("There are", number, amino_acid, "amino acids in the DNA sequence.")