DNA sequencing.
★★★ Three-letter sequences in DNA encode amino acids. For example, TTT is phenylalanine and TTA is leucine. This program reads a DNA sequence stored in a file and outputs the number of times a particular amino acid, requested by the user, appears in the sequence. For example, in the sequence ACGTTTGTATTT (read in threes as ACG TTT GTA TTT), the sequence TTT appears twice.
Write a program that asks the user to enter three characters and outputs how many times that sequence of characters appears in a file.
Remember to add a comment before a subprogram, selection or iteration statement to explain its purpose.
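The program needs:
A subprogram get_amino_acid that asks the user for an amino acid and repeats the prompt until the input is three letters long and contains only the letters A, C, G and T.
A subprogram check_sequence that opens dna.txt for reading, counts how often the amino acid occurs when each line is read in groups of three letters, and returns the count. Note that dna.txt is included in the Trinket above for you to use as source data.
A main program that calls get_amino_acid to input a valid amino acid and check_sequence to return the number of the amino acids in the file.
The dna.txt file: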
ACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCACCGCTGCCCTGC
CCTGGAGGGTGGCCCCACCGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAATAAGGAAAAGCGG
CTCCTGACTTTCCTCGCTTGGTGGTTTGAGTGGACCTCCCAGGCCAGTGCCGGGCCCCTCATAGGAGAG
Enter the amino acid to find: CCC
There are 4 CCC amino acids in the DNA sequence.
Enter the amino acid to find: GGT
There are 0 GGT amino acids in the DNA sequence.
Enter the amino acid to find: GGG
There are 3 GGG amino acids in the DNA sequence.
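The 0 reported for GGT may look surprising, because the letters GGT do occur in the third line of dna.txt (inside TGGTGGTTT). The count only considers letters in threes from the start of each line, so TGGTGGTTT is read as TGG TGG TTT and GGT is never matched. A minimal sketch of the difference, assuming that reading in fixed groups of three is the intended behaviour; count_in_frame is a hypothetical helper name, not part of the task:

```python
# Hypothetical helper: counts a triplet only at positions 0, 3, 6, ...
def count_in_frame(dna, triplet):
    count = 0
    for index in range(0, len(dna), 3):
        if dna[index:index + 3] == triplet:
            count = count + 1
    return count


fragment = "TGGTGGTTT"  # taken from the third line of dna.txt

in_frame = count_in_frame(fragment, "GGT")  # reads TGG TGG TTT
anywhere = fragment.count("GGT")            # plain substring search

print(in_frame, anywhere)
```

A plain substring search such as str.count finds GGT across the group boundaries, so it would not reproduce the sample output.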
get_amino_acid
# Input the amino acid
def get_amino_acid():
    choice = ""
    valid = False
    # Validation
    while not valid:
        choice = input("Enter the amino acid to find: ")
        valid = True
        # Amino acid must be 3 letters
        if len(choice) != 3:
            valid = False
        else:
            # Check each letter of the choice
            for letter in range(len(choice)):
                # Amino acid must contain only the letters ACGT
                if choice[letter] not in "ACGT":
                    valid = False
    return choice
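The two validation rules used in get_amino_acid (exactly three letters, each one of A, C, G or T) can also be checked in a separate function, which makes them easy to try without typing input. A minimal sketch; is_valid_amino_acid is a hypothetical name, and like get_amino_acid it treats lowercase letters as invalid:

```python
# Hypothetical helper: returns True only for three letters from ACGT
def is_valid_amino_acid(choice):
    # Must be exactly 3 letters
    if len(choice) != 3:
        return False
    # Every letter must be one of A, C, G or T
    for letter in choice:
        if letter not in "ACGT":
            return False
    return True


print(is_valid_amino_acid("CCC"))
print(is_valid_amino_acid("CCCC"))
print(is_valid_amino_acid("ccx"))
```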
check_sequence
# Read the DNA sequence file and count the amino acid
def check_sequence(amino_acid):
    # Check file exists
    try:
        file = open("dna.txt", "r")
    except FileNotFoundError:
        return -1
    else:
        count = 0
        # Read in each line
        for line in file:
            line = line.strip()
            # Consider data letters in threes
            for index in range(0, len(line), 3):
                # A slice avoids an IndexError if the line length is not a multiple of 3
                sequence = line[index:index + 3]
                # Add to the count if amino acid found
                if sequence == amino_acid:
                    count = count + 1
        file.close()
        return count
# -------------------------
# Main program
# -------------------------
amino_acid = get_amino_acid()
number = check_sequence(amino_acid)
# If -1 is returned the file does not exist
if number == -1:
    print("DNA file not found.")
else:
    print("There are", number, amino_acid, "amino acids in the DNA sequence.")
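To check the sample output without the Trinket, the counting step can be run against a temporary copy of the data. This is a sketch under assumptions: count_in_file is a hypothetical variant of check_sequence that takes the file name as a parameter, and the three lines are copied from the dna.txt listing.

```python
import os
import tempfile

# The three lines of dna.txt, copied from the listing
DNA_LINES = [
    "ACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCACCGCTGCCCTGC",
    "CCTGGAGGGTGGCCCCACCGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAATAAGGAAAAGCGG",
    "CTCCTGACTTTCCTCGCTTGGTGGTTTGAGTGGACCTCCCAGGCCAGTGCCGGGCCCCTCATAGGAGAG",
]


# Hypothetical variant of check_sequence with the file name as a parameter
def count_in_file(filename, amino_acid):
    # Return -1 if the file does not exist
    try:
        with open(filename, "r") as file:
            count = 0
            # Read in each line and consider its letters in threes
            for line in file:
                line = line.strip()
                for index in range(0, len(line), 3):
                    if line[index:index + 3] == amino_acid:
                        count = count + 1
            return count
    except FileNotFoundError:
        return -1


# Write the data to a temporary folder, then count each amino acid
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "dna.txt")
    with open(path, "w") as file:
        file.write("\n".join(DNA_LINES) + "\n")
    results = {}
    for acid in ("CCC", "GGT", "GGG"):
        results[acid] = count_in_file(path, acid)
    missing = count_in_file(os.path.join(folder, "missing.txt"), "CCC")

print(results)
print(missing)
```

The counts for CCC, GGT and GGG should match the sample output (4, 0 and 3), and the missing file should give -1.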