
DNA sequencing.

★★★

DNA encodes amino acids as three-letter sequences. For example, TTT encodes phenylalanine and TTA encodes leucine. This program reads a DNA sequence stored in a file and outputs how many times an amino acid requested by the user appears in it. For example, when the sequence ACGTTTGTATTT is read in groups of three letters (ACG TTT GTA TTT), the sequence TTT appears twice.
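The grouping in the example above can be sketched in a few lines of C#. This is only an illustration, written as a top-level program (which assumes C# 9 or later); the helper name SplitIntoTriplets is hypothetical and is not part of the task.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: splits a sequence into consecutive groups of three letters.
// Any one or two letters left over at the end are ignored.
static List<string> SplitIntoTriplets(string sequence)
{
    List<string> result = new List<string>();
    for (int index = 0; index <= sequence.Length - 3; index += 3)
    {
        result.Add(sequence.Substring(index, 3));
    }
    return result;
}

List<string> groups = SplitIntoTriplets("ACGTTTGTATTT");
Console.WriteLine(string.Join(" ", groups));        // ACG TTT GTA TTT
Console.WriteLine(groups.Count(g => g == "TTT"));   // 2
```

Stepping the index by three is what keeps the groups from overlapping.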

Amino acids

Make

Write a program that asks the user to enter three characters and outputs how many times that sequence of characters appears in a file.

Success Criteria

Remember to add a comment before a subprogram, selection or iteration statement to explain its purpose.

Complete the subprogram called get_amino_acid that:

  1. Asks the user to input an amino acid.
  2. Validates the input so that only exactly three letters, each one of A, C, G or T, are accepted.
  3. Returns the valid choice.
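One way to think about the validation rule is as a single true/false check that can be tried out separately from the input loop. The sketch below assumes C# 9 or later (top-level statements) and uses a hypothetical helper name, IsValidAminoAcid; it is not the required structure for get_amino_acid.

```csharp
using System;
using System.Linq;

// Hypothetical helper: true only when the text is exactly three characters
// long and every character is one of A, C, G or T (upper case only).
static bool IsValidAminoAcid(string choice)
{
    return choice.Length == 3
        && choice.All(letter => "ACGT".Contains(letter));
}

Console.WriteLine(IsValidAminoAcid("CCC"));   // True
Console.WriteLine(IsValidAminoAcid("CCCC"));  // False: too long
Console.WriteLine(IsValidAminoAcid("CXC"));   // False: X is not allowed
Console.WriteLine(IsValidAminoAcid("ccc"));   // False: lower case is rejected
```

Whether lower-case input should be rejected or converted to upper case first is a design choice; this sketch simply rejects it.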

Complete the subprogram called check_sequence that:

  1. Takes the amino acid as a parameter.
  2. Opens the file called dna.txt for reading.
  3. Returns -1 if the file cannot be found.
  4. Reads the file a line at a time.
  5. Counts how many times the given amino acid appears, reading each line in groups of three letters.
  6. Closes the file.
  7. Returns the number of times the given amino acid was found.
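The steps above can be sketched as follows. This is one possible approach, not the required answer: the name CountInFile and its fileName parameter are hypothetical (the task fixes the file name to dna.txt), and a using block is swapped in for an explicit Close call so that the file is closed even if reading fails part-way. It assumes C# 9 or later (top-level statements).

```csharp
using System;
using System.IO;

// Hypothetical variant of check_sequence that takes the file name as a
// parameter so it can be tried on any file.
static int CountInFile(string fileName, string aminoAcid)
{
    try
    {
        int count = 0;
        // The using block closes the file when it ends, even after an error
        using (StreamReader file = new StreamReader(fileName))
        {
            string line;
            // Read the file a line at a time
            while ((line = file.ReadLine()) != null)
            {
                line = line.Trim();
                // Step through the line in groups of three letters
                for (int index = 0; index <= line.Length - 3; index += 3)
                {
                    if (line.Substring(index, 3) == aminoAcid)
                    {
                        count++;
                    }
                }
            }
        }
        return count;
    }
    catch (FileNotFoundException)
    {
        return -1; // File not found
    }
}

// Try it on a small temporary file, then on a file that no longer exists
string path = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
File.WriteAllLines(path, new[] { "ACGTTT", "GTATTT" });
Console.WriteLine(CountInFile(path, "TTT"));  // 2
File.Delete(path);
Console.WriteLine(CountInFile(path, "TTT"));  // -1
```

Returning -1 works as a "not found" signal because a genuine count can never be negative.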

Complete the main program so that:

  1. It calls get_amino_acid to input a valid amino acid.
  2. It calls check_sequence to return the number of times the amino acid appears in the file.
  3. If the number is -1, it outputs the message “DNA file not found.”
  4. If the number is greater than -1, it outputs the number of amino acids using the format shown below.

Typical inputs and outputs from the program would be:

dna.txt file:

ACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCACCGCTGCCCTGC
CCTGGAGGGTGGCCCCACCGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAATAAGGAAAAGCGG
CTCCTGACTTTCCTCGCTTGGTGGTTTGAGTGGACCTCCCAGGCCAGTGCCGGGCCCCTCATAGGAGAG

Inputs and outputs:

Enter the amino acid to find:
CCC
There are 4 CCC amino acids in the DNA sequence.
Enter the amino acid to find:
GGT
There are 0 GGT amino acids in the DNA sequence.
Enter the amino acid to find:
GGG
There are 3 GGG amino acids in the DNA sequence.
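The sample results above only come out this way if each line is read in groups of three letters starting from its first character. The letters GGT do occur in the file, but never lined up with a group, which is why the expected count is 0. The sketch below illustrates the difference using the three lines of dna.txt held in an array; the helper names CountInFrame and CountAnywhere are hypothetical, and the code assumes C# 9 or later (top-level statements).

```csharp
using System;

// Hypothetical helper: counts matches in consecutive groups of three per line.
static int CountInFrame(string[] lines, string codon)
{
    int count = 0;
    foreach (string line in lines)
    {
        for (int index = 0; index <= line.Length - 3; index += 3)
        {
            if (line.Substring(index, 3) == codon) count++;
        }
    }
    return count;
}

// Hypothetical helper: counts matches at every starting position instead.
static int CountAnywhere(string[] lines, string codon)
{
    int count = 0;
    foreach (string line in lines)
    {
        for (int index = 0; index <= line.Length - 3; index++)
        {
            if (line.Substring(index, 3) == codon) count++;
        }
    }
    return count;
}

string[] dna =
{
    "ACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCACCGCTGCCCTGC",
    "CCTGGAGGGTGGCCCCACCGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAATAAGGAAAAGCGG",
    "CTCCTGACTTTCCTCGCTTGGTGGTTTGAGTGGACCTCCCAGGCCAGTGCCGGGCCCCTCATAGGAGAG"
};

foreach (string target in new[] { "CCC", "GGT", "GGG" })
{
    Console.WriteLine($"{target}: {CountInFrame(dna, target)}");
}
// Prints CCC: 4, GGT: 0 and GGG: 3 on separate lines

// True: GGT is present, just never aligned to a group of three
Console.WriteLine(CountAnywhere(dna, "GGT") > 0);
```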
🆘 If you're really stuck, use this Parsons code sorting exercise
get_amino_acid
// Input the amino acid
static string get_amino_acid()
{
---
    string choice = "";
    bool valid = false;
---
    // Validation
    while (!valid)
    {
---
        Console.WriteLine("Enter the amino acid to find:");
        choice = Console.ReadLine();
---
        // Amino acid must be 3 letters
        if (choice.Length != 3)
        {
---
            valid = false;
---
        }
---
        else
        {
---
            valid = true;
            // Check each letter of the choice
            foreach (char letter in choice)
            {
---
                // Amino acid must contain only the letters ACGT
                if (!"ACGT".Contains(letter))
                {
---
                    valid = false;
---
                }
---
            }
---
        }
---
    }
---
    return choice;
}
check_sequence
// Read the DNA sequence file
static int check_sequence(string amino_acid)
{
---
    string file_name = "dna.txt";
    // Check file exists
    try
    {
---
        int count = 0;
        StreamReader file = new StreamReader(file_name);
        string line;
---
        // Read each line
        while ((line = file.ReadLine()) != null)
        {
---
            line = line.Trim();
            // Consider data letters in threes
            for (int index = 0; index <= line.Length - 3; index += 3)
            {
---
                string sequence = line.Substring(index, 3);
                // Add to the count if amino acid found
                if (sequence == amino_acid)
                {
---
                    count++;
---
                }
---
            }
---
        }
---
        file.Close();
        return count;
---
    }
---
    catch (FileNotFoundException)
    {
---
        return -1; // File not found
---
    }
---
}
Main program
// -------------------------
// Main program
// -------------------------
public static void Main(string[] args)
{
---
    string amino_acid = get_amino_acid();
    int number = check_sequence(amino_acid);
---
    // If -1 is returned, the file does not exist
    if (number == -1)
    {
---
        Console.WriteLine("DNA file not found.");
---
    }
---
    else
    {
---
        Console.WriteLine($"There are {number} {amino_acid} amino acids in the DNA sequence.");
---
    }
---
}