Download the file WHO.txt and open it in the text editor vi. In this exercise, you will practice the substitution functionality of vi. A command of the form

:<r1>,<r2>s/<pattern1>/<pattern2>/
    

has the effect that, on every line from line <r1> up to and including line <r2>, the first occurrence of <pattern1> on that line is replaced by <pattern2>. If you want all occurrences of <pattern1> to be replaced by <pattern2>, add an extra g to the end of the command:

:<r1>,<r2>s/<pattern1>/<pattern2>/g
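
To make the difference between the two forms concrete, here is a small Python model of the command. It is only a sketch, not how vi is implemented: it assumes 1-based, inclusive line numbers as in vi, uses Python's re syntax (which differs from vi's regular expressions), and the function name substitute is made up for this illustration.

```python
import re


def substitute(lines, r1, r2, pattern, replacement, global_flag=False):
    # Model of :<r1>,<r2>s/<pattern>/<replacement>/ (plus g if global_flag).
    # r1 and r2 are 1-based and inclusive, just like vi line numbers.
    result = list(lines)
    for i in range(r1 - 1, r2):
        # count=1: only the first match per line; count=0: every match (the g flag)
        result[i] = re.sub(pattern, replacement, result[i], count=0 if global_flag else 1)
    return result


text = ["banana", "alpha", "beta"]
print(substitute(text, 1, 2, "a", "A"))                    # ['bAnana', 'Alpha', 'beta']
print(substitute(text, 1, 2, "a", "A", global_flag=True))  # ['bAnAnA', 'AlphA', 'beta']
```

Each line is processed independently: without g, the first a on every line in the range is replaced, not just the first a in the whole range.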
    

As such, the command

:1,10s/a/A/
    

will, for example, replace the first a on each of the first ten lines of the file by A. In contrast, the command

:1,10s/a/A/g
    

will replace every a by A in the first ten lines of the file. The following features will also come in handy for this exercise:

  • The line addresses <r1> and <r2> can also be $ (which stands for the last line of the file) or a regular expression enclosed in slashes, which addresses the next line matching that expression (searching forward from the current line). For example, to replace each a by A from the next line that starts with the letter b through the end of the file, you can use the following command:

    :/^b/,$s/a/A/g
            
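  • The substitution command supports groups, as explained on page 832 of the reference book: a part of <pattern1> enclosed in \( and \) forms a group, and \1, \2 and so on in <pattern2> insert the text matched by the first, second, and subsequent groups.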

  • <pattern2> may span multiple lines; separate the lines with \r.
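
The three features above can be mimicked in Python as well. This is a model under stated assumptions, not vi itself: the helper names resolve_address and substitute_range are hypothetical, Python writes groups as ( ) where vi writes \( \), and a /regex/ address is resolved by searching forward from the line after the current one and wrapping around at the end of the file (vi's behaviour with wrapscan on).

```python
import re


def resolve_address(lines, address, current):
    # Turn an address into a 1-based line number: a number, "$" (last line)
    # or "/regex/" (next matching line after `current`, wrapping around).
    if address == "$":
        return len(lines)
    if len(address) >= 2 and address[0] == "/" and address[-1] == "/":
        regex = re.compile(address[1:-1])
        n = len(lines)
        for step in range(1, n + 1):
            index = (current - 1 + step) % n  # 0-based index of the candidate line
            if regex.search(lines[index]):
                return index + 1
        raise ValueError("pattern not found: " + address)
    return int(address)


def substitute_range(lines, r1, r2, pattern, replacement, global_flag=False, current=1):
    # Model of :<r1>,<r2>s/<pattern>/<replacement>/[g] with the cursor on line `current`.
    start = resolve_address(lines, r1, current)
    end = resolve_address(lines, r2, current)
    out = lines[:start - 1]
    for line in lines[start - 1:end]:
        new = re.sub(pattern, replacement, line, count=0 if global_flag else 1)
        # A "\r" in the replacement turns into a line break, as in vim.
        out.extend(new.split("\r"))
    out.extend(lines[end:])
    return out


data = ["apple pie", "banana split", "cherry tart"]
# Like :/^b/,$s/a/A/g with the cursor on line 1
print(substitute_range(data, "/^b/", "$", "a", "A", global_flag=True))
# -> ['apple pie', 'bAnAnA split', 'cherry tArt']

# Like :1,1s/\([a-z]*\) \([a-z]*\)/\2\r\1/  (swap two words, one per line)
print(substitute_range(["hello world"], "1", "1", r"([a-z]*) ([a-z]*)", r"\2\r\1"))
# -> ['world', 'hello']
```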

Assignment

The file WHO.txt contains lines in the following format:

<country>,<year>,<mortality rate women>,<mortality rate men>
    

Each line contains data from the World Health Organization (WHO) about the mortality rate in a given country in a given year (2000 or 2006). The file is stored in CSV format: fields are separated by commas and the content of each field is enclosed in double quotes. The file also contains comment lines, which start with a hash (#).

You are asked to compose a series of substitution commands that carry out the following tasks one after another in the text editor vi or vim. Try to use as few commands, and as few characters per command, as possible. Comment lines must not be affected by the substitution commands. Each task operates on the result of the previous one. To check your work, you can use the files WHO.i.txt (1 ≤ i ≤ 5), which contain the file contents right after the i-th task has been carried out.
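
One way to compare your result with a reference file is to write the buffer to a separate file with :w <name> and diff the two files. The Python sketch below does the comparison with difflib; the script name check.py and the UTF-8 file encoding are assumptions.

```python
import difflib
import sys


def compare_with_reference(yours, expected):
    # Returns unified-diff lines; an empty list means both files are identical.
    return list(difflib.unified_diff(expected, yours,
                                     fromfile="expected", tofile="yours",
                                     lineterm=""))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python check.py mine.txt WHO.1.txt
    with open(sys.argv[1], encoding="utf-8") as f:
        yours = f.read().splitlines()
    with open(sys.argv[2], encoding="utf-8") as f:
        expected = f.read().splitlines()
    diff = compare_with_reference(yours, expected)
    print("\n".join(diff) if diff else "identical")
```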

  1. Replace every field that contains no data by the text UK.UK (unknown). For example, the line for China in 2006 should become "China","2006","87.0","UK.UK". (WHO.1.txt)

  2. Convert the current format to a more readable one: remove all double quotes, replace each field separator by a space, abbreviate the years (2000 becomes 00 and 2006 becomes 06) and write floating-point numbers in the notation commonly used in Belgium, with a decimal comma (for example, 448.0 becomes 448,0). You may assume that all mortality rates contain a decimal point. (WHO.2.txt)

  3. Figure out the effect of the following command, and execute it on all non-comment lines: !sort -k1,1 -k2,2r (WHO.3.txt)

  4. Reorganize the file so that it contains a single line per country. Data collected in different years for the same country should appear one after the other, so that after this conversion each line looks like:

    <country> <mortality women>(06)-<mortality women>(00) <mortality men>(06)-<mortality men>(00)
            

    As an example, the following lines

    Belgium 06 61,0 111,0
    Belgium 00 68,0 130,0
            

    should be converted to the single line (WHO.4.txt)

    Belgium 61,0(06)-68,0(00) 111,0(06)-130,0(00)
            

    This conversion may take multiple commands.
    Hint: figure out the effect of the command :1,$j!

  5. Add an extra comment field containing the text HMMR (high male mortality rate) to every line whose male mortality rate in 2000 is greater than or equal to 200. You may assume that all mortality rates lie between 0 and 1000. For example, the line

    Brazil 121,0(06)-134,0(00) 230,0(06)-248,0(00)
            

    should be converted to (WHO.5.txt)

    Brazil 121,0(06)-134,0(00) 230,0(06)-248,0(00) HMMR
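
To double-check what the expected output of parts 1, 2, 4 and 5 looks like, the following Python sketch models each transformation on the example lines given above. It is a reference model in Python, not a vi solution: the function names are hypothetical, the raw CSV lines for China and Belgium are reconstructed from the examples (the actual file may differ), comment lines are not handled, and part 3 is deliberately left out.

```python
import csv


def fill_unknown(line):
    # Part 1: every empty field becomes UK.UK; quotes and commas are kept.
    fields = next(csv.reader([line]))
    return ",".join('"%s"' % (f if f else "UK.UK") for f in fields)


def to_readable(line):
    # Part 2: drop quotes, separate fields by spaces, shorten the year
    # (2000 -> 00, 2006 -> 06) and replace the dot by a comma in both rates.
    country, year, women, men = next(csv.reader([line]))
    return " ".join([country, year[2:], women.replace(".", ","), men.replace(".", ",")])


def merge_years(line06, line00):
    # Part 4: combine the 06 and 00 lines of one country.
    # rsplit keeps country names that contain spaces intact.
    country, _, w06, m06 = line06.rsplit(" ", 3)
    _, _, w00, m00 = line00.rsplit(" ", 3)
    return f"{country} {w06}(06)-{w00}(00) {m06}(06)-{m00}(00)"


def flag_hmmr(line, threshold=200.0):
    # Part 5: append HMMR if the male rate for 2000 is >= threshold.
    men = line.rsplit(" ", 1)[1]                        # e.g. "230,0(06)-248,0(00)"
    men_2000 = men.split("-")[1].removesuffix("(00)")   # e.g. "248,0"
    try:
        rate = float(men_2000.replace(",", "."))
    except ValueError:
        # Unknown value (UK): leave the line unchanged (assumption).
        return line
    return line + " HMMR" if rate >= threshold else line


# The raw input lines below are reconstructed from the examples in the text.
print(fill_unknown('"China","2006","87.0",""'))
# -> "China","2006","87.0","UK.UK"
print(merge_years(to_readable('"Belgium","2006","61.0","111.0"'),
                  to_readable('"Belgium","2000","68.0","130.0"')))
# -> Belgium 61,0(06)-68,0(00) 111,0(06)-130,0(00)
print(flag_hmmr("Brazil 121,0(06)-134,0(00) 230,0(06)-248,0(00)"))
# -> Brazil 121,0(06)-134,0(00) 230,0(06)-248,0(00) HMMR
```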
            

Guidelines for submitting a solution

Carefully follow the instructions below when submitting a solution:

  • Put your commands for the five parts of this assignment in the designated positions.

  • Do not change the lines that are already present in the submission area. They are used to split the submitted solution into its separate parts. The feedback page shows whether this splitting succeeded.