We are going to use the HistData package. If it is not installed you can install it like this:

install.packages("HistData")

Load the Galton data set and create a vector x with the child heights recorded in Galton’s data on the heights of parents and their children, from his historic research on heredity.

library(HistData)
data(Galton)
x <- Galton$child

1. Compute the average, SD, median, and median absolute deviation (MAD) of the data. Store these values in average_q1, SD_q1, median_q1, and mad_q1, respectively.
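
As a reminder of the base R functions involved, here is a minimal sketch on a small made-up vector. The vector `toy` and the variable names are hypothetical and are not part of Galton's data. Note that `mad()` multiplies the raw median absolute deviation by 1.4826 by default, so that it is comparable to the SD for normally distributed data.

```r
# Hypothetical toy vector for illustration only; not Galton's data
toy <- c(60, 62, 64, 66, 68)

toy_average <- mean(toy)    # arithmetic mean
toy_sd      <- sd(toy)      # sample standard deviation (divides by n - 1)
toy_median  <- median(toy)  # middle value of the sorted data
toy_mad     <- mad(toy)     # median absolute deviation, scaled by 1.4826 by default
```

The same four calls applied to x give the values the exercise asks for.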

2. Suppose Galton made a mistake when entering the first value and forgot to use the decimal point. You can imitate this error by typing:

x_with_error <- x
x_with_error[1] <- x_with_error[1]*10

Compute the average, SD, median, and median absolute deviation again and observe the differences. By how much does each value change after the mistake? Store the differences in average_diff, SD_diff, median_diff, and mad_diff, respectively.
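
One way to organize the comparison is to wrap the four summaries in a helper and subtract the results. The sketch below does this on a small made-up vector; `toy`, `toy_with_error`, and `summarize_vec` are hypothetical names chosen for illustration. With only five values the median shifts a little; with a vector as long as x the effect on the median may look different.

```r
# Hypothetical toy vector for illustration only; not Galton's data
toy <- c(60, 62, 64, 66, 68)

# Imitate a missing decimal point in the first entry: 60 becomes 600
toy_with_error <- toy
toy_with_error[1] <- toy_with_error[1] * 10

# Helper (hypothetical name) returning the four summaries as a named vector
summarize_vec <- function(v) {
  c(average = mean(v), SD = sd(v), median = median(v), MAD = mad(v))
}

# Change in each summary caused by the single bad entry
toy_diffs <- summarize_vec(toy_with_error) - summarize_vec(toy)
```

In this toy case the average and SD move by a large amount, while the median and MAD barely move, which is why the latter two are described as robust summaries.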

3. How could you use exploratory data analysis to detect that an error was made? Store the correct answer in Question3.

  1. Since it is only one value out of many, we will not be able to detect this.
  2. We would see an obvious shift in the distribution.
  3. A boxplot, histogram, or qq-plot would reveal a clear outlier.
  4. A scatterplot would show high levels of measurement error.
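
To check the options empirically rather than by intuition, one possibility is to ask R which points a boxplot would draw beyond its whiskers. The sketch below uses `boxplot.stats()` from base R on a small made-up vector; `toy_with_error` and `flagged` are hypothetical names, and the data are not Galton's.

```r
# Hypothetical toy vector with one mis-entered value (600 instead of 60)
toy_with_error <- c(600, 62, 64, 66, 68)

# Values lying more than 1.5 times the hinge spread beyond the hinges
flagged <- boxplot.stats(toy_with_error)$out

# Positions of the flagged values in the original vector
flagged_index <- which(toy_with_error %in% flagged)
```

Running the same check on x_with_error, or calling boxplot(x_with_error) to draw the plot, shows whether the mistaken entry stands out in the real data.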