The MASS library contains the Boston data set, which records medv (median house value, in thousands of dollars) for 506 neighborhoods around Boston. We will seek to predict medv using 13 predictors such as rm (average number of rooms per house), age (proportion of owner-occupied units built prior to 1940), and lstat (percent of households with low socioeconomic status).

> head(Boston)
     crim zn indus chas   nox    rm  age    
1 0.00632 18  2.31    0 0.538 6.575 65.2 
2 0.02731  0  7.07    0 0.469 6.421 78.9 
3 0.02729  0  7.07    0 0.469 7.185 61.1 
4 0.03237  0  2.18    0 0.458 6.998 45.8 
5 0.06905  0  2.18    0 0.458 7.147 54.2 
6 0.02985  0  2.18    0 0.458 6.430 58.7 
     dis rad   tax ptratio  black  lstat medv
1 4.0900   1   296    15.3 396.90  4.98  24.0
2 4.9671   2   242    17.8 396.90  9.14  21.6
3 4.9671   2   242    17.8 392.83  4.03  34.7
4 6.0622   3   222    18.7 394.63  2.94  33.4
5 6.0622   3   222    18.7 396.90  5.33  36.2
6 6.0622   3   222    18.7 394.12  5.21  28.7

> names(Boston)
[1] "crim" "zn" "indus" "chas" "nox" "rm" "age"
[8] "dis" "rad" "tax" "ptratio" "black" "lstat" "medv"

To find out more about the data set, we can also type ?Boston. (This works because Boston ships with the MASS package, which includes a help page for it. A data set that you import yourself has no such help page.)

Try running the head() and names() functions yourself, and store their output in Boston.head and Boston.names respectively.

Note: Here, we ask you to store the output of these functions in variables, so the output will not be printed to your console. However, we encourage you to look at the output of these functions as well, so you get a feeling for what exactly you are doing.
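
As a minimal sketch of the exercise, assuming the MASS package is installed (it normally ships with R), the two results can be stored and then printed explicitly. The variable names Boston.head and Boston.names come from the exercise text; the rest is only one possible way to write it.

```r
# Load MASS, which provides the Boston data frame.
library(MASS)

# Assigning a result to a variable suppresses the automatic printing
# that you would see when calling the function on its own.
Boston.head  <- head(Boston)    # first six rows of the data frame
Boston.names <- names(Boston)   # character vector of column names

# Print the stored objects explicitly to inspect them.
print(Boston.head)
print(Boston.names)
```

Wrapping an assignment in parentheses, as in (Boston.names <- names(Boston)), stores the result and prints it in one step.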


Assume that: