We can use the plot() function to produce scatterplots of the quantitative variables.

plot(weight, mpg)
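
The call above refers to weight and mpg by bare name, which only works when those columns are visible on the search path (for example after attach(Auto)). As a minimal sketch of two alternatives that do not rely on attach(), the snippet below uses mtcars, a data set shipped with R, as a stand-in for Auto; its wt column plays the role of weight. The stand-in data and column names are assumptions made so the example runs on its own.

```r
# Stand-in data: mtcars ships with R; wt and mpg play the roles of weight and mpg.

# Option 1: refer to the columns explicitly with $.
plot(mtcars$wt, mtcars$mpg, xlab = "weight", ylab = "mpg")

# Option 2: evaluate the call inside the data frame with with().
with(mtcars, plot(wt, mpg))
```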


If the variable plotted on the x-axis is categorical, then boxplots will automatically be produced by the plot() function (remember that in Loading data we turned cylinders into a qualitative variable). As usual, a number of options can be specified in order to customize the plots: col sets the color, varwidth makes the width of each box depend on the number of observations in its category, and xlab and ylab set the axis labels.

plot(cylinders, mpg, col = "red", varwidth = TRUE, xlab = "cylinders", ylab = "MPG")
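
To see how the type of the x variable decides which plot is drawn, the following sketch plots the same column twice, once as a factor and once as a number. It uses mtcars (shipped with R) as a stand-in for Auto, with cyl in the role of cylinders; the name cyl_factor is introduced here for illustration only.

```r
# Stand-in data: mtcars ships with R; cyl plays the role of cylinders.
cyl_factor <- as.factor(mtcars$cyl)   # convert the numeric codes to a factor

# Because the x variable is a factor, plot() draws one box per level.
plot(cyl_factor, mtcars$mpg, col = "red", varwidth = TRUE,
     xlab = "cylinders", ylab = "MPG")

# The same call on the raw numeric column gives a scatterplot instead.
plot(mtcars$cyl, mtcars$mpg, xlab = "cylinders", ylab = "MPG")
```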


The pairs() function creates a scatterplot matrix, i.e., a scatterplot for every pair of variables in a given data set.
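
As a minimal sketch, pairs() can be given a whole data frame. With Auto loaded as in this lab, the corresponding call would presumably be pairs(Auto); the snippet below instead uses a few numeric columns of mtcars (shipped with R) as a stand-in so that it runs without extra files. The name car_data is hypothetical.

```r
# Stand-in data: four numeric columns of mtcars (shipped with R).
car_data <- mtcars[, c("mpg", "disp", "hp", "wt")]

# One scatterplot panel for every pair of columns in the data frame.
pairs(car_data)
```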


We can also produce a scatterplot matrix for just a subset of the variables.

pairs(~ mpg + displacement + horsepower + weight + acceleration, data = Auto)


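The hist() function can be used to plot a histogram. Note that col = 2 has the same effect as col = "red" under R's classic default palette (since R 4.0.0 the second palette color is a slightly different shade of red). breaks = 15 asks for about 15 bins; R treats a single number as a suggestion and may choose a nearby count that gives round breakpoints.

hist(mpg, col = 2, breaks = 15)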
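
To check how many bins a given breaks value actually produces, the histogram object can be inspected without drawing it. This is a sketch on stand-in data: mtcars$mpg (shipped with R) takes the place of the mpg column of Auto, and mpg_values and h are names introduced here for illustration.

```r
mpg_values <- mtcars$mpg          # stand-in for the mpg column of Auto

# plot = FALSE returns the histogram object without drawing it.
h <- hist(mpg_values, breaks = 15, plot = FALSE)

length(h$counts)   # number of bins R actually used
h$breaks           # the bin boundaries it chose

# Draw the histogram with the same settings as in the text.
hist(mpg_values, col = 2, breaks = 15)
```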

In conjunction with the plot() function, identify() provides a useful interactive method for identifying the value for a particular variable for points on a plot. We pass in three arguments to identify(): the x-axis variable, the y-axis variable, and the variable whose values we would like to see printed for each point. Then clicking on a given point in the plot will cause R to print the value of the variable of interest. Right-clicking on the plot will exit the identify() function (control-click on a Mac). The numbers printed under the identify() function correspond to the rows for the selected points.

plot(horsepower, mpg)
identify(horsepower, mpg, name)
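
The third argument to identify() is its labels argument, and the function returns the row indices of the clicked points, which can then be used to look up the full rows. The sketch below illustrates this on mtcars (shipped with R) as a stand-in for Auto, using the row names as labels; the variable names are hypothetical. Because identify() waits for mouse input, the call is guarded so that it only runs in an interactive session.

```r
hp_values  <- mtcars$hp
mpg_values <- mtcars$mpg
car_names  <- rownames(mtcars)    # labels to print next to clicked points

plot(hp_values, mpg_values, xlab = "horsepower", ylab = "mpg")

# identify() waits for mouse clicks, so only call it in an interactive session.
if (interactive()) {
  picked <- identify(hp_values, mpg_values, labels = car_names)
  print(mtcars[picked, ])         # rows of the points that were clicked
}
```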


The summary() function produces a numerical summary of each variable in a particular data set.

> summary(Auto)
      mpg          cylinders      displacement
 Min.   : 9.00   Min.   :3.000   Min.   : 68.0
 1st Qu.:17.00   1st Qu.:4.000   1st Qu.:105.0
 Median :22.75   Median :4.000   Median :151.0
 Mean   :23.45   Mean   :5.472   Mean   :194.4
 3rd Qu.:29.00   3rd Qu.:8.000   3rd Qu.:275.8
 Max.   :46.60   Max.   :8.000   Max.   :455.0

   horsepower        weight      acceleration
 Min.   : 46.0   Min.   :1613   Min.   : 8.00
 1st Qu.: 75.0   1st Qu.:2225   1st Qu.:13.78
 Median : 93.5   Median :2804   Median :15.50
 Mean   :104.5   Mean   :2978   Mean   :15.54
 3rd Qu.:126.0   3rd Qu.:3615   3rd Qu.:17.02
 Max.   :230.0   Max.   :5140   Max.   :24.80

      year           origin                      name
 Min.   :70.00   Min.   :1.000   amc matador       :  5
 1st Qu.:73.00   1st Qu.:1.000   ford pinto        :  5
 Median :76.00   Median :1.000   toyota corolla    :  5
 Mean   :75.98   Mean   :1.577   amc gremlin       :  4
 3rd Qu.:79.00   3rd Qu.:2.000   amc hornet        :  4
 Max.   :82.00   Max.   :3.000   chevrolet chevette:  4
                                 (Other)           :365

For qualitative variables such as name, R will list the number of observations that fall in each category. We can also produce a summary of just a single variable.
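
As a minimal sketch of summarizing a single variable, summary() can be applied to one column at a time. With Auto attached, the call would be of the form summary(mpg); the snippet below uses mtcars (shipped with R) as a stand-in so that it runs on its own, and the names mpg_summary and cyl_counts are introduced for illustration.

```r
# Numeric variable: minimum, quartiles, median, mean, and maximum.
mpg_summary <- summary(mtcars$mpg)
mpg_summary

# Qualitative variable: number of observations in each category.
cyl_counts <- summary(as.factor(mtcars$cyl))
cyl_counts
```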

Once we have finished using R, we type q() in order to shut it down, or quit. When exiting R, we have the option to save the current workspace so that all objects (such as data sets) that we have created in this R session will be available next time. Before exiting R, we may want to save a record of all of the commands that we typed in the most recent session; this can be accomplished using the savehistory() function. Next time we enter R, we can load that history using the loadhistory() function.
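
The end-of-session steps might look like the sketch below. The file name is hypothetical, and the history functions depend on the console in use (some front-ends do not support them), so the calls are guarded to run only in an interactive session. The q() call is left commented out so that running the snippet does not end the session.

```r
history_file <- "lab_session.Rhistory"   # hypothetical file name

# History functions need an interactive console, so guard the calls.
if (interactive()) {
  savehistory(file = history_file)    # write this session's commands to disk
  # ... in a later session:
  loadhistory(file = history_file)    # read them back in
}

# q(save = "yes") quits and saves the workspace; q(save = "no") quits without saving.
# q(save = "no")
```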


Answer the following multiple choice questions by assigning the value 1, 2, 3 or 4 to a variable named after the question (MC1, MC2, and so on).

For example:

MC1 = 3
MC2 = 2