This exercise relates to the College data set, which can be found in the ISLR library. It contains a number of variables for 777 different universities and colleges in the US.

Before reading the data into R, you can view it in Excel or a text editor.
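
For a first look from within R itself, the following is a minimal sketch. It assumes the ISLR package is installed and that the data frame is the `College` object shipped with it; if the exercise environment preloads the data under another name, adjust accordingly.

```r
# Assumption: the ISLR package is installed and provides the College data frame.
if (requireNamespace("ISLR", quietly = TRUE)) {
  data("College", package = "ISLR")  # load the data frame into the workspace
  print(dim(College))                # number of rows and columns
  print(names(College))              # variable names
  print(head(College))               # first six rows
} else {
  message("Package 'ISLR' is not installed; skipping the preview.")
}
```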

Questions

Some of the exercises are not tested by Dodona (for example the plots), but it is still useful to try them.

head(college)
  1. Use the summary() function to produce a numerical summary of the variables in the data set. Store the summary in summary.num.

  2. Use the pairs() function to produce a scatterplot matrix of the first ten columns or variables of the data. Recall that you can reference the first ten columns of a matrix A using A[,1:10].

    Your plot should look like this:

    [Figure: scatterplot matrix of the first ten columns]

  3. Use the plot() function to produce side-by-side boxplots of Outstate versus Private.

    Your plot should look like this:

    [Figure: side-by-side boxplots of Outstate versus Private]

  4. We create a new qualitative variable, called Elite, by binning the Top10perc variable. We divide universities into two groups based on whether the proportion of students coming from the top 10% of their high school classes exceeds 50%.

     Elite <- rep("No", nrow(College))
     Elite[College$Top10perc > 50] <- "Yes"
     Elite <- as.factor(Elite)
     College <- data.frame(College, Elite)
    
  5. Use the summary() function to see how many elite universities there are. Store the summary in the elite.summary variable. Apply summary() only to the Elite column, not to the whole data set.

  6. Now use the plot() function to produce side-by-side boxplots of Outstate versus Elite.

    Your plot should look like this:

    [Figure: side-by-side boxplots of Outstate versus Elite]

  7. Use the hist() function to produce some histograms with differing numbers of bins for a few of the quantitative variables.

    Your plots should look like this:

    [Figure: four example histograms]

  8. Continue exploring the data, and look for more interesting insights.
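
The functions used in the questions above can be tried out first on a small synthetic data frame. The sketch below is only an illustration: `toy`, `HighTop10`, `toy.summary` and `high.summary` are hypothetical names, and the values are randomly generated rather than taken from the College data.

```r
# Toy data only: a hypothetical stand-in for the College data frame,
# so that the sketch runs without the ISLR package.
set.seed(1)
n <- 40
toy <- data.frame(
  Private   = factor(sample(c("No", "Yes"), n, replace = TRUE)),
  Top10perc = round(runif(n, min = 5, max = 95)),
  Outstate  = round(rnorm(n, mean = 10000, sd = 3000)),
  Enroll    = round(runif(n, min = 100, max = 5000))
)

# Numerical summary of every column.
toy.summary <- summary(toy)
print(toy.summary)

# Scatterplot matrix of the numeric columns, selected by position.
pairs(toy[, 2:4])

# plot() with a factor as x and a numeric y draws side-by-side boxplots.
plot(toy$Private, toy$Outstate, xlab = "Private", ylab = "Outstate")

# Bin a quantitative variable into a two-level factor.
HighTop10 <- rep("No", nrow(toy))
HighTop10[toy$Top10perc > 50] <- "Yes"
HighTop10 <- as.factor(HighTop10)
toy <- data.frame(toy, HighTop10)

# summary() on a single factor column returns the count per level.
high.summary <- summary(toy$HighTop10)
print(high.summary)

# Histograms of the same variable with different bin counts.
# A single number passed to breaks is a suggestion; hist() may adjust it.
par(mfrow = c(1, 2))
hist(toy$Outstate, breaks = 5,  main = "breaks = 5",  xlab = "Outstate")
hist(toy$Outstate, breaks = 15, main = "breaks = 15", xlab = "Outstate")
par(mfrow = c(1, 1))
```

Applying summary() to one factor column rather than to the whole data frame gives a short named vector of counts, which is easier to inspect and to store than the full table.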


Assume that: