One of the great advantages of the bootstrap approach is that it can be
applied in almost all situations. No complicated mathematical calculations
are required. Performing a bootstrap analysis in R entails only two steps.
First, we must create a function that computes the statistic of interest.
Second, we use the boot()
function, which is part of the boot library, to
perform the bootstrap by repeatedly sampling observations from the data
set with replacement.
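The two steps above can be sketched on a toy problem before turning to the portfolio example. This is a minimal illustration that assumes the boot package is installed. The data vector `x` is simulated, and the statistic function `median.fn` is a hypothetical example, not part of the lab. boot() calls the statistic with the data and a vector of resampled indices, which is why the function takes two arguments.

```r
# Minimal sketch of the two-step bootstrap recipe with the boot package.
# x is simulated for illustration only; it is not an ISLR2 data set.
library(boot)

set.seed(1)
x <- rexp(50, rate = 1)   # hypothetical skewed sample of 50 values

# Step 1: the statistic. boot() calls it as statistic(data, indices),
# where indices picks the observations in one bootstrap sample.
median.fn <- function(data, index) {
  median(data[index])
}

# Step 2: boot() resamples indices with replacement R times
# and evaluates median.fn on each resample.
b <- boot(data = x, statistic = median.fn, R = 500)

b$t0      # statistic computed on the original data
sd(b$t)   # bootstrap estimate of the standard error of the median
```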
Suppose that we wish to invest a fixed sum of money in two financial assets that yield returns of \(X\) and \(Y\), respectively, where \(X\) and \(Y\) are random quantities. We will invest a fraction \(\alpha\) of our money in \(X\), and will invest the remaining \(1 - \alpha\) in \(Y\). Since there is variability associated with the returns on these two assets, we wish to choose \(\alpha\) to minimize the total risk, or variance, of our investment. In other words, we want to minimize \(\mathrm{Var}(\alpha X + (1 - \alpha)Y)\). One can show that the value that minimizes the risk is given by
\[\begin{align} \alpha = \frac{\sigma_Y^2 - \sigma_{XY}}{\sigma_X^2 + \sigma_Y^2 - 2\sigma_{XY}} \end{align}\]
where \(\sigma_X^2 = \mathrm{Var}(X)\), \(\sigma_Y^2 = \mathrm{Var}(Y)\), and \(\sigma_{XY} = \mathrm{Cov}(X,Y)\). In reality, the quantities \(\sigma_X^2\), \(\sigma_Y^2\), and \(\sigma_{XY}\) are unknown. We can compute estimates for these quantities, \(\hat\sigma_X^2\), \(\hat\sigma_Y^2\), and \(\hat\sigma_{XY}\), using a data set that contains past measurements for \(X\) and \(Y\). We can then estimate the value of \(\alpha\) that minimizes the variance of our investment using
\[\begin{align} \hat\alpha = \frac{\hat\sigma_Y^2 - \hat\sigma_{XY}}{\hat\sigma_X^2 + \hat\sigma_Y^2 - 2\hat\sigma_{XY}} \end{align}\]
The Portfolio data set in the ISLR2 package gives us information about past measurements of \(X\) and \(Y\).
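The formula for \(\alpha\) can be checked numerically. Expanding the variance gives \(\mathrm{Var}(\alpha X + (1-\alpha)Y) = \alpha^2\sigma_X^2 + (1-\alpha)^2\sigma_Y^2 + 2\alpha(1-\alpha)\sigma_{XY}\). Setting its derivative in \(\alpha\) to zero yields the closed form for \(\alpha\). The sketch below uses illustrative moment values (assumed, not estimated from Portfolio) and compares the closed form with a numerical minimizer from R's optimize().

```r
# Check that the closed-form alpha minimizes Var(a*X + (1 - a)*Y).
# These moments are illustrative assumptions, not Portfolio estimates.
sigma2.X <- 1
sigma2.Y <- 1.25
sigma.XY <- 0.5

# Portfolio variance as a function of the fraction a invested in X.
portfolio.var <- function(a) {
  a^2 * sigma2.X + (1 - a)^2 * sigma2.Y + 2 * a * (1 - a) * sigma.XY
}

# Closed-form minimizer from the formula in the text.
alpha.closed <- (sigma2.Y - sigma.XY) / (sigma2.X + sigma2.Y - 2 * sigma.XY)

# Numerical minimizer over [0, 1] for comparison.
alpha.numeric <- optimize(portfolio.var, interval = c(0, 1), tol = 1e-10)$minimum

alpha.closed    # 0.6
alpha.numeric   # approximately 0.6
```

The second derivative, \(2(\sigma_X^2 + \sigma_Y^2 - 2\sigma_{XY}) = 2\,\mathrm{Var}(X - Y)\), is nonnegative. That is why this stationary point is a minimum rather than a maximum.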
To illustrate the use of the bootstrap on these data, we must first create
a function, alpha.fn(), which takes as input the \((X,Y)\) data as well as
a vector indicating which observations should be used to estimate \(\alpha\). The
function then outputs the estimate for \(\alpha\) based on the selected observations.
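alpha.fn = function(data, index) {
  # Keep only the observations selected by index (repeats are allowed)
  X = data$X[index]
  Y = data$Y[index]
  # Plug the sample variances and covariance into the formula for alpha-hat
  return((var(Y) - cov(X, Y)) / (var(X) + var(Y) - 2 * cov(X, Y)))
}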
This function returns, or outputs, an estimate for \(\alpha\) based on applying
the formula for \(\hat\alpha\) to the observations indexed by the argument index.
For instance, the following command tells R to estimate \(\alpha\) using all
100 observations.
> alpha.fn(Portfolio, 1:100)
[1] 0.5758321
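The true \(\alpha\) behind Portfolio is unknown, so one way to sanity-check alpha.fn() is to run it on simulated returns whose \(\alpha\) is known. In the sketch below, the data frame `sim` is hypothetical. Y is built as 0.5 times X plus independent standard normal noise, so \(\sigma_X^2 = 1\), \(\sigma_Y^2 = 1.25\), \(\sigma_{XY} = 0.5\), and the true \(\alpha\) is 0.6. alpha.fn() is repeated so that the block runs without ISLR2.

```r
# Hypothetical stand-in for Portfolio: simulated returns with known moments.
set.seed(1)
n <- 10000
X <- rnorm(n)              # Var(X) = 1
Y <- 0.5 * X + rnorm(n)    # Var(Y) = 0.25 + 1 = 1.25, Cov(X, Y) = 0.5
sim <- data.frame(X = X, Y = Y)

# Same estimator as alpha.fn() in the text.
alpha.fn <- function(data, index) {
  X <- data$X[index]
  Y <- data$Y[index]
  (var(Y) - cov(X, Y)) / (var(X) + var(Y) - 2 * cov(X, Y))
}

alpha.fn(sim, 1:n)   # should be close to the true value 0.6
```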
The next command uses the sample() function to randomly select 100 observations
from the range 1 to 100, with replacement. This is equivalent
to constructing a new bootstrap data set and recomputing \(\hat\alpha\) based on the
new data set.
> set.seed(1)
> alpha.fn(Portfolio, sample(100, 100, replace = TRUE))
[1] 0.7368375
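The sketch below inspects the index vector produced by sample(100, 100, replace = TRUE) to show what a single bootstrap sample looks like. Some observations appear more than once and others not at all. On average, only about \(1 - (1 - 1/n)^n \approx 0.634\) of the original observations appear in a given bootstrap sample when \(n = 100\). The variable names here are illustrative.

```r
# Inspect one bootstrap index vector.
set.seed(1)
idx <- sample(100, 100, replace = TRUE)

length(idx)            # always 100 draws
length(unique(idx))    # number of distinct observations drawn
sum(duplicated(idx))   # draws that repeat an earlier draw

# Average fraction of distinct observations over many resamples,
# compared with the theoretical value 1 - (1 - 1/n)^n for n = 100.
frac.unique <- replicate(1000, length(unique(sample(100, 100, replace = TRUE))) / 100)
mean(frac.unique)
1 - (1 - 1/100)^100
```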
Use the alpha.fn() and sample() functions to generate three estimates of \(\alpha\)
and store them in a vector alpha.hat. For each estimate, select 100 observations
from the range 1 to 100 with replacement. Use the code below as a starting point.
Assume that the ISLR2 library has been loaded and that the Portfolio data set has
been loaded and attached.
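boot() automates the repeated resampling that is otherwise done by hand. The following sketch assumes the boot package is installed and runs alpha.fn() on 1,000 bootstrap samples. It uses a hypothetical 100-row simulated data frame `sim` as a stand-in for Portfolio, with a true \(\alpha\) of 0.6. The standard deviation of the bootstrap replicates `b$t` is the bootstrap estimate of \(SE(\hat\alpha)\).

```r
library(boot)

# Hypothetical 100-row stand-in for Portfolio (simulated, true alpha = 0.6).
set.seed(1)
X <- rnorm(100)
Y <- 0.5 * X + rnorm(100)
sim <- data.frame(X = X, Y = Y)

# Same estimator as alpha.fn() in the text.
alpha.fn <- function(data, index) {
  X <- data$X[index]
  Y <- data$Y[index]
  (var(Y) - cov(X, Y)) / (var(X) + var(Y) - 2 * cov(X, Y))
}

# boot() draws R resampled index vectors and calls alpha.fn(sim, index) on each.
b <- boot(sim, alpha.fn, R = 1000)

b$t0      # alpha-hat on the full simulated data
sd(b$t)   # bootstrap standard error of alpha-hat
b         # printed summary: original, bias, std. error
```

With ISLR2 loaded, the same call should work with Portfolio in place of sim, because Portfolio has columns named X and Y.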