In this problem, we will simulate data from \(m = 100\) fund managers.
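```r
set.seed(1)
n <- 20   # months (rows)
m <- 100  # fund managers (columns)
# i.i.d. N(0, 1) returns, so every manager's population mean is zero
X <- matrix(rnorm(n * m), ncol = m)
dim(X)
# [1]  20 100
```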
These data represent each fund manager’s percentage returns for each
of \(n = 20\) months. We wish to test the null hypothesis that each
fund manager’s percentage returns have population mean equal to
zero. Notice that we simulated the data in such a way that each fund
manager’s percentage returns do have population mean zero; in other
words, all \(m\) null hypotheses are true.
Questions
- Conduct a one-sample t-test for each fund manager, and plot a
histogram of the p-values obtained. Store the p-values in the vector
`p.values`. (Hint: you can use a `for` loop to fill the `p.values`
vector.)
- If we control the Type I error for each null hypothesis at level
\(\alpha = 0.05\), then how many null hypotheses do we reject? Store your
answer in the variable `rejected.null.hypotheses`.
- If we control the FWER at level 0.05 using the most conservative method, then how many null hypotheses
do we reject? Store the adjusted p-values in `p.values.fwer`
and store your answer in the variable `rejected.null.hypotheses.FWER`.
- If we control the FDR at level 0.05, then how many null hypotheses
do we reject? Store the adjusted p-values in `p.values.fdr`
and store your answer in the variable `rejected.null.hypotheses.FDR`.
- Now suppose we “cherry-pick” the 10 fund managers who perform
best in our data (i.e., those with the highest mean return). If we control the FWER for just these
10 fund managers at level 0.05, then how many null hypotheses
do we reject? If we control the FDR for just these 10 fund
managers at level 0.05, then how many null hypotheses do we
reject? Store your answers in the variables `top.managers.FWER`
and `top.managers.FDR`, respectively.
To control the FWER, again use the most conservative method.
- Explain why “cherry-picking” the smallest p-values is misleading.
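A minimal sketch for the first two questions, assuming base R's `t.test()` with its default two-sided alternative. The data are regenerated with the same seed so the block runs on its own; the loop follows the hint, and the final count uses the unadjusted p-values at \(\alpha = 0.05\). Because every null hypothesis is true, each rejection here is a Type I error, and about \(\alpha m = 5\) rejections are expected on average.

```r
# Regenerate the simulated returns exactly as in the problem statement
set.seed(1)
n <- 20
m <- 100
X <- matrix(rnorm(n * m), ncol = m)

# One-sample t-test of H0: mu = 0 for each manager (column of X)
p.values <- rep(NA_real_, m)
for (j in 1:m) {
  p.values[j] <- t.test(X[, j], mu = 0)$p.value
}

# Under a true null the p-values are Uniform(0, 1), so the histogram
# should look roughly flat
hist(p.values, breaks = 10, main = "One-sample t-test p-values",
     xlab = "p-value")

# Per-hypothesis Type I error control: reject whenever p <= alpha
alpha <- 0.05
rejected.null.hypotheses <- sum(p.values <= alpha)
rejected.null.hypotheses
```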
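For the FWER and FDR questions, a sketch using base R's `p.adjust()`. It assumes that "the most conservative method" refers to the Bonferroni correction (`method = "bonferroni"`, which multiplies each p-value by \(m\) and caps the result at 1), and it assumes the Benjamini-Hochberg procedure (`method = "BH"`) for the FDR, since the exercise does not name one. The p-values are recomputed with `apply()` so the block is self-contained.

```r
set.seed(1)
n <- 20
m <- 100
X <- matrix(rnorm(n * m), ncol = m)
p.values <- apply(X, 2, function(x) t.test(x, mu = 0)$p.value)
alpha <- 0.05

# FWER control with Bonferroni: adjusted p = min(1, m * p)
p.values.fwer <- p.adjust(p.values, method = "bonferroni")
rejected.null.hypotheses.FWER <- sum(p.values.fwer <= alpha)

# FDR control with Benjamini-Hochberg adjusted p-values
p.values.fdr <- p.adjust(p.values, method = "BH")
rejected.null.hypotheses.FDR <- sum(p.values.fdr <= alpha)

c(FWER = rejected.null.hypotheses.FWER, FDR = rejected.null.hypotheses.FDR)
```

Because each Benjamini-Hochberg adjusted p-value is never larger than the corresponding Bonferroni adjusted one, the FDR count can never be smaller than the FWER count.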
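A sketch of the cherry-picking question under the same assumptions (Bonferroni for the FWER, Benjamini-Hochberg for the FDR). The variable `best` is a hypothetical helper holding the indices of the 10 managers with the highest sample mean return; only their 10 p-values are passed to `p.adjust()`, so the correction behaves as if \(m = 10\).

```r
set.seed(1)
n <- 20
m <- 100
X <- matrix(rnorm(n * m), ncol = m)
p.values <- apply(X, 2, function(x) t.test(x, mu = 0)$p.value)
alpha <- 0.05

# Hypothetical helper: indices of the 10 highest sample mean returns
best <- order(colMeans(X), decreasing = TRUE)[1:10]

# Adjust only the cherry-picked p-values, i.e. as if m were 10
top.managers.FWER <- sum(p.adjust(p.values[best], method = "bonferroni") <= alpha)
top.managers.FDR <- sum(p.adjust(p.values[best], method = "BH") <= alpha)

c(FWER = top.managers.FWER, FDR = top.managers.FDR)
```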
MC1: What goes wrong if we cherry-pick?
- \(m\) decreases -> threshold to reject \(H_0\)’s is lower -> more \(H_0\)’s are rejected
- \(m\) decreases -> threshold to reject \(H_0\)’s is higher -> more \(H_0\)’s are rejected
- \(m\) increases -> threshold to reject \(H_0\)’s is lower -> more \(H_0\)’s are rejected
- \(m\) increases -> threshold to reject \(H_0\)’s is higher -> more \(H_0\)’s are rejected
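One way to see what goes wrong is a small Monte Carlo sketch (an illustration added here, not part of the original exercise). It repeats the experiment many times with all null hypotheses true and records how often Bonferroni rejects at least one hypothesis, once using all \(m = 100\) managers and once using only the 10 cherry-picked ones. The seed and the number of repetitions are arbitrary choices, and the t statistics are computed in vectorized form, which matches the default two-sided `t.test()`, to keep the loop fast.

```r
set.seed(2)
n <- 20; m <- 100; alpha <- 0.05; reps <- 1000
any.reject.all <- logical(reps)  # any Bonferroni rejection among all 100
any.reject.top <- logical(reps)  # any Bonferroni rejection among top 10

for (r in 1:reps) {
  X <- matrix(rnorm(n * m), ncol = m)  # all nulls true
  # One-sample t statistics and two-sided p-values, column by column
  t.stat <- colMeans(X) / (apply(X, 2, sd) / sqrt(n))
  p <- 2 * pt(-abs(t.stat), df = n - 1)
  best <- order(colMeans(X), decreasing = TRUE)[1:10]
  any.reject.all[r] <- any(p.adjust(p, method = "bonferroni") <= alpha)
  any.reject.top[r] <- any(p.adjust(p[best], method = "bonferroni") <= alpha)
}

mean(any.reject.all)  # estimated FWER with m = 100: close to or below alpha
mean(any.reject.top)  # estimated FWER after cherry-picking: typically well above alpha
```

With \(m = 10\) the per-test Bonferroni threshold rises from \(0.05/100 = 0.0005\) to \(0.05/10 = 0.005\), yet the 10 managers were chosen precisely because they looked best among all 100, so the smaller correction ignores the 90 other tests that were implicitly carried out during the selection.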