In this problem, we will simulate data from \(m = 100\) fund managers.

set.seed(1)
n <- 20
m <- 100
X <- matrix(rnorm(n * m), ncol = m)

> dim(X)
[1]  20 100

These data represent each fund manager’s percentage returns for each of \(n = 20\) months. We wish to test the null hypothesis that each fund manager’s percentage returns have population mean equal to zero. Notice that we simulated the data in such a way that each fund manager’s percentage returns do have population mean zero; in other words, all \(m\) null hypotheses are true.
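As a minimal sketch of what a single test looks like here, the snippet below regenerates the data and tests only the first fund manager (column 1 of X). The names x, first.test, t.stat and p.manual are illustrative and not part of the exercise; the manual calculation is included only to show what t.test computes for a two-sided one-sample test.

```r
set.seed(1)
n <- 20
m <- 100
X <- matrix(rnorm(n * m), ncol = m)

# Returns of the first fund manager (one column = one manager)
x <- X[, 1]

# One-sample t-test of H0: population mean return equals zero
first.test <- t.test(x, mu = 0)
first.test$p.value

# The same two-sided p-value computed by hand (illustrative check):
# t = sample mean / standard error, with n - 1 degrees of freedom
t.stat <- mean(x) / (sd(x) / sqrt(n))
p.manual <- 2 * pt(-abs(t.stat), df = n - 1)
```

Repeating this over all \(m\) columns gives one p-value per fund manager.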

Questions

Conduct a one-sample t-test for each fund manager, and plot a histogram of the p-values obtained. Store the p-values in the vector p.values.
(Hint: you can use a for-loop to fill the p.values vector.)
If we control Type I error for each null hypothesis at level \(\alpha = 0.05\), then how many null hypotheses do we reject? Store your answer in the variable rejected.null.hypotheses.
If we control the FWER at level 0.05 using the most conservative method, then how many null hypotheses do we reject? Store the adjusted p-values in p.values.fwer and store your answer in the variable rejected.null.hypotheses.FWER.
If we control the FDR at level 0.05, then how many null hypotheses do we reject? Store the adjusted p-values in p.values.fdr and store your answer in the variable rejected.null.hypotheses.FDR.
Now suppose we “cherry-pick” the 10 fund managers who perform the best in our data (i.e. have the highest mean return). If we control the FWER for just these 10 fund managers at level 0.05, then how many null hypotheses do we reject? If we control the FDR for just these 10 fund managers at level 0.05, then how many null hypotheses do we reject? To control the FWER, again use the most conservative method. Store your answers in the variables top.managers.FWER and top.managers.FDR, respectively.
Explain why this “cherry-picking” of the best-performing fund managers is misleading.
MC1: What goes wrong if we cherry-pick?
1. \(m\) decreases -> threshold to reject \(H_0\)’s is lower -> more \(H_0\)’s are rejected
2. \(m\) decreases -> threshold to reject \(H_0\)’s is higher -> more \(H_0\)’s are rejected
3. \(m\) increases -> threshold to reject \(H_0\)’s is lower -> more \(H_0\)’s are rejected
4. \(m\) increases -> threshold to reject \(H_0\)’s is higher -> more \(H_0\)’s are rejected
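To see how the number of tests \(m\) enters the adjustment, the following sketch compares adjusted p-values for the same 10 managers when the correction is applied over all 100 tests versus over the 10 cherry-picked tests only. It is one possible approach, not a reference solution: it assumes the Bonferroni correction as the FWER method and Benjamini-Hochberg as the FDR method (the exercise names neither), and the names p.bonf.all, p.bh.all, p.bonf.top and top10 are illustrative rather than the variable names the questions ask for.

```r
set.seed(1)
n <- 20
m <- 100
X <- matrix(rnorm(n * m), ncol = m)

# p-value of a one-sample t-test (H0: mean = 0) for every manager
p.values <- numeric(m)
for (j in seq_len(m)) {
  p.values[j] <- t.test(X[, j], mu = 0)$p.value
}

# Adjust over all m = 100 tests
# (assumed methods: Bonferroni for FWER, Benjamini-Hochberg for FDR)
p.bonf.all <- p.adjust(p.values, method = "bonferroni")
p.bh.all   <- p.adjust(p.values, method = "BH")

# Cherry-pick the 10 managers with the highest mean return,
# then adjust as if only those 10 tests had ever been run
top10 <- order(colMeans(X), decreasing = TRUE)[1:10]
p.bonf.top <- p.adjust(p.values[top10], method = "bonferroni")

# Bonferroni multiplies each p-value by the number of tests (capped at 1),
# so the penalty for the same raw p-value is 10 instead of 100
cbind(raw      = p.values[top10],
      adj.m10  = p.bonf.top,
      adj.m100 = p.bonf.all[top10])
```

In the resulting table, each adjusted p-value computed with \(m = 10\) is never larger than the one computed with \(m = 100\), even though the underlying data and raw p-values are identical. The selection step used information from all 100 managers, which the smaller correction ignores.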