For these exercises, we will be using the data from the survey collected by the United States National Center for Health Statistics (NCHS). This center has conducted a series of health and nutrition surveys since the 1960’s. Starting in 1999, about 5,000 individuals of all ages have been interviewed every year and they complete the health examination component of the survey. Part of the data is made available via the NHANES package. Once you install the NHANES package, you can load the data like this:

library(NHANES)
data(NHANES)

The NHANES data has many missing values. The mean and sd functions in R will return NA if any of the entries of the input vector is an NA. Here is an example:

library(dslabs)
data(na_example)
mean(na_example)
#> [1] NA
sd(na_example)
#> [1] NA

To ignore the NA’s we can use the na.rm argument:

mean(na_example, na.rm = TRUE)
#> [1] 2.3
sd(na_example, na.rm = TRUE)
#> [1] 1.22

Let’s now explore the NHANES data.

1. We will provide some basic facts about blood pressure. First let’s select a group to set the standard. We will use 20-to-29-year-old females. AgeDecade is a categorical variable with these ages. Note that the category is coded like “ 20-29”, with a space in front! What is the average and standard deviation of systolic blood pressure (the blood pressure can be found in theBPSysAve column)? The result should be a data frame with a mean and sd column. Store this dataframe as summary. Hint: Use filter and summarize and use the na.rm = TRUE argument when computing the average and standard deviation. You can also filter the NA values using filter.

2. Using a pipe, assign the average to a numeric variable ref_mean.

Hint: Use the code similar to above and then pull.

3. Now do the same thing, but instead of calculating the mean and standard deviation, report the minimum and maximum values (again for 20-to-29-year-old females). This time store the result in a data frame min_max with a min and a max column.