Numerical data that are not categorical also have distributions. In
general, when data are not categorical, reporting the frequency of each
entry is not an effective summary since most entries are unique. In our
case study, while several students reported a height of 68 inches, only
one student reported a height of 68.503937007874 inches and only one
student reported a height of 68.8976377952756 inches. We assume that they
converted from 174 and 175 centimeters, respectively.
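The conversion assumption can be checked with a little arithmetic. The sketch below is only illustrative: the helper name `cm_to_inches` and the small `heights` list are made up for this example (the two long decimal values come from the text, while the three entries of 68 merely stand in for "several students"). It shows that 174 and 175 centimeters map to the reported values, and that a frequency table of such data consists mostly of counts of one.

```python
from collections import Counter

CM_PER_INCH = 2.54  # exact by definition of the inch


def cm_to_inches(cm: float) -> float:
    """Convert a length in centimeters to inches (hypothetical helper)."""
    return cm / CM_PER_INCH


# The two unusual heights in the text match 174 cm and 175 cm.
for cm in (174, 175):
    print(cm, "cm =", cm_to_inches(cm), "inches")

# Illustrative data only: three made-up entries of 68 plus the two
# converted values quoted in the text.
heights = [68, 68, 68, 68.503937007874, 68.8976377952756]

# A frequency table: the non-integer values each appear exactly once.
frequency = Counter(heights)
for value, count in sorted(frequency.items()):
    print(value, count)
```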
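Statistics textbooks teach us that a more useful way to define a
distribution for numeric data is to define a function that reports the
proportion of the data at or below $a$ for all possible values of $a$.
This function is called the cumulative distribution function (CDF). In
statistics, it is commonly written as $F(a) = \mathrm{Pr}(x \leq a)$.
Here is a plot of $F$ for the heights in our case study:
[Figure: plot of $F(a)$ against $a$ for the reported heights]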
Similar to what the frequency table does for categorical data, the CDF
defines the distribution for numerical data. From the plot, we can see
that 16% of the values are below 65, since the CDF evaluated at 65 is approximately 0.16.
A final note: because a CDF can also be defined mathematically, without reference to any data, the word empirical is added when the function is computed from data. We therefore use the term empirical CDF (eCDF).
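To make the idea concrete, here is a minimal sketch of an eCDF in Python. It is one possible implementation, not the code used to produce the plot: the function name `ecdf` and the `heights` values are assumptions made up for illustration, and the function counts values at or below $a$. Sorting the data once lets each evaluation use a binary search.

```python
from bisect import bisect_right


def ecdf(data):
    """Return a function F where F(a) is the proportion of data <= a."""
    sorted_data = sorted(data)
    n = len(sorted_data)
    if n == 0:
        raise ValueError("ecdf requires at least one observation")

    def F(a):
        # bisect_right returns the number of sorted entries that are <= a
        return bisect_right(sorted_data, a) / n

    return F


# Made-up heights in inches, for illustration only.
heights = [63, 65, 66, 68, 68, 69, 70, 71, 72, 74]
F = ecdf(heights)

print(F(65))  # 0.2, since 2 of the 10 values are at or below 65
print(F(72))  # 0.9, since 9 of the 10 values are at or below 72
```

Because `F` is a step function that jumps by 1/n at each observation, it summarizes the whole data set without requiring that any value be repeated.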