Relative Frequency Distribution of Quantitative Data
The relative frequency distribution of a data variable is a summary of the frequency proportion in a collection of non-overlapping categories.
The relationship between frequency and relative frequency is:
Relative Frequency = Frequency / Sample Size
Example
In the data set faithful, the relative frequency distribution of the eruptions variable shows the frequency proportion of the eruptions according to a duration classification.
Problem
Find the relative frequency distribution of the eruption durations in faithful.
Solution
We first find the frequency distribution of the eruption durations as follows. Further details can be found in the Frequency Distribution tutorial.
> duration = faithful$eruptions
> breaks = seq(1.5, 5.5, by=0.5)
> duration.cut = cut(duration, breaks, right=FALSE)
> duration.freq = table(duration.cut)
Then we find the sample size of faithful with the nrow function, and divide the frequency distribution by it. As a result, the relative frequency distribution is:
> duration.relfreq = duration.freq / nrow(faithful)
Answer
The relative frequency distribution of the eruption variable is:
> duration.relfreq
duration.cut
[1.5,2) [2,2.5) [2.5,3) [3,3.5) [3.5,4) [4,4.5)
0.187500 0.150735 0.018382 0.025735 0.110294 0.268382
[4.5,5) [5,5.5)
0.224265 0.014706
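As a quick sanity check, the relative frequencies should sum to 1. The sketch below repeats the steps above and compares the result with the base R function prop.table, which divides a table by its total. The name relfreq.alt is introduced here for illustration only. The two approaches agree only when every observation falls inside the chosen breaks, which is the case for the eruption durations.

```r
# Rebuild the frequency and relative frequency distributions.
duration = faithful$eruptions
breaks = seq(1.5, 5.5, by=0.5)
duration.cut = cut(duration, breaks, right=FALSE)
duration.freq = table(duration.cut)
duration.relfreq = duration.freq / nrow(faithful)

# The proportions should add up to 1 (up to floating-point rounding).
total = sum(duration.relfreq)

# prop.table divides each count by the table total (illustrative alternative).
relfreq.alt = prop.table(duration.freq)
same = isTRUE(all.equal(as.vector(duration.relfreq), as.vector(relfreq.alt)))
```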
Enhanced Solution
We can print with fewer digits and make it more readable by setting the digits option.
> old = options(digits=1)
> duration.relfreq
duration.cut
[1.5,2) [2,2.5) [2.5,3) [3,3.5) [3.5,4) [4,4.5) [4.5,5)
0.19 0.15 0.02 0.03 0.11 0.27 0.22
[5,5.5)
0.01
> options(old) # restore the old option
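An alternative that leaves the global options untouched is to round the values themselves. The sketch below assumes two decimal places are enough; the name relfreq.rounded is hypothetical. Unlike the digits option, round changes the stored values, so it is best applied to a copy used only for display.

```r
# Rebuild the relative frequency distribution.
duration = faithful$eruptions
breaks = seq(1.5, 5.5, by=0.5)
duration.cut = cut(duration, breaks, right=FALSE)
duration.freq = table(duration.cut)
duration.relfreq = duration.freq / nrow(faithful)

# Round a copy for display; duration.relfreq keeps full precision.
relfreq.rounded = round(duration.relfreq, 2)
print(relfreq.rounded)
```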
We then apply the cbind function to print both the frequency distribution and relative frequency distribution in parallel columns.
> old = options(digits=1)
> cbind(duration.freq, duration.relfreq)
duration.freq duration.relfreq
[1.5,2) 51 0.19
[2,2.5) 41 0.15
[2.5,3) 5 0.02
[3,3.5) 7 0.03
[3.5,4) 30 0.11
[4,4.5) 73 0.27
[4.5,5) 61 0.22
[5,5.5) 4 0.01
> options(old) # restore the old option
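The same cbind approach can carry a percentage column as well. This is a minimal sketch; the column names Freq, RelFreq and Percent and the name duration.summary are chosen here for illustration.

```r
# Rebuild the frequency and relative frequency distributions.
duration = faithful$eruptions
breaks = seq(1.5, 5.5, by=0.5)
duration.cut = cut(duration, breaks, right=FALSE)
duration.freq = table(duration.cut)
duration.relfreq = duration.freq / nrow(faithful)

# Combine counts, proportions and percentages in parallel columns.
duration.summary = cbind(Freq = duration.freq,
                         RelFreq = round(duration.relfreq, 2),
                         Percent = round(100 * duration.relfreq, 1))
print(duration.summary)
```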
Exercise
Find the relative frequency distribution of the eruption waiting periods in faithful.