Frequency Distribution of Quantitative Data
The frequency distribution of a data variable is a summary of how often its values occur in each of a collection of non-overlapping categories.
Example
In the data set faithful, the frequency distribution of the eruptions variable is the summary of eruptions according to some classification of the eruption durations.
Problem
Find the frequency distribution of the eruption durations in faithful.
Solution
The solution consists of the following steps:
- We first find the range of eruption durations with the range function. It shows that the observed eruptions are between 1.6 and 5.1 minutes in duration.
- Break the range into non-overlapping sub-intervals by defining a sequence of equally spaced break points. If we round the endpoints of the interval [1.6, 5.1] outward to the nearest multiples of 0.5 (1.6 down to 1.5, 5.1 up to 5.5), we come up with the interval [1.5, 5.5]. Hence we set the break points to be the sequence { 1.5, 2.0, 2.5, ..., 5.5 }.
- Classify the eruption durations according to the half-unit-length sub-intervals with the cut function. As the intervals are to be closed on the left and open on the right, we set the right argument to FALSE.
- Compute the frequency of eruptions in each sub-interval with the table function.
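The four steps above can be sketched in R as follows. This is a minimal sketch rather than a definitive listing: the variable names duration and breaks are chosen here for illustration, while duration.cut and duration.freq follow the names that appear in the printed output.

```r
# faithful ships with base R (datasets package), so no import is needed.
duration <- faithful$eruptions

# Step 1: observed range of the eruption durations.
range(duration)

# Step 2: equally spaced break points, 0.5 minutes apart, covering the range.
breaks <- seq(1.5, 5.5, by = 0.5)

# Step 3: classify each duration into a left-closed, right-open sub-interval.
duration.cut <- cut(duration, breaks, right = FALSE)

# Step 4: count the eruptions that fall in each sub-interval.
duration.freq <- table(duration.cut)
duration.freq
```

Because right = FALSE makes every sub-interval closed on the left, a duration that sits exactly on a break point, such as 2.0, is counted in [2,2.5) rather than in [1.5,2).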
Answer
The frequency distribution of the eruption duration is:
duration.cut
[1.5,2) [2,2.5) [2.5,3) [3,3.5) [3.5,4) [4,4.5) [4.5,5) [5,5.5)
     51      41       5       7      30      73      61       4
Enhanced Solution
We apply the cbind function to print the result in column format.
        duration.freq
[1.5,2)            51
[2,2.5)            41
[2.5,3)             5
[3,3.5)             7
[3.5,4)            30
[4,4.5)            73
[4.5,5)            61
[5,5.5)             4
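A column layout like the one above can be obtained with a short sketch along these lines. The frequency table is rebuilt first so that the snippet runs on its own; freq.col is a name introduced here purely for illustration.

```r
# Rebuild the frequency table so this snippet is self-contained.
duration <- faithful$eruptions
breaks <- seq(1.5, 5.5, by = 0.5)
duration.freq <- table(cut(duration, breaks, right = FALSE))

# cbind turns the one-dimensional table into a one-column matrix:
# the interval labels become row names and the variable name
# becomes the column header.
freq.col <- cbind(duration.freq)
freq.col
```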
Note
Per the R documentation for cut, hist(x, breaks, plot = FALSE) is more efficient and less memory hungry than table(cut(x, breaks)), so you are advised to use the hist function to find the frequency distribution.
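As a rough sketch of that alternative, the same counts can be read from the counts component of the object returned by hist. The name h is illustrative; right = FALSE is passed so that the sub-intervals stay closed on the left, matching the cut-based solution.

```r
duration <- faithful$eruptions
breaks <- seq(1.5, 5.5, by = 0.5)

# plot = FALSE returns the histogram object without drawing anything.
h <- hist(duration, breaks = breaks, right = FALSE, plot = FALSE)

# One count per sub-interval, in the same order as the break points.
h$counts
```

Unlike table, hist returns the counts as a plain unnamed vector, so the interval labels have to be rebuilt from h$breaks if they are needed.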
Exercise
- Find the frequency distribution of the eruption waiting periods in faithful.
- Find programmatically the duration sub-interval that has the most eruptions.