Analyzing continuous data points using a box plot.
Suppose we take the frequency of fruits as an example.
The frequency distribution is:
First of all, arrange the values in sorted order.
1- Find the median / middle value [the median is always found from the sorted data]
The median of the fruit data above is 33.5.
50% of the data lies to the left of the median and 50% lies to the right of it.
Median is 33.5
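As a minimal sketch of this step in Python: the values below are hypothetical (they are not the fruit data from this article) and were picked only so that the result matches the median of 33.5 stated above.

```python
from statistics import median

# Hypothetical frequencies, NOT the article's data set
frequencies = [40, 12, 57, 21, 27, 75]

# Step 1: arrange the values in sorted order
sorted_freq = sorted(frequencies)   # [12, 21, 27, 40, 57, 75]

# With an even number of values, the median is the mean
# of the two middle values (27 and 40 here)
mid = median(sorted_freq)
print(mid)  # 33.5
```

With an odd number of values, the median is simply the single middle value of the sorted list.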
2- Now find the first quartile Q1 and the third quartile Q3.
Above, 21 is the first quartile, Q1.
57 is the third quartile, Q3.
This means:
25% of the complete distribution lies to the left of 21.
75% of the complete distribution lies to the right of 21.
25% of the data is to the left of Q1.
75% of the data is to the right of Q1.
75% of the data is to the left of Q3 (Q3 = 57).
25% of the data is to the right of Q3 (Q3 = 57).
3- Find the min and max
According to the box plot analysis.
4- Analyze the plot and the data.
- The median is 28. This means 50% of people have a height below 28 and 50% have a height above 28.
- The third quartile Q3 is 33. This means 75% of people have a height below 33 and 25% have a height above 33.
- The first quartile Q1 is 19. This means 25% of people have a height below 19 and 75% have a height above 19.
- The size of the box is called the IQR (interquartile range), that is, IQR = Q3 - Q1.
- Points above the max value are considered outliers.
- Points below the min value are considered outliers.
5- How to find the outliers and the max and min values.
To do this, we calculate an upper fence and a lower fence. Anything within the fences is not considered an outlier, and anything beyond the fences is an outlier.
Calculate the upper fence and the lower fence.
Upper fence = Q3 + 1.5 × IQR
Lower fence = Q1 - 1.5 × IQR
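With Q1 = 19 and Q3 = 33, the IQR is 33 - 19 = 14, so the upper fence is 33 + 1.5 × 14 = 54 and the lower fence is 19 - 1.5 × 14 = -2.
This means anything above 54 is considered an outlier and anything below -2 is considered an outlier.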
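The fence calculation can be checked with a short Python sketch. It uses Q1 = 19 and Q3 = 33 from the plot analysis above. The function name `fences` and the list of heights are hypothetical and only illustrate how points are compared against the fences.

```python
def fences(q1, q3, k=1.5):
    """Return (lower_fence, upper_fence) for the given quartiles."""
    iqr = q3 - q1                      # size of the box
    return q1 - k * iqr, q3 + k * iqr

lower, upper = fences(19, 33)
print(lower, upper)  # -2.0 54.0

# Hypothetical heights, used only to show the outlier check
heights = [3, 19, 25, 28, 33, 54, 60]

# A point is an outlier only if it lies strictly beyond a fence
outliers = [x for x in heights if x < lower or x > upper]
print(outliers)  # [60]

# The whiskers (min and max of the plot) end at the most extreme
# points that are still inside the fences
whisker_min = min(x for x in heights if x >= lower)
whisker_max = max(x for x in heights if x <= upper)
print(whisker_min, whisker_max)  # 3 54
```

Note that 54 sits exactly on the upper fence, so in this sketch it is not counted as an outlier.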
If you have any doubts about this tutorial, feel free to comment below or ask on LinkedIn: http://linkedin.com/in/puneet166
GitHub workspace link: https://github.com/puneet166?tab=repositories