Box plots (also known as box-and-whisker plots) offer three key benefits compared to other visualizations of data:
- Box plots show the values of Q1, Q2 (the median), and Q3, along with the size of the middle quartiles.
- Box plots show the interquartile range (commonly called the IQR), a measure of the spread of the data. The IQR is the value of Q3 - Q1. It tells us the range of the middle 50% of the data; in other words, it is the width of the “box” on the box plot.
- Box plots show outliers in the dataset. Outliers are data points that differ significantly from most of the other points in the dataset. In other words, they “lie outside” most of the data. They are plotted as single dots on a box plot. You can identify outliers mathematically using these rules:
- Low Outliers: All values less than Q1 - (1.5 × IQR).
- High Outliers: All values greater than Q3 + (1.5 × IQR).
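The quartile, IQR, and outlier rules above can be sketched in a few lines of Python. This is a minimal illustration, not the method used by any particular charting tool: it assumes the "inclusive" quartile convention from Python's `statistics.quantiles` (linear interpolation, the same as NumPy's default), and other tools may use a different convention and report slightly different quartiles. The function name `box_plot_summary` and the sample data are made up for this example.

```python
from statistics import quantiles

def box_plot_summary(data):
    """Return quartiles, IQR, outlier fences, and outliers for a list of numbers.

    Assumption: quartiles use the 'inclusive' method (linear interpolation);
    other tools may use another convention.
    """
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1                  # width of the "box"
    low_fence = q1 - 1.5 * iqr     # values below this are low outliers
    high_fence = q3 + 1.5 * iqr    # values above this are high outliers
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return {
        "Q1": q1, "Q2": q2, "Q3": q3, "IQR": iqr,
        "low_fence": low_fence, "high_fence": high_fence,
        "outliers": outliers,
    }

# Hypothetical example data with one unusually large value.
scores = [2, 4, 4, 5, 6, 7, 8, 9, 30]
summary = box_plot_summary(scores)
print(summary["Q1"], summary["Q2"], summary["Q3"], summary["IQR"])  # 4.0 6.0 8.0 4.0
print(summary["low_fence"], summary["high_fence"])                  # -2.0 14.0
print(summary["outliers"])                                          # [30]
```

Here the IQR is 4, so any value above 8 + 1.5 × 4 = 14 is a high outlier, which flags only the 30.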
Outliers can be typos, lies, or real data! Outliers can have a strong effect on certain statistics (like the average), so it’s important, as a data scientist, to recognize outliers and decide whether to include them in your analysis. Outliers should only be excluded from analysis for a good reason!
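To see how strongly a single outlier can pull the average, the sketch below compares the mean and median of the same made-up dataset with and without one extreme value. The data and the helper name `center_stats` are hypothetical, chosen only for illustration.

```python
from statistics import mean, median

def center_stats(values):
    """Return (mean, median) for a list of numbers."""
    return mean(values), median(values)

# Made-up example data: identical except for one extreme value (30).
without_outlier = [2, 4, 4, 5, 6, 7, 8, 9]
with_outlier = without_outlier + [30]

for label, values in [("without", without_outlier), ("with", with_outlier)]:
    m, med = center_stats(values)
    print(f"{label} outlier: mean={m:.1f}, median={med:.1f}")
# without outlier: mean=5.6, median=5.5
# with outlier: mean=8.3, median=6.0
```

Adding the single value 30 raises the mean by almost 3 but moves the median by only 0.5, which is why the median (Q2) is often preferred as a measure of center when outliers are present.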
NOTE: Outliers are excluded from our Analytic Boxplot chart, so they are not represented as data points in the chart. Please refer to your data to identify which results are outliers.
Here is an example of a horizontal box plot with each component of the box plot labeled: