R Box Plot
- In this article, you will learn to create a box-and-whisker plot in R programming. You will also learn to draw multiple boxplots in a single plot.
- A box-and-whisker plot can be created using the boxplot() function in the R programming language.
- This function takes in any number of numeric vectors, drawing a boxplot for each vector.
- You can also pass in a list (or data frame) with numeric vectors as its components.
- Let us use the built-in dataset airquality which has "Daily air quality measurements in New York, May to September 1973."
- Let us make a boxplot for the ozone readings.
- We can see that data above the median is more dispersed.
- We can also notice two outliers at the higher extreme.
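The basic call for the ozone boxplot might look like the sketch below. It uses only base R and the built-in airquality dataset. It also calls boxplot.stats(), a base R helper that the text above does not mention, to list the points that fall beyond the whiskers. Readings recorded as NA are skipped.

```r
# Ozone readings from the built-in airquality dataset (contains NA values)
ozone <- airquality$Ozone

# Draw a single box-and-whisker plot; NA readings are ignored
boxplot(ozone)

# boxplot.stats() computes the numbers behind the plot without drawing it;
# $out holds the values drawn as individual outlier points
ozone_stats <- boxplot.stats(ozone)
print(ozone_stats$out)
```

The two values printed by the last line are the outliers visible above the upper whisker.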
- We can pass in additional parameters to control the way our plot looks.
- You can read about them in the help section (?boxplot).
- Some of the frequently used ones are main to give the title, xlab and ylab to label the axes, and col to set the color.
- Additionally, with the argument horizontal = TRUE we can plot it horizontally and with notch = TRUE we can add a notch to the box.
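Here is one possible sketch combining these arguments. The title text and colors are arbitrary choices. The axis label comes from the airquality help page, which gives Ozone in parts per billion. The border argument is an extra not mentioned above; it sets the outline color of the box.

```r
# Customized boxplot of the ozone readings
boxplot(airquality$Ozone,
        main = "Ozone readings, New York, May to September 1973",
        ylab = "Ozone (parts per billion)",
        col = "orange",      # fill color of the box
        border = "brown",    # outline color of the box
        notch = TRUE)        # add a notch around the median

# The same data drawn horizontally
boxplot(airquality$Ozone,
        main = "Ozone readings (horizontal)",
        horizontal = TRUE,
        col = "orange")
```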
Return Value of boxplot()
- The boxplot() function returns (invisibly) a list with 6 components.
- The components of the returned list are:
- stats - A matrix giving the extreme of the lower whisker, the lower edge of the box, the median, the upper edge of the box and the extreme of the upper whisker, with one column per group.
- n - The number of observations the boxplot is drawn with (NA values are not counted).
- conf - The lower/upper extremes of the notch.
- out - The values of the outliers.
- group - A vector of the same length as out whose elements indicate the group to which each outlier belongs.
- names - A vector of names for the groups.
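To inspect these components, store the returned list in a variable. The sketch below does this for the ozone data. It also uses plot = FALSE, an argument of boxplot(), to compute the same list without drawing anything.

```r
# Keep the list that boxplot() returns invisibly
b <- boxplot(airquality$Ozone)

str(b)    # overview of all six components
b$stats   # whisker extremes, box edges and median (one column per group)
b$n       # non-NA observations used
b$conf    # lower/upper extremes of the notch
b$out     # outlier values
b$group   # group index of each outlier (all 1 here, since there is one box)
b$names   # group names

# Same computation without producing a plot
b_silent <- boxplot(airquality$Ozone, plot = FALSE)
```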
Multiple Boxplots
- We can draw multiple boxplots in a single plot by passing in a list, a data frame or multiple vectors.
- Let us consider the Ozone and Temp fields of the airquality dataset.
- Let us also generate normally distributed data with the same mean and standard deviation as each field, and plot them side by side for comparison.
- Now we make 4 boxplots with this data. We use the arguments at and names to set the position and label of each box.
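A minimal sketch of this comparison follows. Several details are arbitrary choices for illustration: the sample size of 200, the seed and the colors. With at = c(1, 2, 4, 5), position 3 stays empty, so the ozone pair and the temperature pair appear as visually separate groups.

```r
# Observed data
ozone <- airquality$Ozone
temp  <- airquality$Temp

# Normal samples with the same mean and standard deviation as the observed data
set.seed(1)  # arbitrary seed so the sketch is reproducible
ozone_norm <- rnorm(200, mean = mean(ozone, na.rm = TRUE), sd = sd(ozone, na.rm = TRUE))
temp_norm  <- rnorm(200, mean = mean(temp, na.rm = TRUE), sd = sd(temp, na.rm = TRUE))

# Four boxes in one plot; `at` places them, `names` labels them
boxplot(ozone, ozone_norm, temp, temp_norm,
        main = "Observed data vs. normal samples",
        at = c(1, 2, 4, 5),
        names = c("ozone", "normal", "temp", "normal"),
        col = c("orange", "red"),   # recycled across the four boxes
        border = "brown",
        horizontal = TRUE,
        notch = TRUE)
```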
Boxplot from Formula
- The function boxplot() can also take in formulas of the form y ~ x, where y is a numeric vector that is grouped according to the value of x.
- For example, in our dataset airquality, Temp can be our numeric vector.
- Month can be our grouping variable, so that we get the boxplot for each month separately.
- In our dataset, Month is stored as a number (1 = January, 2 = February and so on), so its values run from 5 (May) to 9 (September).
- It is clear from the resulting plot that month number 7 (July) was relatively hotter than the rest.
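The formula version might look like the sketch below. The title and colors are illustrative choices. The axis label comes from the airquality help page, which gives Temp as the maximum daily temperature in degrees Fahrenheit.

```r
# One box of temperatures per month: Temp grouped by Month
boxplot(Temp ~ Month,
        data = airquality,
        main = "Maximum daily temperature by month, 1973",
        xlab = "Month number",
        ylab = "Temperature (degrees Fahrenheit)",
        col = "orange",
        border = "brown")
```

The boxes are labelled 5 to 9 by default. Adding names = month.abb[5:9] to the call would label them May to Sep instead.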