R Box Plot
- In this article, you will learn to create a box-and-whisker plot in R. You will also learn to draw multiple boxplots in a single plot.
- A box-and-whisker plot can be created using the boxplot() function.
- This function takes in any number of numeric vectors, drawing a boxplot for each vector.
- You can also pass in a list (or data frame) with numeric vectors as its components.
- Let us use the built-in dataset airquality which has "Daily air quality measurements in New York, May to September 1973."

> str(airquality)
'data.frame':   153 obs. of  6 variables:
 $ Ozone  : int  41 36 12 18 NA 28 23 19 8 NA ...
 $ Solar.R: int  190 118 149 313 NA NA 299 99 19 194 ...
 $ Wind   : num  7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
 $ Temp   : int  67 72 74 62 56 66 65 59 61 69 ...
 $ Month  : int  5 5 5 5 5 5 5 5 5 5 ...
 $ Day    : int  1 2 3 4 5 6 7 8 9 10 ...
- Let us make a boxplot for the ozone readings.
boxplot(airquality$Ozone)

- We can see that data above the median is more dispersed.
- We can also notice two outliers at the higher extreme.
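- Both observations can be checked numerically. The sketch below assumes the default boxplot() rule (range = 1.5), under which points lying more than 1.5 times the box length beyond the box are drawn as outliers. The variable names are illustrative only.

```r
# Ozone readings with missing values dropped
ozone <- airquality$Ozone[!is.na(airquality$Ozone)]

# Tukey's five-number summary: minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(ozone)
box_length <- five[4] - five[2]

# Fences used by the default rule (range = 1.5)
lower_fence <- five[2] - 1.5 * box_length
upper_fence <- five[4] + 1.5 * box_length

# Readings outside the fences are the points drawn as outliers
outliers <- ozone[ozone < lower_fence | ozone > upper_fence]
print(outliers)

# Distance from the median to each edge of the box
spread <- c(below = five[3] - five[2], above = five[4] - five[3])
print(spread)
```

- A larger value for "above" than for "below" matches the wider upper half of the box seen in the plot.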
- We can pass in additional parameters to control the way our plot looks.
- You can read about them in the help section, ?boxplot.
- Some of the frequently used ones are main to give the title, xlab and ylab to provide labels for the axes, and col to define the color.
- Additionally, with the argument horizontal = TRUE we can plot it horizontally and with notch = TRUE we can add a notch to the box.
boxplot(airquality$Ozone,
        main = "Mean ozone in parts per billion at Roosevelt Island",
        xlab = "Parts Per Billion",
        ylab = "Ozone",
        col = "orange",
        border = "brown",
        horizontal = TRUE,
        notch = TRUE
)

Return Value of boxplot()
- The boxplot() function returns a list with 6 components shown as follows.
> b <- boxplot(airquality$Ozone)
> b
$stats
      [,1]
[1,]   1.0
[2,]  18.0
[3,]  31.5
[4,]  63.5
[5,] 122.0
attr(,"class")
        1
"integer"
$n
[1] 116
$conf
         [,1]
[1,] 24.82518
[2,] 38.17482
$out
[1] 135 168
$group
[1] 1 1
$names
[1] "1"
- As we can see above, the returned list has the following components:
- stats - The positions of the lower whisker, the lower edge of the box, the median, the upper edge of the box and the upper whisker.
- n - The number of observations the boxplot is drawn with (notice that NAs are not counted).
- conf - The lower and upper extremes of the notch.
- out - The values of the outliers.
- group - A vector of the same length as out whose elements indicate the group each outlier belongs to.
- names - A vector of names for the groups.
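- These components can also be used directly in code. The following is a minimal sketch: it passes plot = FALSE so that only the statistics are computed, and it recomputes the notch as median +/- 1.58 * (box length) / sqrt(n), the formula given in ?boxplot.stats. The names box_length and notch are illustrative.

```r
# Compute the boxplot statistics without drawing anything
b <- boxplot(airquality$Ozone, plot = FALSE)

# n counts only the non-missing observations
print(b$n == sum(!is.na(airquality$Ozone)))

# Median (row 3 of stats) and the outlying values
print(b$stats[3, 1])
print(b$out)

# Recompute the notch limits from stats and n, then compare with conf
box_length <- b$stats[4, 1] - b$stats[2, 1]
notch <- b$stats[3, 1] + c(-1.58, 1.58) * box_length / sqrt(b$n)
print(all.equal(notch, as.vector(b$conf)))
```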
Multiple Boxplots
- We can draw multiple boxplots in a single plot, by passing in a list, data frame or multiple vectors.
- Let us consider the Ozone and Temp fields of the airquality dataset.
- Let us also generate normally distributed samples with the same mean and standard deviation as each field and plot them side by side for comparison.
# prepare the data
ozone <- airquality$Ozone
temp <- airquality$Temp
# generate normal distribution with same mean and sd
ozone_norm <- rnorm(200, mean = mean(ozone, na.rm = TRUE), sd = sd(ozone, na.rm = TRUE))
temp_norm <- rnorm(200, mean = mean(temp, na.rm = TRUE), sd = sd(temp, na.rm = TRUE))
- Now let us make 4 boxplots with this data. We use the arguments at and names to set the position and label of each box.
boxplot(ozone, ozone_norm, temp, temp_norm,
        main = "Multiple boxplots for comparison",
        at = c(1, 2, 4, 5),
        names = c("ozone", "normal", "temp", "normal"),
        las = 2,
        col = c("orange", "red"),
        border = "brown",
        horizontal = TRUE,
        notch = TRUE
)
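- The call above passes separate vectors. As a hedged sketch of the list form, the groups can instead be collected in a named list, in which case boxplot() takes the labels from the list names. The set.seed() call and the name groups are additions for illustration and are not part of the original example.

```r
# Assumption: a fixed seed so the simulated values are reproducible
set.seed(1)

ozone <- airquality$Ozone

# A named list: each component becomes one box, labelled by its name
groups <- list(
  ozone  = ozone,
  normal = rnorm(200, mean = mean(ozone, na.rm = TRUE), sd = sd(ozone, na.rm = TRUE))
)

res <- boxplot(groups,
               main = "Boxplots from a named list",
               col = c("orange", "red"),
               border = "brown",
               horizontal = TRUE
)

# The group labels come from the list names
print(res$names)
```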

Boxplot from Formula
- The function boxplot() can also take in formulas of the form y ~ x, where y is a numeric vector which is grouped according to the value of x.
- For example, in our dataset airquality, the Temp can be our numeric vector.
- Month can be our grouping variable, so that we get the boxplot for each month separately.
- In our dataset, Month is stored as a number (1 = January, 2 = February and so on); airquality covers months 5 (May) to 9 (September).
boxplot(Temp ~ Month,
        data = airquality,
        main = "Different boxplots for each month",
        xlab = "Month Number",
        ylab = "Degree Fahrenheit",
        col = "orange",
        border = "brown"
)

- It is clear from the above figure that month number 7 (July) is relatively hotter than the rest.
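- To label the boxes with month names instead of numbers, the grouping variable can be converted to a factor first. This is a minimal sketch: the copy aq and the column MonthName are illustrative names, and month.abb is the built-in vector of abbreviated month names. It also prints the median of each group so the comparison between months can be read off as numbers.

```r
# Work on a copy so the built-in dataset is left unchanged
aq <- airquality

# Map month numbers 5..9 to their abbreviated names
aq$MonthName <- factor(aq$Month, levels = 5:9, labels = month.abb[5:9])

res <- boxplot(Temp ~ MonthName,
               data = aq,
               main = "Different boxplots for each month",
               xlab = "Month",
               ylab = "Degree Fahrenheit",
               col = "orange",
               border = "brown"
)

# Row 3 of stats holds the median of each group
medians <- setNames(res$stats[3, ], res$names)
print(medians)
```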