
# Boxplot in R

A boxplot, also known as a box-and-whisker plot, is a graphical summary that shows a dataset’s essential features and reveals whether any outliers exist. Drawing one boxplot per group also lets you compare the spread of several datasets side by side. Comparing ranges is easy with a boxplot because the center, the spread, and the overall range are all visible at a glance.

Boxplots show how the values in a dataset are distributed. Three quartiles divide the data into four equal parts, and the plot displays the dataset’s minimum, first quartile, median (second quartile), third quartile, and maximum.

A boxplot’s box begins at the first quartile (25 percent) and ends at the third quartile (75 percent). As a result, the box covers the middle half (50 percent) of the data, and a line inside it marks the median. A whisker extends from each side of the box to the most extreme data point that is not an outlier (by default, a point within 1.5 times the interquartile range of the box). Outliers, if they exist, are represented by circles.
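As a rough illustration of these numbers, the sketch below uses base R’s “boxplot.stats()” function, which computes the values that a boxplot draws without plotting anything. The vector “x” is a hypothetical sample made up for this example.

```r
# Hypothetical sample values, chosen only for illustration
x <- c(2, 4, 5, 7, 8, 9, 11, 12, 14, 30)

# boxplot.stats() returns the numbers that boxplot() would draw
bs <- boxplot.stats(x)

# Lower whisker, lower hinge (about Q1), median, upper hinge (about Q3), upper whisker
bs$stats

# Points beyond the whiskers, which boxplot() draws as circles
bs$out
```

Note that R computes the box edges as “hinges,” which can differ slightly from the quartiles returned by “quantile()” for small samples.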

This tutorial shows you how to use R to make boxplots.

## Creating Boxplot in R

A box-and-whisker plot can be created using R’s “boxplot()” function. It accepts several kinds of input, including vectors and data frames. You can also pass a formula as input to draw boxplots for several groups in the same graph.

## Creating Boxplot Using a Vector in R

If you want to create a box plot in R from a vector, simply pass the vector to the “boxplot()” function.

Here we have created a vector “s,” assigned it a set of numerical values, and passed it to the “boxplot()” function. The boxplot in R is vertical by default, but you can make it horizontal by setting the “horizontal” argument to “TRUE.”
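A minimal sketch of these two calls might look like the following. The values stored in “s” are hypothetical, since the original values are not listed in the text.

```r
# Hypothetical values for the vector "s" (assumed for illustration)
s <- c(12, 15, 17, 18, 20, 21, 23, 24, 26, 45)

# Vertical boxplot, the default orientation
boxplot(s)

# The same data drawn horizontally; boxplot() also returns its statistics invisibly
b <- boxplot(s, horizontal = TRUE)
```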

A horizontal boxplot created from a vector is displayed below.

Keep in mind that a boxplot can obscure the underlying distribution of the data. To address this, you can use R’s “stripchart()” function to overlay the individual data points on a boxplot.

Here we have used the “jitter” method. “pch” stands for plotting character. The default “pch” in R is 1, which draws an empty circle, whereas “pch=19” draws a solid circle, so the points we used are solid orange circles. Jittering adds a small random offset to the points so that similar values, including outliers, are not drawn on top of one another.
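The overlay described above can be sketched as follows, again with hypothetical values for “s.” The “add = TRUE” argument is what draws the points on top of the existing boxplot instead of starting a new plot.

```r
# Hypothetical values for the vector "s" (assumed for illustration)
s <- c(12, 15, 17, 18, 20, 21, 23, 24, 26, 45)

# Draw the horizontal boxplot first
boxplot(s, horizontal = TRUE)

# Overlay the raw points: jittered, solid circles (pch = 19), orange
stripchart(s, method = "jitter", pch = 19, col = "orange", add = TRUE)
```

Because “stripchart()” is horizontal by default, it lines up with the horizontal boxplot without any extra arguments.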

## Creating Boxplot Using “notch” in R

We can also make a boxplot with a notch in R, which helps us compare the medians of different groups. Setting the “notch” argument to “TRUE” draws a notch around each median that approximates a 95 percent confidence interval for it. The box still spans the first to the third quartile, and the center line marks the median.

A “notch” is a narrowing of the box around the median. Notches help judge whether a difference in medians is meaningful: if the notches of two boxes do not overlap, there is strong evidence that the medians differ.
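As a small sketch of a notched boxplot, the example below simulates two hypothetical groups with “rnorm()” (the group names, sizes, and means are assumptions made only for this illustration) and compares them side by side.

```r
set.seed(1)  # make the simulated data reproducible

# Two hypothetical groups, simulated only for illustration
a <- rnorm(100, mean = 50, sd = 5)
b <- rnorm(100, mean = 55, sd = 5)

# notch = TRUE narrows each box around its median
nb <- boxplot(list(A = a, B = b), notch = TRUE, col = "lightblue")

# Lower and upper notch limits, one column per group
nb$conf
```

If the ranges in “nb$conf” for the two groups do not overlap, the notches on the plot do not overlap either.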

The notched boxplot is shown below.

## Creating Boxplot Using a Dataset in R

You can also pass a data frame to the “boxplot()” function. In this example, we will use the built-in dataset “ChickWeight” that ships with base R.

Here you can see the contents of the “ChickWeight” dataset. It contains four columns: weight, Time, Chick, and Diet. The weight and Time columns hold numerical values, while Chick and Diet are factors that identify each chick and the diet it received.
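To inspect the dataset yourself, a quick sketch using the standard “head()” and “str()” functions is shown here. “ChickWeight” is available in a default R session without loading any extra package.

```r
# First few rows of the built-in dataset
head(ChickWeight)

# Column names and types: weight and Time are numeric, Chick and Diet are factors
str(ChickWeight)
```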

We will choose two columns, weight and Diet, from the dataset. Using the “boxplot()” function, we will draw boxplots for the selected data.
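A minimal sketch of such a call is given here, assuming the formula form in which “~” separates the measured variable from the grouping variable. The axis labels are added only for readability.

```r
# weight on the left of "~" is split into one box per level of Diet
bw <- boxplot(ChickWeight$weight ~ ChickWeight$Diet,
              xlab = "Diet", ylab = "weight")

# An equivalent and shorter form passes the data frame through "data":
# boxplot(weight ~ Diet, data = ChickWeight)
```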

In the above code, we have drawn a boxplot of “weight” against “Diet.” We referred to each variable through the dataset name: inside the parentheses of the “boxplot()” function, we wrote the data frame name “ChickWeight,” the “\$” operator to select a column, and the column name “weight,” followed by “ChickWeight\$Diet” for the grouping column.

The resulting boxplot clearly shows the spread of the data and its outliers for each diet.

To make this boxplot more informative, you can overlay the individual data points. You can accomplish this by using the “stripchart()” function.
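One way to sketch this overlay is shown here. The color and plotting character are assumptions carried over from the vector example, since the text does not state them for this plot.

```r
# Grouped boxplot of weight by Diet
boxplot(weight ~ Diet, data = ChickWeight)

# Overlay the raw observations; vertical = TRUE matches the vertical boxes
stripchart(weight ~ Diet, data = ChickWeight,
           vertical = TRUE, method = "jitter",
           pch = 19, col = "orange", add = TRUE)
```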

You can see the dots we added to show how the data is distributed within each boxplot.

## Creating Multiple Boxplots in R

Drawing multiple boxplots in one graph is another technique available in R. To demonstrate it, we use another built-in dataset from base R.

The dataset we used here is “trees,” provided by base R. We can also add colors to the boxplot: in the “boxplot()” function, we set the “col” argument using “rainbow,” which gives each boxplot a different color.
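A short sketch of this call follows. It assumes the colors are generated with “rainbow(ncol(trees))” so that there is one color per column; the exact call used originally is not shown in the text.

```r
# Passing a data frame draws one box per numeric column
# rainbow(n) returns n distinct colors, one for each column of "trees"
tb <- boxplot(trees, col = rainbow(ncol(trees)))
```

Since the three columns of “trees” are measured in different units, the boxes share one axis but are not directly comparable in scale.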

If you want to plot a separate boxplot for every column of an R data frame, you can do so with the “lapply()” function.

In this example, we use “par” to split the graphics device into one row and as many columns as the dataset has. Alternatively, each graph could be plotted on its own. The “invisible()” function keeps the text output of “lapply()” from being printed.
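The steps above can be sketched as follows, using the “trees” dataset. Saving and restoring the previous “par” settings and adding a title to each panel are choices made for this illustration, not necessarily part of the original code.

```r
# One row, one panel per column; keep the old settings to restore them later
old_par <- par(mfrow = c(1, ncol(trees)))

# Draw a separate boxplot for each column; invisible() hides lapply()'s printed list
invisible(lapply(1:ncol(trees), function(i) {
  boxplot(trees[, i], main = names(trees)[i])
}))

# Restore the previous graphics layout
par(old_par)
```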

The image below shows the boxplot created for each data column individually.

## Conclusion

R offers many ways to display data visually, and the boxplot is a simple and useful one. In this article, we discussed what boxplots are and how they display data. We explained four techniques for drawing boxplots in R, using RStudio on Ubuntu 20.04: creating a boxplot from a simple vector, adding a “notch,” using a data frame, and drawing multiple boxplots. We demonstrated each method with example code, which should make it easier for you to create boxplots in R.