
Create a Dataframe in R

In R, the dataframe is the most frequently used object for storing data. It is a collection of vectors of identical length. A dataframe is a table, a structure resembling a two-dimensional array, in which each column holds the values of a single variable and each row holds one value from every column.

A dataframe must have the following characteristics: its columns must be named, with no name left empty, and each of its rows must have a unique name.
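As a minimal sketch of these two naming rules, the snippet below builds a small dataframe and inspects its column and row names. The column names and values are hypothetical and chosen only for illustration.

```r
# Hypothetical two-column dataframe; names and values are illustrative only.
df <- data.frame(
  Name = c("Alex", "Sara", "John"),
  Age  = c(22, 25, 45)
)

# Every column carries a name.
col_names <- names(df)
print(col_names)

# Row names default to "1", "2", "3" and are unique.
row_names <- row.names(df)
print(row_names)

# Trying to assign duplicate row names is rejected by R.
dup_result <- suppressWarnings(tryCatch(
  {
    row.names(df) <- c("a", "a", "b")
    "accepted"
  },
  error = function(e) "rejected"
))
print(dup_result)
```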

Creating a Dataframe in R

R provides several ways to create a dataframe. We can construct one from vectors, from other dataframes, or by importing a file. In this article, we discuss each of these techniques to help you learn how to create a dataframe in R.

Using Vectors to Make a Dataframe in R

R allows you to build a dataframe from vectors that are equal in length. For this purpose, R provides the built-in function “data.frame()”. This function accepts as many vectors as we wish.

The following is the syntax for calling this function:

df <- data.frame(v1, v2, v3, v4)

Each vector becomes one column of the dataframe, and the number of rows equals the length of the vectors.
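To make the relationship between vectors, columns, and rows concrete, here is a small sketch. The vectors v1 to v4 reuse the placeholder names from the syntax line; their contents are assumptions made up for this example.

```r
# Four hypothetical vectors of length 3, named as in the syntax line.
v1 <- c(1, 2, 3)
v2 <- c("a", "b", "c")
v3 <- c(TRUE, FALSE, TRUE)
v4 <- c(1.5, 2.5, 3.5)

df <- data.frame(v1, v2, v3, v4)

print(ncol(df))   # one column per vector
print(nrow(df))   # one row per vector element
print(names(df))  # column names are taken from the vector names
```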

There are 2 ways to generate a dataframe from vectors. The first is to create the required vectors and then pass them to the “data.frame()” function. The alternative is to define the vectors directly inside the parentheses of the “data.frame()” call, assigning them their values there.

We will demonstrate both methods with practical examples in RStudio on Ubuntu 20.04.

First, we will make a dataframe by creating the vectors and then passing them all as arguments to “data.frame()”.

df v 1.png

The program shown in the above image uses four vectors, all created with the “c()” function. The first vector, “Name,” stores the names of 3 people as character values. The second vector, “Language,” stores the names of 3 programming languages, also as character values. The third vector, “Age,” stores numeric values. The last vector, “Gender,” also stores 3 character values. All 4 vectors are passed to the “data.frame()” function as arguments, and its output is stored in the dataframe “df”. In the last step of the code, we use “print()” to display the output.

The resultant dataframe has 4 columns, one per vector, each with the same number of rows.
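For readers who cannot see the screenshot, the program described here might look roughly like the sketch below. The vector names come from the description; the values themselves are placeholders and are not taken from the original image.

```r
# Step 1: create the vectors (the values are hypothetical placeholders).
Name     <- c("Alex", "Sara", "John")
Language <- c("R", "Python", "Java")
Age      <- c(22, 25, 45)
Gender   <- c("Male", "Female", "Male")

# Step 2: pass the vectors to data.frame() and store the result in df.
df <- data.frame(Name, Language, Age, Gender)

# Step 3: display the dataframe.
print(df)
```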

out v 1.png

The alternative method is to define the vectors, together with their values, inside the “data.frame()” call itself.

v2 df.png

This code snippet creates the vectors and assigns them values inside the “data.frame()” call, then stores the result in the dataframe “df.” Finally, “print()” displays the output.
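A sketch of this inline variant, assuming the same four column names as in the first example and using placeholder values, could be:

```r
# Define the columns directly inside the data.frame() call.
# The values are hypothetical placeholders.
df <- data.frame(
  Name     = c("Alex", "Sara", "John"),
  Language = c("R", "Python", "Java"),
  Age      = c(22, 25, 45),
  Gender   = c("Male", "Female", "Male")
)

print(df)
```

Here each argument name becomes a column name, so no separate vector variables are left behind in the workspace.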

The resultant table yields the same output, which can be seen in the image below.

out v2.png

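It’s worth repeating that to generate a dataframe from a set of vectors, each vector must have the same number of elements; otherwise, the script will report an error. The one exception is that R recycles a shorter vector when its length divides the length of the longest vector evenly.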
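The following sketch shows what happens when the lengths do not match. The vectors are hypothetical, and “tryCatch()” is used here only so that the error message can be captured and printed instead of stopping the script.

```r
# "Name" has 3 elements but "Age" has only 2, so data.frame() fails.
result <- tryCatch(
  data.frame(
    Name = c("Alex", "Sara", "John"),
    Age  = c(22, 25)
  ),
  error = function(e) conditionMessage(e)  # keep the error text
)

# result now holds the error message rather than a dataframe.
print(result)
```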

Using Other Dataframes to Create a Dataframe

Another technique in R is to create a dataframe from two or more existing dataframes. We can append the columns of one dataframe to another, or join their rows.

We will execute two programs here, one for the horizontal grouping and the other for vertical grouping.

For the columns, the function we will use is “cbind().” Let’s create 2 dataframes first and then combine them using the “cbind()” function.

In the first chunk of code, a dataframe “df1” with 2 columns is constructed.

df1.png

The resulting dataframe can be seen in the image below.

df1 out.png

Another dataframe, “df2,” is generated having 2 columns, “Age” and “Gender.”

df2.png

The resulting dataframe can be seen in the image below.

df2 out.png

A dataframe “df3” is then constructed by using the “cbind()” function to combine “df1” and “df2”.

df3.png

The final output shows the table generated by merging the 2 dataframes.
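The three steps can be sketched as follows. The “Age” and “Gender” columns of “df2” are named in the text; the column names of “df1” and all values are assumptions made for this example.

```r
# First dataframe with 2 columns (column names and values are assumed).
df1 <- data.frame(
  Name     = c("Alex", "Sara", "John"),
  Language = c("R", "Python", "Java")
)

# Second dataframe with the columns "Age" and "Gender".
df2 <- data.frame(
  Age    = c(22, 25, 45),
  Gender = c("Male", "Female", "Male")
)

# Combine the two dataframes side by side; both must have 3 rows.
df3 <- cbind(df1, df2)
print(df3)
```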

out v2.png

Similarly, to combine dataframes by rows, we can use the “rbind()” function, passing the 2 dataframes as arguments. This function stacks the 2 smaller dataframes vertically into one table. Keep in mind that the dataframes must have the same columns, with matching names, for “rbind()” to work; for “cbind()”, it is the number of rows that must be the same.
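A minimal sketch of row-wise combination is given below. The dataframe names “top” and “bottom”, along with their columns and values, are hypothetical.

```r
# Two hypothetical dataframes that share the same column names.
top <- data.frame(
  Name = c("Alex", "Sara"),
  Age  = c(22, 25)
)
bottom <- data.frame(
  Name = c("John"),
  Age  = c(45)
)

# Stack the rows of "bottom" underneath the rows of "top".
combined <- rbind(top, bottom)
print(combined)
```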

Reading a File Into a Dataframe

Aside from building a dataframe by hand, we can import a tabular dataset and save it as a dataframe. It is one of the most frequent methods for constructing a dataframe in R programming.

We have created a CSV file named “table.csv” that stores values in tabular format and saved it in our “documents” folder. In RStudio, we will read it with the “read.csv()” function into a new dataframe named “table.”

To read a CSV file in RStudio, first check your current working directory, which you can locate with the function “getwd().” Next, set the working directory to the folder where you saved the “.csv” file, which is done with “setwd()”. If you skip these steps, R will not find the file by its name alone and will report an error when you try to read it.

Once the working directory is set to the folder where your CSV file is stored, use the “read.csv()” function. Write the “.csv” file name in quotation marks (“”) inside “read.csv()” and assign the result to a dataframe with whatever name you want.
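The workflow can be sketched as below. So that the snippet runs anywhere, it writes a small stand-in “table.csv” into a temporary folder instead of using the “documents” folder from the article; the file contents are hypothetical.

```r
# Remember the current working directory so it can be restored later.
old_wd <- getwd()

# Use a temporary folder as a stand-in for the folder holding the CSV file.
csv_dir <- tempdir()
setwd(csv_dir)

# Create a small hypothetical "table.csv" to read back.
writeLines(c("Name,Age", "Alex,22", "Sara,25", "John,45"), "table.csv")

# Read the file; the first line is treated as the column header.
table <- read.csv("table.csv")
print(table)

# Restore the original working directory.
setwd(old_wd)
```

With a real file, the call to “setwd()” would instead point at the folder where the file is saved, and the “writeLines()” step would be dropped.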

csv.png

The data we have stored in our CSV file is displayed here.

csv out.png

Conclusion

In this article, we explored the creation of dataframes, which are essential structures in R programming. We discussed different ways to construct dataframes in RStudio on Ubuntu 20.04, illustrating each with an example. Practicing with these examples will familiarize you both with the uses of dataframes and with the alternative ways to build them.

About the author

Saeed Raza