Counting the rows of a DataFrame is a very common operation in R. DataFrames are among the most commonly used data structures in R: they are essentially tables that let us store and manipulate data in an organized manner. Counting the rows gives a quick sense of the size of the data. R provides several ways to do this. In this article, we explore a few of them, covering both the total number of rows and the number of rows per group.
Example 1: Get the Rows of the DataFrame Using the nrow() Function
The nrow() function is built into R and returns the number of rows of the object that is passed to it. Here, it takes one argument, which is a DataFrame. Consider the following R code where the nrow() function is used:
"column2" = c(15, 16, 17, 18),
"column3" = c(19, 20, 21, 22))
n <- nrow(data)
cat("---Rows in DataFrame:", n)
Initially, a DataFrame is created with three columns: “column1”, “column2”, and “column3”. Each column contains four numeric values. Then, we call the nrow() function with the “data” DataFrame as its argument to find the number of rows. The output of nrow(data) is assigned to the “n” variable, which the cat() function prints together with a label.
The nrow() function returns “4”, which means that the DataFrame has 4 rows.
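A minimal, self-contained sketch of this example is given in the following. The “column1” values are an assumption made for illustration. The dim() call is added only as a cross-check of the same count.

```r
# Build a DataFrame with three numeric columns of four values each.
# NOTE: the "column1" values are assumed for illustration.
data <- data.frame(
  column1 = c(11, 12, 13, 14),
  column2 = c(15, 16, 17, 18),
  column3 = c(19, 20, 21, 22)
)

# nrow() returns the number of rows as an integer.
n <- nrow(data)
cat("---Rows in DataFrame:", n, "\n")

# dim() returns c(rows, columns), so its first element is the same count.
rows_from_dim <- dim(data)[1]
```

Both “n” and “rows_from_dim” hold the value 4 for this DataFrame.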
Example 2: Get the Rows of the DataFrame That Contain No NA Values
We can also count only the complete rows of a DataFrame, that is, the rows without NA values. To do so, we call the na.omit() function inside the nrow() function. The following R code counts the rows that have no NA values:
v2=c(9, 9, 8, NA, 4),
v3=c(6, 6, NA, 5, 1),
v4=c(2, NA, 1, 9, 4))
In this code, the “myDataframe” DataFrame is created using the data.frame() function. The DataFrame has four columns, “v1”, “v2”, “v3”, and “v4”, and five rows, some of which contain NA values. Then, the na.omit() function, which is called inside the nrow() function, removes every row that has at least one NA value. Finally, the nrow() function counts the rows of the resulting DataFrame.
The output is “2”, which is the number of rows that remain after the rows with NA values are removed.
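The nested call can be sketched as a complete script as follows. The “v1” values are an assumption made for illustration and are chosen without NA values, so the count is decided by the “v2”, “v3”, and “v4” columns. The complete.cases() variant is an alternative base R approach, not part of the original example.

```r
# NOTE: the "v1" values are assumed for illustration; v2 to v4 follow the text.
myDataframe <- data.frame(
  v1 = c(1, 2, 3, 4, 5),
  v2 = c(9, 9, 8, NA, 4),
  v3 = c(6, 6, NA, 5, 1),
  v4 = c(2, NA, 1, 9, 4)
)

# na.omit() drops every row that has at least one NA; nrow() counts the rest.
complete_rows <- nrow(na.omit(myDataframe))

# Alternative: complete.cases() returns TRUE for each row without NA values,
# so summing it gives the same count without building a new DataFrame.
complete_rows_alt <- sum(complete.cases(myDataframe))

cat("Rows without NA values:", complete_rows, "\n")
```

Only the first and the last rows are free of NA values, so both variables hold the value 2.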
Example 3: Get the Rows of the DataFrame Using the table() Function
We can also use the table() function to count the rows of a DataFrame per distinct value. The table() function in R builds a frequency table of the values in a given column, so each count is the number of rows that hold that value. It returns a table object rather than a DataFrame. With the following code, we count how many rows of the provided DataFrame belong to each course:
Course = c("Python","Perl","PHP","Java"),
Age = c(21,26,20,24))
The “Students” DataFrame is created with three columns: “Name”, “Course”, and “Age”. Each column is set with different values accordingly. After that, the table() function is called with the “Students$Course” argument which specifies that we only consider the “Course” column of the DataFrame. Specifically, the table() function counts the number of occurrences of each unique value in the “Course” column of the “Students” DataFrame and returns a table object that contains these counts.
The output shows that each course name occurs only once in the “Course” column of the “Students” DataFrame.
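A self-contained sketch of this example follows. The “Name” values are an assumption made for illustration. The last step is an addition that shows how the per-value counts relate to the total number of rows.

```r
# NOTE: the "Name" values are assumed for illustration.
Students <- data.frame(
  Name = c("Ann", "Ben", "Cara", "Dev"),
  Course = c("Python", "Perl", "PHP", "Java"),
  Age = c(21, 26, 20, 24)
)

# table() counts how many rows hold each distinct value of the column.
course_counts <- table(Students$Course)
print(course_counts)

# The per-value counts add up to the total number of rows, provided that
# the column has no NA values (table() skips them by default).
total_rows <- sum(course_counts)
```

Here, “total_rows” equals nrow(Students) because the “Course” column has no missing values.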
Example 4: Get the Rows of the DataFrame Using the group_by() and tally() Functions
Additionally, the group_by() and tally() functions of the dplyr package in R can be used together to count the rows in each group of a DataFrame. The following code chains the group_by() and tally() functions:
library(dplyr)
df <- data.frame(c1 = rep(1:2, each = 2), c2 = letters[1:2])
print(df)
df %>% group_by(c1) %>% tally()
First, we load the “dplyr” library and build a DataFrame with two columns: “c1” and “c2”. The “c1” column contains the values 1 and 2, repeated two times each. The “c2” column contains the letters “a” and “b”, which are recycled to fill the four rows. The DataFrame is printed on the console by the print() function. To pass the DataFrame on to the group_by() function, we use the pipe operator %>%. This function groups the rows according to the distinct values in the “c1” column. Then, the number of rows in each group is determined using the tally() function.
The output has a “c1” column which contains the unique values from the original “c1” column and an “n” column which contains the corresponding count of rows in each group. Here, both groups contain two rows.
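The grouped count can be checked with the following sketch, which assumes that the dplyr package is installed. The count() call is a dplyr shortcut that is not used in the original example. It is shown here only as an equivalent way to get the same per-group counts.

```r
library(dplyr)

df <- data.frame(c1 = rep(1:2, each = 2), c2 = letters[1:2])

# Group the rows by "c1" and count the rows in each group.
counts <- df %>% group_by(c1) %>% tally()

# count(c1) combines the grouping and the tally in a single call.
shortcut <- df %>% count(c1)

# The per-group counts add up to the total number of rows.
total_rows <- sum(counts$n)
```

Both “counts” and “shortcut” hold one row per distinct “c1” value, and “total_rows” matches nrow(df).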
Example 5: Get the Rows of the DataFrame Using the ddply() Function
The ddply() function is part of the plyr package in R. It divides a DataFrame into subsets, applies a function to each subset individually, and then combines the results into a DataFrame. The following code uses the ddply() function to count the rows of the DataFrame per group:
library(plyr)
data_frame <- data.frame(Name = c("Sam","Jane","Mark","Harry"),
Day = c("Mon","Mon","Tues","Wed"))
ddply(data_frame, .(Day), nrow)
The code begins by loading the “plyr” package. Next, the “data_frame” DataFrame is created using the data.frame() function with two columns: “Name” and “Day”. Then, the ddply() function from the plyr package splits the rows of the “data_frame” DataFrame into groups. Here, the “.(Day)” argument groups the data according to the unique values in the “Day” column. Finally, the nrow() function is applied to each group to determine its number of rows.
The output contains the “Day” column with the unique values from the original “Day” column and the “V1” column with the corresponding count of rows in each group: 2 for “Mon” and 1 each for “Tues” and “Wed”.
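To make the split, apply, and combine steps visible, the following sketch reproduces the same counts with base R only. It is an illustration of the idea and not how plyr implements ddply(). The “V1” column name is set by hand to mirror the output that is described previously.

```r
data_frame <- data.frame(
  Name = c("Sam", "Jane", "Mark", "Harry"),
  Day = c("Mon", "Mon", "Tues", "Wed")
)

# Split: one sub-DataFrame per distinct value of "Day".
groups <- split(data_frame, data_frame$Day)

# Apply: count the rows of each sub-DataFrame.
rows_per_day <- sapply(groups, nrow)

# Combine: assemble the group names and counts into one DataFrame.
result <- data.frame(Day = names(rows_per_day), V1 = unname(rows_per_day))
print(result)
```

The “V1” counts add up to nrow(data_frame), which is a quick way to confirm that every row falls into exactly one group.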
We discussed several methods to count the rows of a DataFrame. The nrow() function returns the total number of rows, and combining it with na.omit() counts only the rows without NA values. The table(), group_by() with tally(), and ddply() functions count the rows per distinct value or per group. By knowing how to count the rows of a DataFrame, we can get a quick sense of the size of the data and the potential processing requirements to handle it.