In R, getting the number of columns is a basic operation that is needed in many situations when working with DataFrames. Whether we are subsetting, analyzing, manipulating, publishing, or visualizing data, the column count is a useful piece of information. R provides several ways to get the number of columns of a given DataFrame. In this article, we discuss some of these approaches.
Example 1: Using the ncol() Function
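The ncol() function is the most commonly used way to get the number of columns of a DataFrame.
# The y1 values are illustrative (assumed for this example)
df <- data.frame("y1" = c(5, 12, 14, 19),
                 "y2" = c(15, 22, 24, 29),
                 "y3" = c(25, 32, 34, 39))
n <- ncol(df)
cat("-----Number of columns in Data Frame :", n)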
In this example, we first create the “df” DataFrame with three columns labeled “y1”, “y2”, and “y3” using the data.frame() function. The elements of each column are specified with the c() function, which creates a vector. Then, we call ncol() on “df” and store the resulting column count in the “n” variable. Finally, the cat() function prints a descriptive message followed by the value of “n” on the console.
As expected, the output indicates that the DataFrame has three columns.
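Because a data frame is stored as a list of columns, the same count can also be read off with length() or from the column names. The following is a minimal sketch of that idea; the “scores” DataFrame and its values are assumptions made up for illustration and are not part of the original example.

```r
# Illustrative data frame; names and values are assumptions for this sketch.
scores <- data.frame(y1 = c(1, 2, 3, 4),
                     y2 = c(15, 22, 24, 29),
                     y3 = c(25, 32, 34, 39))

n_ncol   <- ncol(scores)           # column count via ncol()
n_length <- length(scores)         # a data frame is a list with one element per column
n_names  <- length(names(scores))  # one name per column

cat("ncol:", n_ncol, "length:", n_length, "names:", n_names, "\n")
```

All three values agree for a data frame. ncol() remains the clearest choice because it states the intent directly.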
Example 2: Count the Total Columns for the Empty DataFrame
Next, we apply the ncol() function to an empty DataFrame. The function still returns a column count, but the value is zero.
empty_df <- data.frame()
n <- ncol(empty_df)
cat("---Columns in Data Frame :", n)
In this example, we generate the empty DataFrame, “empty_df”, by calling data.frame() without specifying any columns or rows. We then pass “empty_df” to the ncol() function to get its column count. Since the DataFrame has no columns, ncol(empty_df) returns 0. The cat() function displays the result.
The output shows the value “0” as expected because the DataFrame is empty.
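Note that “empty” can mean two different things: no columns at all, or columns that are defined but contain no rows. The sketch below contrasts the two cases; the names “no_cols” and “zero_rows” and the column names “id” and “label” are hypothetical and chosen only for illustration.

```r
# Case 1: no columns and no rows.
no_cols <- data.frame()

# Case 2: two columns are defined, but there are no rows (hypothetical columns).
zero_rows <- data.frame(id = integer(0), label = character(0))

cols_case1 <- ncol(no_cols)    # no columns
cols_case2 <- ncol(zero_rows)  # columns still exist without any data
rows_case2 <- nrow(zero_rows)  # no rows

cat("no_cols columns:", cols_case1, "\n")
cat("zero_rows columns:", cols_case2, "rows:", rows_case2, "\n")
```

In the second case, ncol() still reports the defined columns, so a zero from ncol() only tells us that the DataFrame has no columns, not merely that it has no data.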
Example 3: Using the select_if() Function with the length() Function
If we want to retrieve the number of columns of a specific type, we can use the select_if() function from the dplyr package together with the base R length() function. The select_if() function keeps only the columns that satisfy a condition, and length() counts them. The code is as follows:
library(dplyr)  # provides select_if()
x1 <- LETTERS[1:10]
x2 <- rpois(10, 2)
x3 <- rpois(10, 5)
x4 <- sample(c("Summer", "Winter"), 10, replace = TRUE)
df1 <- data.frame(x1, x2, x3, x4)
df1
length(select_if(df1, is.numeric))
In this example, we first load the dplyr package so that we can access the select_if() function; length() is part of base R. Then, we create four variables: “x1”, “x2”, “x3”, and “x4”. Here, “x1” contains the first 10 uppercase letters of the English alphabet. The “x2” and “x3” variables are generated with the rpois() function, which draws 10 random numbers from Poisson distributions with means 2 and 5, respectively. The “x4” variable is a character vector of 10 elements that are randomly sampled from the c("Summer", "Winter") vector.
Then, we create the “df1” DataFrame by passing all four variables to the data.frame() function. Finally, select_if() takes “df1” as its first argument and is.numeric as the condition, so it returns a DataFrame that contains only the numeric columns. The length() function then counts the columns of that result, which is the output of the code. In dplyr 1.0.0 and later, select_if() is superseded by select(where(is.numeric)), although it still works.
The output is the number of numeric columns in the DataFrame, which is 2 here (“x2” and “x3”).
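The same count can be obtained without dplyr, and the idea extends to counting the columns of every type at once. The following base R sketch is one possible way to do this and is not the method used in the example above; the “df_types” name, the seed, and the explicit stringsAsFactors = FALSE argument are assumptions added for reproducibility.

```r
set.seed(42)  # assumed seed, only to make the random columns reproducible

# Same column layout as df1 above; strings are kept as character on all R versions.
df_types <- data.frame(
  x1 = LETTERS[1:10],
  x2 = rpois(10, 2),
  x3 = rpois(10, 5),
  x4 = sample(c("Summer", "Winter"), 10, replace = TRUE),
  stringsAsFactors = FALSE
)

# is.numeric() returns TRUE or FALSE per column; sum() counts the TRUE values.
numeric_count <- sum(sapply(df_types, is.numeric))

# class() per column, tabulated to get the number of columns of each type.
type_counts <- table(sapply(df_types, class))

print(numeric_count)
print(type_counts)
```

Here, sapply() applies the check to each column in turn, so no intermediate DataFrame of selected columns is needed.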
Example 4: Using the sapply() Function
If we instead want to count the missing values in each column, we can use the sapply() function. The sapply() function iterates over each column of the DataFrame and applies a function to it. Its first argument is the DataFrame, and its second argument is the function to apply to each column. The following code uses sapply() to count the NA values in each column of a DataFrame:
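# The c1 values are illustrative (assumed for this example)
new_df <- data.frame(c1 = c(1, NA, 3, NA, 5),
                     c2 = c("N", NA, "A", "M", "E"),
                     c3 = c(NA, 92, NA, NA, 95))
sapply(new_df, function(x) sum(is.na(x)))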
In this example, we generate the “new_df” DataFrame with three columns: “c1”, “c2”, and “c3”. The first and third columns, “c1” and “c3”, contain numeric values, including some missing values that are represented by NA. The second column, “c2”, contains characters, including a missing value that is also represented by NA. Then, we apply the sapply() function to the “new_df” DataFrame and calculate the number of missing values in each column using the sum(is.na(x)) expression.
Inside that expression, the is.na() function returns a logical vector that indicates whether each element of the column is missing. The sum() function then adds up the TRUE values, which gives the number of missing values in each column.
Hence, the output displays the total NA values in each of the columns:
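As an alternative sketch, base R's colSums() can produce the same per-column totals because is.na() applied to a whole DataFrame returns a logical matrix. The example also counts how many columns contain at least one NA. The “na_df” DataFrame and its “c1” values are assumptions for illustration.

```r
# Illustrative data frame; the c1 values are assumed for this sketch.
na_df <- data.frame(c1 = c(1, NA, 3, NA, 5),
                    c2 = c("N", NA, "A", "M", "E"),
                    c3 = c(NA, 92, NA, NA, 95))

# is.na() on a data frame gives a logical matrix; colSums() counts TRUE per column.
na_per_column <- colSums(is.na(na_df))

# Number of columns that contain at least one missing value.
cols_with_na <- sum(na_per_column > 0)

print(na_per_column)
print(cols_with_na)
```

The sapply() version is more flexible when a different function is needed per column, while colSums() is a compact choice for this particular count.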
Example 5: Using the dim() Function
If we also want the number of rows along with the number of columns, the dim() function provides the dimensions of the DataFrame. The dim() function takes the object whose dimensions we want to retrieve as its argument. Here’s the code to use the dim() function:
d1 <- data.frame(team=c("A", "B", "C", "D"),  # team values are illustrative (assumed)
                 points=c(8, 10, 7, 4))
dim(d1)
In this example, we first define the “d1” DataFrame using the data.frame() function with two columns, “team” and “points”. After that, we invoke the dim() function on the “d1” DataFrame. When we run dim(d1), it returns a vector with two elements: the first is the number of rows in the “d1” DataFrame and the second is the number of columns.
The output shows the dimensions of the DataFrame, where the first value, “4”, is the number of rows and the second value, “2”, is the number of columns.
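Since dim() returns the row count first and the column count second, the column count alone can be extracted by indexing the second element. The following is a minimal sketch; the “teams” DataFrame and its “team” values are assumptions for illustration.

```r
# Illustrative data frame; the team labels are assumed for this sketch.
teams <- data.frame(team = c("A", "B", "C", "D"),
                    points = c(8, 10, 7, 4))

dims <- dim(teams)   # c(rows, columns)
n_rows <- dims[1]    # first element: number of rows
n_cols <- dims[2]    # second element: number of columns

cat("rows:", n_rows, "columns:", n_cols, "\n")
```

The extracted values match what nrow() and ncol() return for the same DataFrame.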
Conclusion
We have now learned that counting the number of columns in R is a simple and important operation that can be performed on a DataFrame. Among the functions covered here, ncol() is the most direct. The select_if() function with length() counts the columns of a specific type, sapply() counts the missing values per column, and dim() returns the number of rows and columns together.