
R – Drop Data Frame Columns by Name

When you work with R data frames, you may need to remove some of their columns. In this article, we will see how to drop, or remove, columns from a data frame by specifying their names. To demonstrate this, we first need a data frame with some rows and columns.

A data frame is a collection of data organized in rows and columns. In R, a data frame is created with the data.frame() function.

Syntax:

data.frame(values)

Here, values are the columns of the data frame, typically given as named vectors of equal length; lists and matrices are also accepted.
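As a minimal sketch of how data.frame() builds columns (the names df, id, and score are made up for illustration), each named argument becomes one column, and the length of the vectors sets the number of rows:

```r
# Illustrative data frame: each named vector becomes one column
df = data.frame(id = c(1, 2, 3), score = c(90, 85, 77))

# Both vectors have 3 elements, so df has 3 rows and 2 columns
print(nrow(df))
print(ncol(df))

# The argument names become the column names
print(names(df))
```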

First, create a data frame named market with four rows and five columns that describe markets.

Code:

#create a data frame named market that has 4 rows and 5 columns.

market = data.frame(market_id = c(1, 2, 3, 4),
                    market_name = c('M1', 'M2', 'M3', 'M4'),
                    market_place = c('India', 'USA', 'India', 'Australia'),
                    market_type = c('grocery', 'bar', 'grocery', 'restaurant'),
                    market_squarefeet = c(120, 342, 220, 110))

#display the market dataframe

print(market)

Result:

Printing market shows its four rows and five columns: market_id, market_name, market_place, market_type, and market_squarefeet.
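Because every approach below selects columns by name, it can help to list the exact names first. A short sketch using the base R functions names() and dim(); the block recreates market so it runs on its own:

```r
# Recreate the market data frame so this block runs on its own
market = data.frame(market_id = c(1, 2, 3, 4),
                    market_name = c('M1', 'M2', 'M3', 'M4'),
                    market_place = c('India', 'USA', 'India', 'Australia'),
                    market_type = c('grocery', 'bar', 'grocery', 'restaurant'),
                    market_squarefeet = c(120, 342, 220, 110))

# names() returns the column names as a character vector
print(names(market))

# dim() returns the number of rows and columns
print(dim(market))
```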

There are several ways to drop data frame columns by name. Let’s look at them one by one.

Approach 1: Using names()

The names() function returns the column names of a data frame. Here, we store the names of the columns to drop in a vector and use the %in% operator to check which of the data frame's column names appear in that vector. We then negate the result with the ! operator, so the logical vector is TRUE for every column we want to keep, and pass it as the column index inside []. In this way, we can drop columns by name by listing their names in a vector.
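To make the steps above concrete, here is a sketch that prints the intermediate logical vectors; the variable names in_drop_list and keep are illustrative only:

```r
market = data.frame(market_id = c(1, 2, 3, 4),
                    market_name = c('M1', 'M2', 'M3', 'M4'),
                    market_place = c('India', 'USA', 'India', 'Australia'),
                    market_type = c('grocery', 'bar', 'grocery', 'restaurant'),
                    market_squarefeet = c(120, 342, 220, 110))
column_names = c('market_name', 'market_place', 'market_squarefeet')

# Step 1: TRUE for each column of market whose name is in column_names
in_drop_list = names(market) %in% column_names
print(in_drop_list)    # FALSE TRUE TRUE FALSE TRUE

# Step 2: negate with ! so TRUE now marks the columns to keep
keep = !in_drop_list
print(keep)            # TRUE FALSE FALSE TRUE FALSE

# Step 3: use the logical vector as the column index
print(market[, keep])
```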

Syntax:

dataframe_object[,!(names(dataframe_object) %in% column_names)]

Here,

  1. The dataframe_object is the name of the data frame.
  2. names() returns the column names of the data frame.
  3. column_names is a character vector of the column names to drop from the data frame.

Example 1

In this example, we are dropping a single column: market_name. So, we have to specify this column in a vector.

#create a data frame named market that has 4 rows and 5 columns.

market = data.frame(market_id = c(1, 2, 3, 4),
                    market_name = c('M1', 'M2', 'M3', 'M4'),
                    market_place = c('India', 'USA', 'India', 'Australia'),
                    market_type = c('grocery', 'bar', 'grocery', 'restaurant'),
                    market_squarefeet = c(120, 342, 220, 110))

#consider 1 column - market_name

column_name=c('market_name')

#display remaining columns by dropping the above-selected column using names() with !

print(market[,!(names(market) %in% column_name)])

Result:

From the result, we can see that the market_name column was dropped and the remaining columns were returned in a data frame.

Example 2

In this example, we are dropping multiple columns: market_name, market_place, and market_squarefeet. So, we have to specify these three columns in a vector.

#create a data frame named market that has 4 rows and 5 columns.

market = data.frame(market_id = c(1, 2, 3, 4),
                    market_name = c('M1', 'M2', 'M3', 'M4'),
                    market_place = c('India', 'USA', 'India', 'Australia'),
                    market_type = c('grocery', 'bar', 'grocery', 'restaurant'),
                    market_squarefeet = c(120, 342, 220, 110))

#consider the 3 columns - market_name,market_place,market_squarefeet

column_names=c('market_name','market_place','market_squarefeet')

#display remaining columns by dropping the above selected columns using names() with !

print(market[,!(names(market) %in% column_names)])

Result:

From the result, we can see that the market_name, market_place, and market_squarefeet columns were dropped and the remaining columns were returned in a data frame.
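One caveat of base R indexing: when the negated mask leaves only one column, market[, ...] simplifies the result to a plain vector instead of a data frame. Adding drop = FALSE keeps it a data frame. The helper drop_columns() below is a hypothetical name used for this sketch, not a standard R function:

```r
market = data.frame(market_id = c(1, 2, 3, 4),
                    market_name = c('M1', 'M2', 'M3', 'M4'),
                    market_place = c('India', 'USA', 'India', 'Australia'),
                    market_type = c('grocery', 'bar', 'grocery', 'restaurant'),
                    market_squarefeet = c(120, 342, 220, 110))

# Hypothetical helper: drop the named columns and always return a data frame
drop_columns = function(df, column_names) {
  df[, !(names(df) %in% column_names), drop = FALSE]
}

# Drop four of the five columns, leaving only market_id
to_drop = c('market_name', 'market_place', 'market_type', 'market_squarefeet')

without_drop = market[, !(names(market) %in% to_drop)]
with_drop = drop_columns(market, to_drop)

print(class(without_drop))  # simplified to a numeric vector
print(class(with_drop))     # still a data.frame
```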

Approach 2: Using select() From the dplyr Package

The select() function from the dplyr package picks columns from a data frame. It takes the data frame as its first argument and the columns to select as the following arguments. Putting the minus (-) sign in front of a selection drops those columns instead of keeping them. When the names of the columns to drop are stored in a character vector, wrap the vector in all_of() so that dplyr treats it as a set of column names; passing the bare vector is ambiguous and is deprecated in recent versions of tidyselect, the package dplyr uses for column selection.

Syntax:

select(dataframe_object, -all_of(column_names))

Parameters:

It takes two parameters:

  1. The dataframe_object is the name of the data frame.
  2. column_names is a character vector of the column names to drop from the data frame.

To use select(), load the dplyr package with the library() function. If dplyr is not installed yet, install it first with install.packages("dplyr").

library("dplyr")

Example 1

In this example, we are dropping a single column: market_name. So, we have to specify this column in a vector.

#load library dplyr

library("dplyr")

#create a data frame named market that has 4 rows and 5 columns.

market = data.frame(market_id = c(1, 2, 3, 4),
                    market_name = c('M1', 'M2', 'M3', 'M4'),
                    market_place = c('India', 'USA', 'India', 'Australia'),
                    market_type = c('grocery', 'bar', 'grocery', 'restaurant'),
                    market_squarefeet = c(120, 342, 220, 110))

#consider 1 column - market_name

column_name=c('market_name')

#display remaining columns by dropping the above selected column using select()

print(select(market, -all_of(column_name)))

Result:

The result shows that the market_name column was dropped and the remaining columns were returned in a data frame.

Example 2

In this example, we are dropping multiple columns: market_name, market_place, and market_squarefeet. So, we have to specify all three columns in a vector.

#load library dplyr

library("dplyr")

#create a data frame named market that has 4 rows and 5 columns.

market = data.frame(market_id = c(1, 2, 3, 4),
                    market_name = c('M1', 'M2', 'M3', 'M4'),
                    market_place = c('India', 'USA', 'India', 'Australia'),
                    market_type = c('grocery', 'bar', 'grocery', 'restaurant'),
                    market_squarefeet = c(120, 342, 220, 110))

#consider the 3 columns - market_name,market_place,market_squarefeet

column_names=c('market_name','market_place','market_squarefeet')

#display remaining columns by dropping the above selected columns using select()

print(select(market, -all_of(column_names)))

Result:

From the result, we can see that the market_name, market_place, and market_squarefeet columns were dropped and the remaining columns were returned in a data frame.
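select() is not limited to character vectors. As a sketch, assuming dplyr is installed: bare (unquoted) column names can follow the minus sign directly, and the tidyselect helper any_of(), which dplyr re-exports, drops only the names that actually exist, whereas all_of() raises an error if any name is missing. The column name not_a_column is made up to show the difference, and bare_result and any_result are illustrative variable names:

```r
# Assumes the dplyr package is installed
library("dplyr")

market = data.frame(market_id = c(1, 2, 3, 4),
                    market_name = c('M1', 'M2', 'M3', 'M4'),
                    market_place = c('India', 'USA', 'India', 'Australia'),
                    market_type = c('grocery', 'bar', 'grocery', 'restaurant'),
                    market_squarefeet = c(120, 342, 220, 110))

# Bare (unquoted) column names can follow the minus sign directly
bare_result = select(market, -market_name, -market_place)
print(bare_result)

# any_of() skips names that are not columns of market;
# 'not_a_column' is a made-up name used only to show this
column_names = c('market_name', 'not_a_column')
any_result = select(market, -any_of(column_names))
print(any_result)
```

Use all_of() when a missing column should be treated as a mistake, and any_of() when the drop list may legitimately contain names that are not present.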

Approach 3: Using subset()

The base R subset() function returns a subset of a data frame. It takes the data frame as its first argument, and its select argument specifies the columns to keep. Putting the minus (-) sign in front of a vector of unquoted column names, such as -c(market_name), inverts the selection, so those columns are dropped and all the others are kept. In this way, we can drop columns by name through the select argument.

Syntax:

subset(dataframe_object, select = -c(column1, column2, ...))

Parameters:

It takes two parameters:

  1. The dataframe_object is the name of the data frame.
  2. c(column1, column2, ...) lists the unquoted names of the columns to drop from the data frame and is passed through the select argument.
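subset() evaluates select by mapping each column name to its position, so column ranges such as market_name:market_place also work. Its second argument, named subset, keeps only the rows where a condition is TRUE, so rows and columns can be filtered in one call. A sketch of both; no_range and india_only are illustrative names:

```r
market = data.frame(market_id = c(1, 2, 3, 4),
                    market_name = c('M1', 'M2', 'M3', 'M4'),
                    market_place = c('India', 'USA', 'India', 'Australia'),
                    market_type = c('grocery', 'bar', 'grocery', 'restaurant'),
                    market_squarefeet = c(120, 342, 220, 110))

# market_name:market_place stands for columns 2 to 3, which are dropped
no_range = subset(market, select = -(market_name:market_place))
print(no_range)

# Keep only the rows in India and drop the market_name column
india_only = subset(market, market_place == 'India', select = -c(market_name))
print(india_only)
```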

Example 1

In this example, we are dropping a single column: market_name. So, we have to specify this column in a vector and assign it to select.

#create a data frame named market that has 4 rows and 5 columns.

market = data.frame(market_id = c(1, 2, 3, 4),
                    market_name = c('M1', 'M2', 'M3', 'M4'),
                    market_place = c('India', 'USA', 'India', 'Australia'),
                    market_type = c('grocery', 'bar', 'grocery', 'restaurant'),
                    market_squarefeet = c(120, 342, 220, 110))

#display remaining columns by dropping the market_name using subset()

print(subset(market, select = - c(market_name)))

Result:

The result shows that the market_name column was dropped and the remaining columns were returned in a data frame.

Example 2

In this example, we are dropping multiple columns: market_name, market_place, and market_squarefeet. So, we have to specify all three columns in a vector and pass it to the select argument.

#create a data frame named market that has 4 rows and 5 columns.

market = data.frame(market_id = c(1, 2, 3, 4),
                    market_name = c('M1', 'M2', 'M3', 'M4'),
                    market_place = c('India', 'USA', 'India', 'Australia'),
                    market_type = c('grocery', 'bar', 'grocery', 'restaurant'),
                    market_squarefeet = c(120, 342, 220, 110))

#display remaining columns by dropping the market_name, market_place, and market_squarefeet columns using subset()

print(subset(market, select = -c(market_name, market_place, market_squarefeet)))

Result:

From the result, we can see that the market_name, market_place, and market_squarefeet columns were dropped and the remaining columns were returned in a data frame.

Approach 4: Using within()

The base R within() function evaluates an expression inside a data frame and returns a modified copy of it. It takes the data frame as its first argument and the expression as its second. Passing rm() with unquoted column names as the expression removes those columns from the returned data frame. In this way, we can drop columns by name.

Syntax:

within(dataframe_object, rm(column_names))

Parameters:

It takes two parameters:

  1. The dataframe_object is the name of the data frame.
  2. rm() takes the unquoted names of the columns to drop, separated by commas.
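Note that within() does not change the data frame in place; it returns a modified copy. A sketch showing that the original keeps all its columns until the result is assigned back (market_small is an illustrative name):

```r
market = data.frame(market_id = c(1, 2, 3, 4),
                    market_name = c('M1', 'M2', 'M3', 'M4'),
                    market_place = c('India', 'USA', 'India', 'Australia'),
                    market_type = c('grocery', 'bar', 'grocery', 'restaurant'),
                    market_squarefeet = c(120, 342, 220, 110))

# within() returns a modified copy; store it to keep the result
market_small = within(market, rm(market_name, market_place))
print(names(market_small))

# The original data frame still has all five columns
print(names(market))

# To change market itself, assign the result back to it
market = within(market, rm(market_name))
print(names(market))
```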

Example 1

In this example, we are dropping a single column: market_name. So, we pass this column name to rm().

#create a data frame named market that has 4 rows and 5 columns.

market = data.frame(market_id = c(1, 2, 3, 4),
                    market_name = c('M1', 'M2', 'M3', 'M4'),
                    market_place = c('India', 'USA', 'India', 'Australia'),
                    market_type = c('grocery', 'bar', 'grocery', 'restaurant'),
                    market_squarefeet = c(120, 342, 220, 110))

#display remaining columns by dropping the market_name column using within()

print(within(market, rm(market_name)) )

Result:

The result shows that the market_name column was dropped and the remaining columns were returned in a data frame.

Example 2

In this example, we are dropping multiple columns: market_name, market_place, and market_squarefeet. So, we pass all three column names to rm(), separated by commas.

#create a data frame named market that has 4 rows and 5 columns.

market = data.frame(market_id = c(1, 2, 3, 4),
                    market_name = c('M1', 'M2', 'M3', 'M4'),
                    market_place = c('India', 'USA', 'India', 'Australia'),
                    market_type = c('grocery', 'bar', 'grocery', 'restaurant'),
                    market_squarefeet = c(120, 342, 220, 110))

#display remaining columns by dropping the market_name, market_place, and market_squarefeet columns using within()

print(within(market, rm(market_name, market_place, market_squarefeet)))

Result:

From the result, we can see that the market_name, market_place, and market_squarefeet columns were dropped and the remaining columns were returned in a data frame.

Conclusion

This article discussed four approaches to drop, or remove, columns from an R data frame by column name: names() with the %in% and ! operators, select() from the dplyr package, subset(), and within(). Depending on the requirements of your application, you can use any of them; names(), subset(), and within() are part of base R, while select() requires the dplyr package.
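As a quick cross-check, the sketch below runs the three base R approaches on the same drop list and confirms that they keep the same columns in the same order; select() is left out so the block does not depend on dplyr. The variable names by_names, by_subset, and by_within are illustrative:

```r
market = data.frame(market_id = c(1, 2, 3, 4),
                    market_name = c('M1', 'M2', 'M3', 'M4'),
                    market_place = c('India', 'USA', 'India', 'Australia'),
                    market_type = c('grocery', 'bar', 'grocery', 'restaurant'),
                    market_squarefeet = c(120, 342, 220, 110))
column_names = c('market_name', 'market_place', 'market_squarefeet')

# Approach 1: logical mask built from names()
by_names = market[, !(names(market) %in% column_names)]
# Approach 3: subset() with a negated select
by_subset = subset(market, select = -c(market_name, market_place, market_squarefeet))
# Approach 4: within() with rm()
by_within = within(market, rm(market_name, market_place, market_squarefeet))

# All three should keep market_id and market_type
print(names(by_names))
print(names(by_subset))
print(names(by_within))
```

The names() approach works with a character vector, which suits column names computed at run time, while subset() and within() take unquoted names typed directly in the call.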

About the author

Sireesha Lavu

This is Sireesha Lavu from Gogulamudi, Andhra Pradesh, India 522015.
I am currently working as a teacher and interested in writing technical articles on computer science.