Example 1: Extract the DataFrame Columns by the select() Method
In some cases, we want to extract multiple columns from the DataFrame at once. For this, the dplyr package provides the select() function, which returns all the columns we name in a single call.
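# Build a DataFrame with four columns and five rows
myDataframe <- data.frame(students=c('Mark', 'zack', 'Tyler', 'Daniel', 'Jermy'),
                          age=c(20, 25, 21, 22, 24),
                          science_marks=c(99, 68, 71, 88, 76),
                          language_marks2=c(60, 98, 74, 94, 88))
# Load dplyr, which provides select() and the %>% pipe
library(dplyr)
# Keep only three of the four columns
myDataframe %>%
  select(students, science_marks, language_marks2)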
In the previous program, we create a DataFrame named “myDataframe” with five rows and four columns: “students”, “age”, “science_marks”, and “language_marks2”. The “students” column holds names and the other three columns hold numeric values. After that, the dplyr package is loaded so that its data manipulation functions are available. Then, we use the %>% operator to pass “myDataframe” to the select() function, which keeps the code easy to read. The columns to keep are given as arguments: “students”, “science_marks”, and “language_marks2”. Only the columns named in select() are returned, so the “age” column is dropped.
The output is a new DataFrame that contains only the “students”, “science_marks”, and “language_marks2” columns of “myDataframe”.
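As a quick check of the behaviour described above, the following sketch rebuilds the same DataFrame and inspects what select() returns. The name “selected” is introduced here for illustration only, and the snippet assumes that the dplyr package is installed.

```r
library(dplyr)

myDataframe <- data.frame(
  students = c("Mark", "zack", "Tyler", "Daniel", "Jermy"),
  age = c(20, 25, 21, 22, 24),
  science_marks = c(99, 68, 71, 88, 76),
  language_marks2 = c(60, 98, 74, 94, 88)
)

# 'selected' is a hypothetical name used only in this sketch
selected <- myDataframe %>%
  select(students, science_marks, language_marks2)

names(selected)  # column names kept by select()
dim(selected)    # number of rows and columns in the result
```

The result is still a DataFrame, with the columns in the order in which they were listed in select().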
Example 2: Extract the DataFrame Columns by the Column Names
We can also extract a column from the DataFrame by providing its name. To fetch a single column, we use the double square bracket operator on the DataFrame and put the quoted column name inside it.
# Create a DataFrame with three columns
df1 <- data.frame(
  item = c("Books", "Pens", "Highlighter"),
  price = c(1200, 550, 180),
  InStock = c(TRUE, FALSE, TRUE)
)
# Extract single columns by name
df1[["item"]]
df1[["price"]]
In the provided program, we use the data.frame() function to create a DataFrame with three columns, “item”, “price”, and “InStock”, and store it in the “df1” variable. Now, we want to access specific columns of “df1”. For this, the double square bracket “[[ ]]” operator is used with the column name inside it. We first pass “item” and then pass “price” in a second call.
The output shows the values of the “item” column first, followed by the values of the “price” column.
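To make the return type of the double square bracket concrete, here is a minimal sketch that compares it with the single square bracket on the same DataFrame. The names “price_vector” and “price_frame” are hypothetical and exist only in this sketch.

```r
df1 <- data.frame(
  item = c("Books", "Pens", "Highlighter"),
  price = c(1200, 550, 180),
  InStock = c(TRUE, FALSE, TRUE)
)

# Double brackets return the column itself as a vector
price_vector <- df1[["price"]]
is.data.frame(price_vector)

# Single brackets keep the DataFrame structure (one column)
price_frame <- df1["price"]
is.data.frame(price_frame)
```

The double bracket form is the one to use when the values themselves are needed, for example as input to another function.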
Example 3: Extract the DataFrame Columns by the $ Operator
The $ operator also extracts a column from the DataFrame. We write the DataFrame name, then $, then the column name, and the values of that column are returned as a vector. The following code snippet uses the $ operator for the column extraction:
# Create a DataFrame with two columns
data1 <- data.frame(
  course = c("Python", "MongoDB", "Java"),
  fees = c(2000, 5000, 4000)
)
# Extract the course column as a vector
data1$course
In the given program, we define the “data1” DataFrame with two columns, “course” and “fees”. To get a specific column, we write the $ operator followed by the column name. Here, the “course” column is accessed.
Hence, the values of the “course” column from the “data1” DataFrame are printed on the output screen as a vector.
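The short sketch below illustrates that the $ operator and the double square bracket return the same values for a column, and that the extracted vector can be passed directly to another function. It reuses the “data1” DataFrame from this example; mean() is used here only as an illustration.

```r
data1 <- data.frame(
  course = c("Python", "MongoDB", "Java"),
  fees = c(2000, 5000, 4000)
)

# Both forms return the same vector
identical(data1$course, data1[["course"]])

# The extracted vector can be passed to other functions
mean(data1$fees)
```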
Example 4: Extract the DataFrame Columns by the Condition
Additionally, columns can be extracted from the DataFrame based on a condition. For this, dplyr has the select_if() function, which takes a function that tests each column and keeps the columns for which the test returns TRUE. The following code snippet extracts only the numeric columns of the DataFrame:
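library(dplyr)
my_df <- as_tibble(iris)  # convert the built-in iris dataset to a tibble
my_df
my_df %>% select_if(is.numeric)  # keep only the numeric columns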
In the provided program, we use the built-in “iris” dataset. First, we convert it to a tibble object named “my_df” using the as_tibble() function. The iris dataset contains measurements of iris flowers (sepal length, sepal width, petal length, and petal width) together with the name of the species. Note that “Species” is the only non-numeric column; it is stored as a factor.
We only want to retrieve the numeric columns of this dataset, so we use the select_if() function, which selects the columns of “my_df” that satisfy a condition. The pipe “%>%” operator passes “my_df” to select_if() as its first argument. The second argument is the is.numeric function, written without parentheses; select_if() applies it to each column to decide whether that column is numeric.
Therefore, the output is a tibble that contains only the four numeric columns of “my_df”.
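The dplyr documentation marks select_if() as superseded and points to select() combined with where() instead. The following sketch shows that form on the same data, assuming dplyr 1.0.0 or later is installed; the name “numeric_cols” is hypothetical and used only here.

```r
library(dplyr)

my_df <- as_tibble(iris)

# where() wraps a predicate function for use inside select()
numeric_cols <- my_df %>% select(where(is.numeric))

names(numeric_cols)  # the non-numeric Species column is not included
```

Both forms keep the same columns here; select_if() still works, so the choice mainly matters for newly written code.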
Example 5: Extract the DataFrame Columns by the Indices of Columns
Moreover, the index of the column can also be specified to extract the columns of the DataFrame. The following code snippet uses the indices of the columns to be extracted from the DataFrame:
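# Create a DataFrame with three columns
employees <- data.frame(
  ID=c(04,01,05,03,02),
  Name=c("James","Charles","Marrie","Andrew","Elena"),
  Gender=c("Male", "Male","Female", "Male", "Female"))
# Select all rows of the second and third columns
employees[ , c(2,3)]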
In the provided program, we call the data.frame() function and store the result in the “employees” variable. The “employees” DataFrame contains the “ID”, “Name”, and “Gender” columns. After that, we select particular columns from the “employees” DataFrame using the indexing operator [ , c(2,3)]. The empty position before the comma indicates that all rows should be selected. The “c(2,3)” argument specifies that the second and third columns should be selected. Therefore, we get a new DataFrame that contains only the “Name” and “Gender” columns of “employees” while omitting the “ID” column.
The output shows the “Name” and “Gender” columns along with all five of their rows.
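One detail of index-based extraction is worth a small sketch: with two or more indices the result is a DataFrame, but with a single index base R simplifies the result to a vector unless drop = FALSE is given. The names “two_cols”, “name_vector”, and “name_frame” are hypothetical and used only in this sketch.

```r
employees <- data.frame(
  ID = c(4, 1, 5, 3, 2),
  Name = c("James", "Charles", "Marrie", "Andrew", "Elena"),
  Gender = c("Male", "Male", "Female", "Male", "Female")
)

# Two indices: the result is still a DataFrame
two_cols <- employees[, c(2, 3)]

# One index: the result is simplified to a vector by default
name_vector <- employees[, 2]

# drop = FALSE keeps the single column as a DataFrame
name_frame <- employees[, 2, drop = FALSE]
```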
Conclusion
We discussed several ways to retrieve columns from a DataFrame: the select() method, the double square bracket with a column name, the $ operator, the select_if() method for conditions, and column indices. Any of these methods can be used to extract columns, depending on the specific needs and the structure of the DataFrame.