How to Select or Filter Date-Based Rows of Dataframe in Pandas?
First, we will convert the datatype of data-based values to datetime64; then, we will use other methods (like DataFame.loc) to select or filter the dates-based between two dates. Initially, dates datatype can be given or represented in a variety of ways, like strings, np.datetime64, and datetime.datetime. By using the pd.to datetime() method in pandas, we can manipulate dates by converting the date datatype to datetime64 format.
arg: It can be float, list, string, integer, or DataFrame to be converted into datetime.
errors: Value can be specified as raise, coerce, or ignore. When a dictionary like a mapper, columns, or rows contains labels that are absent from the index being converted, raise a KeyError if “raise” is being used. Set to ignore by default.
dayfirst: Set as False by default. If True, the Boolean value places dayfirst.
yearfirst: If True, the boolean value places the yearfirst. False is the default value;
utc: Bounded value. Display the current time primary time standard (UTC) if True.
format: The position of the year, month, and the day is determined via a string input. It is none by default
infer_datetime_formatbool: If True and no format is specified, try to infer the date/time string format from the first element that isn’t null. False is the default value.
In the tutorial’s examples, we will employ the following strategy to choose information from the dates-based rows in the dataframe between two dates:
- Import pandas modules.
- Creating or loading data to create a dataframe.
- Create a dataframe.
- Convert the datatype of the columns with dates to datetime64.
- Specify a start and end date.
- Updated dataframe will be stored and shown after using a filter.
- Display dataframe.
Example # 1: Filter Dates-Based Values Using DataFrame.loc function
We will Filter dates-based data by using the “DataFrame.loc” function. To access a group of columns and rows of a DataFrame through the name of columns, indexes, or a boolean array, the “loc” function is used. The conditional expression in the loc variable in this example returns a boolean array with a True value if the row satisfies the condition and a False if the condition does not satisfy. Only those rows with True values are returned by using the loc method. Let’s create a dataframe first to understand this example clearly.
We have created a dataframe by passing a dictionary inside the pd.DataFrame() function after importing the pandas modules. In our dataframe, there are two columns whose names are specified as “tasks” and “date”. The “task” column consists of integer data (3, 2, 6, 1, 4, 5, 2, 1, 3, 4), and the “date” column consists of dates-based string values (“2022-01-05”, “2022-01-15”, “2022-02-01”, “2022-02-10”, “2022-03-20”, “2022-03-30”, “2022-04-12”, “2022-04-24”, “2022-05-15”, “2022-05-25”). To view our dataframe, we will use the print() function.
Now let’s check the datatype of our columns by utilizing the “dtypes” attribute.
As can be noticed, the datatype of the “tasks” column is int64, but the “date” column has the datatype of “object”. As mentioned above, we have to convert the datatype of the columns with dates to datetime64. To change the datatype of the “dates” column, the to_datetime() method will be used.
We have passed the “date” column in the to_datetime() function as an argument and specified the format as years-months-days. After changing the datatype of the “date” column, we assigned it to the “date” column of our “df” dataframe. By using the dtypes attribute, we can verify whether the datatype is changed to “datetime64” or not.
As the datatype has been changed now, we can use the loc method to filter the required rows.
Inside the df.loc method, we have created an expression/condition to select only those rows where the date is greater than or equal to “22-02-01” and less than “2022-04-24”. So, the rows from index 2 to 6 are satisfied by the condition as shown in the output.
Example # 2: Filter Dates-Based Values Using Different Attributes With DataFrame.loc function
In example # 1, we filtered the dates-based rows using the loc function. In this example, we will use different attributes with the loc function to select specific dates-based rows.
The first step will be the same; we will convert the datatype of the “date” column to “datetime64” by using the to_datetime() function.
After changing the datatype to “datetime64”, we will use the loc function to select rows between two dates.
We created a condition to select the data within the dates which are equal to or greater than “2022-04-10” and less than “2022-06-15”. We got four rows having dates (“2022-04-12”, “2022-04-24”, “2022-05-15”, and “2022-05-25”) that satisfied the condition. We can further filter the output to a specific time, date, week, month, year, etc., by using the datetime “.dt” attribute with the dataframe.loc method. Let’s suppose we have to select the row having the date “15” from the recently generated output of the “df” dataframe with four rows.
We specified the dt.day == 15 to fetch the data stored in the row having the date “15”. We got the “index 8” row as output because the date is 15 in that row. Now let’s use the “.between” attribute to select specific rows between two dates in our dataframe (created in example # 1).
By using the “.between()” function, we specified the dates between “2022-01-15” and “2022-03-30” to generate the output.
Example # 3: Filter Dates-Based Values Using the .query() function
Now we will see how the dataframe.query() function works to select specific dates-based rows from the dataframe. In this example, the same “df” dataframe will be used, which we created in example # 1. First, we will convert the datatype of the “date” column to “datetime64” as we have done in previous examples.
Now we will use the query() function to select rows between dates.
The condition inside the query(), as displayed above, is to select data within the dates which are equal to or greater than “2021-12-30” and less than “2022-01-15”. By default, the DataFrame’s columns are put in the query namespace, allowing access to the date column without specifying the name of the column.
These are the rows having data within the dates that satisfy the condition.
In this tutorial, we have discussed how to select the dates-based rows between dates in pandas. We have seen the syntax of the pandas.to_datetime() function, which we used to convert the datatype of the dates-based column to datetime64. We implemented examples to teach you how to use DataFrame.loc function, how to use different attributes with DataFrame.loc function, and how to use the .query() function to select the data from dates-based rows between two dates.