Python

Pandas Count Rows with Condition

Pandas are among the most widely adopted data science and machine learning tools for data cleaning and processing. You may need to acquire several rows present in the DataFrame while utilizing the Pandas DataFrame to store and analyze your data. For the data handling process, you may require to swiftly count the occurrences of the same or different entries in your whole dataset or in particular rows that fulfill any given condition.

Pandas allows us to determine the shape of a DataFrame by counting the number of rows as well as columns in the DataFrame. You can employ a variety of methods to understand the concept to count the number of rows and columns in Pandas. These include “len()”, “df.shape[0]”, “df[df.columns[0]].count()”, “df.count()”, and “df.size()” methods. The quickest of these methods is len(), which we shall look at in this tutorial.

Let us start learning it by practically implementing the example codes.

Utilizing the Pandas Len() Method

The technique we will implement in this illustration is the “len()” method. Let’s explore how it’s working.

All the example codes that we will use in this tutorial are implemented and executed in Python utilizing the “Spyder” tool. The first task is to install and run the “Spyder” tool on your desktop or laptop computer. Once we are done with the installation process, we open the tool and open a new file which has a “.py” extension. Here, “py” represents “python”. Before we start writing our code, we need to add some prerequisites. As our article’s title briefly tells us that whatever techniques we will use must be supported by the “pandas” library.

Therefore, we need to add a Pandas library by writing the script “import pandas as pd”. We now imported the Pandas library and declared that Pandas can now be accessed by writing “pd” instead of the full form “pandas” throughout the program. Moving forward, we create a Pandas DataFrame where we exercise the chosen Pandas technique. For the construction of a DataFrame, Pandas provide us with a very simple and useful method “pd.DataFrame()” where “pd” refers to “pandas” and “DataFrame” is the keyword used to create the DataFrame.

We employed this method in our script. Between its parentheses, we initialized three columns. Our first column’s title is “group” which stores eight string values which are “X”, “X”, “X”, “X”, “Y”, “Y”, “Y”, and “Y”. The second column in the DataFrame is “pos” which also stores 8 string values. These values are “Au”, “Bo”, “Bo”, “Bo”, “Au”, “Au”, and “Bu”. The last column here is “scores” and it holds eight integer values, i.e. “19”, “23”, “18”, “15”, “15”, “12”, “21”, and “28”. When we generate a DataFrame, we also have to create a variable or a DataFrame object to store this DataFrame.

Here, the variable that we created for the said purpose is “res”. Then, we assign this function to the output generated from invoking the “pd.DataFrame()” method. Now, to see this DataFrame, we created on the terminal that we used the “print()” function which displays the output. Let’s execute this Python script:

To get the output displayed on the terminal, click the “Run file” button on the “Spyder” tool. Here is our initial DataFrame:

Utilizing the Len() Method with One Condition

Now, we have to count the rows of the specified column in the DataFrame which meet the provided condition. We will first apply the condition on a single column to retrieve the number of rows that matches the condition. Then, we apply it to the multiple columns of the DataFrame. For both techniques, we utilized the “len()” method of Pandas. The syntax for this method to apply the conditions on a single column is provided in the following:

According to the syntax, we invoked the “len()”method which counts the number of rows. Inside its braces, we specified a condition with the name of the DataFrame and the DataFrame column name. We selected the “group” column from our DataFrame and specified a condition for it.  The condition says to check if any value of the “group” column is equal to “X”. Whenever the condition is matched, the “len()” method counts the row that contains it.

Now, to store this counted value of rows that fulfilled the condition, we created a variable “count”. We utilized the “print()” method to show a text on the terminal before the counted rows. For the purpose to see the output of the displayed counted rows, we again employed the “print()” function and provided the “count” variable as the parameter.

We have both our DataFrame and the counted rows that matched the condition displayed on the terminal. We can note that the DataFrame has “4” rows that match the condition. You may also verify it by comparing it to the previous DataFrame. The “group” column has 4 “X” values, so it is calculated by the Pandas “len()” method.

Utilizing the Len() Method with Multiple Conditions

We counted the number of rows with the condition for a single column in the previous example. Now, we will learn how to count the rows for two columns. The syntax it follows is:

Explaining this syntax, the “len()” function is called to count the number of rows that meet the conditions. Then, we mentioned the name of the DataFrame whose rows we want to count. Now, the name of the first column with the particular condition, then the name of the DataFrame’s second column with the specified condition. Between both these conditions is the “&” operator. This operator is called the “And” operator. When it comes between two conditional statements, that mean that the rows will only be counted if both the conditions are met.

In our illustration, we selected the “group” column and the “pos” column. We applied the conditions to both of these columns. The condition on the “group” column checks the values in this particular column that are equal to “Y”. Whereas, the condition on “pos” checks the values equal to “Bo”. The “&” operator checks the values from the output of both values and verifies the condition. So, we need the number of rows that has the “group” value “X’ and the “pos” equal to “Bo”.

We created another variable “cal”. When the conditions are checked the “len()” function counts the number of rows and stores it in the “outcome” variable. Lastly, we employed two “print()” methods, one to display a text while the other to print the counted rows by the “len()” function stored in the “cal” variable.

The following output image attached shows us that there are only 3 rows in the DataFrame that meet the specified condition. From the “group” and “pos” column, only three rows are retrieved that have “X” “group” and the “pos” is “Bo”. Give it a few seconds to verify by yourself if the generated output is correct by examining the DataFrame displayed in the following snapshot:

You learned how to apply the conditions on two columns. Now, applying them on multiple columns will not get you into trouble. We now apply the conditions on all the three columns in the DataFrame and get only the count of those rows that meet all of the three conditions.

The first condition is applied on the “group” column to check the values that are equal to “Y”. Then, the “group”  values which are “Y’ and the “pos” is “Bo”. And the last condition that includes the complete conditions states the “group” equal to “Y” and the “pos” is “Bo” and the “scores” are greater than “15”. Retrieve those records from the DataFrame. The “len()”counts the rows and stores them in the “outcome” variable. Utilize the “print()” method to display the output.

The output tells us that there are 2 rows in the DataFrame that meet all three conditions.

Conclusion

Pandas provides us with a variety of very useful and important features. This tutorial is based on the Pandas-provided method. This is the “len()” function to count the number of rows in a given DataFrame. In this learning, our aim and goal is to make you understand how you can count the number of rows that fulfills a defined condition. We explained every step of this technique explicitly verbally as well as with the help of example codes that were implemented on the “Spyder” tool. We put a sincere attempt to make this piece of writing as easy and handy as possible for you to understand the concept.

About the author

Aqsa Yasin

I am a self-motivated information technology professional with a passion for writing. I am a technical writer and love to write for all Linux flavors and Windows.