
Pandas rolling().apply() Function

“rolling()” is one of the many helpful functions in the pandas library, which is well suited to carrying out complicated computations on datasets. To apply a particular function over a rolling window of the data, we use the “rolling().apply()” method.

This method can be used on a DataFrame as well as on a Series. To invoke it, we call “rolling()” with a window size on the object and then chain “apply()” with the function to run on each window.
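As a minimal sketch of this call pattern (the sample values and the lambda are illustrative assumptions, not taken from the article’s own programs), the chained call can look like this:

```python
import pandas as pd

# Hypothetical sample data, used only to illustrate the call pattern.
values = pd.Series([4, 1, 3, 2])

# General shape: obj.rolling(window).apply(func)
# func receives the values of one window and must return a single number.
rolling_max = values.rolling(window=2).apply(lambda window: window.max())

print(rolling_max)
```

The same pattern works when the object is a DataFrame, in which case the window slides down each column separately.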


This article demonstrates the practical use of this function with two examples in Python programs.

Example # 1: Utilizing pandas rolling().apply() Function To Calculate the Median of a pandas DataFrame

For this illustration, we will employ the pandas “df.rolling().apply()” method to apply a function to a whole DataFrame over a rolling window. We will move directly to the implementation.

The Python programs in this tutorial are written and run in the “Spyder” tool, though any Python environment with pandas and NumPy installed will work. After launching the tool, it takes us straight to the working environment, where we start typing the script.

First, the required modules are loaded into the file. We import two Python libraries, “pandas” and “NumPy”. The pandas library is imported because we will create a DataFrame with it, and the “rolling().apply()” function belongs to this same toolkit. The second library, NumPy, is loaded because one of its functions will be used to compute the median. The aliases “pd” and “np”, respectively, are set for them.

Next, we construct a pandas DataFrame. The method for DataFrame creation is “pd.DataFrame()”. We invoke it and supply the values for each column that the DataFrame should contain. We want this DataFrame to have four columns; hence, we named them “East”, “West”, “North”, and “South”.

All the columns have the same integer dtype, and each column holds 8 values. The values for the “East” column are “1, 2, 3, 4, 5, 6, 7, and 8”. The “West” column holds the values “0, 1, 2, 3, 4, 5, 6, and 7”. The data stored in the “North” column is “9, 8, 7, 6, 5, 4, 3, and 2”. And the last column, “South”, has the values “3, 5, 7, 9, 6, 4, 2, and 8”. The DataFrame is assigned to an object named “Directions” so that it can be used later.

We need to display this DataFrame. This can be done with the Python function “print()”. It accepts any type of input, whether a string, an object, a variable, or an expression, and writes the result to the screen. So, we pass the “Directions” DataFrame object to it.
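The original code listing for this step is not available, so the following is a reconstruction from the description above. The column names, values, and the “Directions” name come from the text; the exact layout is an assumption.

```python
import numpy as np  # imported here as described; used later for the median
import pandas as pd

# Build the DataFrame with four integer columns of eight values each.
Directions = pd.DataFrame({
    "East":  [1, 2, 3, 4, 5, 6, 7, 8],
    "West":  [0, 1, 2, 3, 4, 5, 6, 7],
    "North": [9, 8, 7, 6, 5, 4, 3, 2],
    "South": [3, 5, 7, 9, 6, 4, 2, 8],
})

# Display the DataFrame on the console.
print(Directions)
```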


Executing the previous code snippet prints the DataFrame that we constructed.


To compute “rolling().apply()” on this DataFrame, we define a function. The “def” keyword defines a function, and the function we have created is “ApplyRolling()”. Within it, another function is defined, which is “median_estimation(i)”. You might have observed that we have written “i” between the parentheses of this function. This “i” is the function’s parameter: each time “apply()” calls “median_estimation()”, “i” holds the values of one rolling window. Now, what does the function do with each window? In the following line, we have specified that it returns the output of the “np.median(i)” function.

As the name suggests, this is NumPy’s function for calculating the median, so every window that is passed in as “i” is reduced to its median value. Next, we invoke the “df.rolling().apply()” method. It is called on the “Directions” DataFrame, the “rolling()” function is given a window of three observations, and the “apply()” function is given the “median_estimation” function. The output is kept in the variable “Outcome”, which “ApplyRolling()” then returns to the caller.

To sum up, the “ApplyRolling()” function defines “median_estimation()” to calculate the median of a window, slides a three-row window down each column of the DataFrame, applies the median function to every window, and returns the result. Finally, “print()” displays the result of the “ApplyRolling()” function.
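The steps described above can be sketched as follows. This is a reconstruction from the text rather than the original listing: the function and variable names are those the article mentions, and the DataFrame is rebuilt so that the snippet runs on its own.

```python
import numpy as np
import pandas as pd

Directions = pd.DataFrame({
    "East":  [1, 2, 3, 4, 5, 6, 7, 8],
    "West":  [0, 1, 2, 3, 4, 5, 6, 7],
    "North": [9, 8, 7, 6, 5, 4, 3, 2],
    "South": [3, 5, 7, 9, 6, 4, 2, 8],
})

def ApplyRolling():
    # Inner helper: "i" holds the values of one rolling window.
    def median_estimation(i):
        return np.median(i)

    # Slide a 3-row window down each column and take the median of each window.
    Outcome = Directions.rolling(3).apply(median_estimation)
    return Outcome

print(ApplyRolling())
```

For the “South” column, for example, the first complete window is (3, 5, 7), whose median is 5.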


Here, the first two entries of every column hold NaN values. We specified a rolling window of three observations, and these rows do not have enough entries above them to complete the window, so they are left empty. From the third entry onward, the median is calculated for each rolling window. The returned DataFrame has the float data type.
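To check how many leading rows are left empty for a given window size, a small experiment on the “East” values can help. The “min_periods” argument shown here is a real pandas option, but it is not used in the article’s programs; it is included only as an assumption-labeled illustration of how partial windows could be allowed.

```python
import numpy as np
import pandas as pd

east = pd.Series([1, 2, 3, 4, 5, 6, 7, 8], name="East")

# Default: a window of 3 needs 3 observations before it yields a value.
strict = east.rolling(3).apply(np.median)

# Illustration only: min_periods=1 lets incomplete windows produce a result.
relaxed = east.rolling(3, min_periods=1).apply(np.median)

print(strict.isna().sum())       # number of empty leading rows
print(relaxed.head(3).tolist())  # medians of the windows (1), (1, 2), (1, 2, 3)
```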

Example # 2: Utilizing pandas rolling().apply() Function To Compute the Sum of a pandas Series

The second example applies the same method to a pandas Series.

We first import the pandas library into our Python environment and make “pd” its alias to be used throughout the program. Then we invoke the “pd.Series()” function from the pandas toolkit to create a pandas Series. The “pd” is the pandas alias defined above, and “Series” is the class that builds a Series. We pass a list of values to this function. The values are “11, 12, 13, 14, 15, 3, 7, 17, 9, 1, 21, 24, and 2”. The Series is assigned to an object named “Random”. This “Random” object now holds our Series, so we pass it to the “print()” method to present its content on the terminal.
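A reconstruction of this step from the description (the original listing is not available; the values and the “Random” name are taken from the text):

```python
import pandas as pd

# Create the Series from the thirteen values listed above.
Random = pd.Series([11, 12, 13, 14, 15, 3, 7, 17, 9, 1, 21, 24, 2])

# Display the Series on the console.
print(Random)
```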


When we press the “Run file” option, our code gets executed and the generated Series is printed.


After printing the Series, we define a function, “Series_RA()”. The main calculations are performed in this function. Within it, we define another function, “compute_sum(j)”. You can guess from its name that it computes a sum. The “j” here is the function’s parameter, which receives the values of one rolling window each time the function is called. The function passes them to “sum(j)” and returns the calculated sum for that window.

Then the “Series.rolling().apply()” method is called within the “Series_RA()” function, after the definition of “compute_sum()”. It builds a rolling window of five observations and calculates the sum of each window through the “apply()” method. The output is stored in the local variable “Result” and returned to the caller. Lastly, we pass the call to “Series_RA()” to the “print()” method to see the final Series.
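The function described above can be sketched as follows. Again, this is a reconstruction from the text using the names the article mentions, with the Series rebuilt so the snippet is self-contained.

```python
import pandas as pd

Random = pd.Series([11, 12, 13, 14, 15, 3, 7, 17, 9, 1, 21, 24, 2])

def Series_RA():
    # Inner helper: "j" holds the values of one rolling window.
    def compute_sum(j):
        return sum(j)

    # Slide a 5-element window over the Series and sum each window.
    Result = Random.rolling(5).apply(compute_sum)
    return Result

print(Series_RA())
```

The first complete window is (11, 12, 13, 14, 15), which sums to 65; the four entries before it have incomplete windows and come out as NaN.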


The final Series holds the sums computed over a rolling window of five observations.

Conclusion

This article covered two pandas concepts used together, “rolling()” and “apply()”. We explained the method through a detailed, practical walkthrough of the “rolling().apply()” function. The tutorial contained two examples. The first applied a median function from the NumPy toolkit to the rolling window of a pandas DataFrame. The second applied the same concept to a pandas Series by calculating the sum over the specified rolling window. You can use whichever technique suits your task.

About the author

Aqsa Yasin

I am a self-motivated information technology professional with a passion for writing. I am a technical writer and love to write for all Linux flavors and Windows.