Rolling correlations are calculated between two time series over a rolling window. They let us check whether two correlated time series diverge from one another over time.
The rolling correlation on a Pandas DataFrame can be computed with the “DataFrame_object.rolling().corr()” method. In this article, we compute the rolling correlation on a Pandas DataFrame using this basic technique.
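Syntax:
On two DataFrames:
DataFrame1.rolling(window).corr(DataFrame2)
(OR)
On two columns in a DataFrame:
DataFrame['column1'].rolling(window).corr(DataFrame['column2'])
Here, window is the number of rows in each rolling window.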
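As a minimal sketch of the two call forms, the snippet below uses two small made-up frames. The names `df_a` and `df_b`, the column name `price`, and all values are illustrative assumptions, not data from this article. When one DataFrame is correlated against another, the columns are matched by name.

```python
import pandas

# Hypothetical data: two frames that share the column name "price".
df_a = pandas.DataFrame({"price": [1.0, 2.0, 4.0, 3.0, 5.0]})
df_b = pandas.DataFrame({"price": [2.0, 3.0, 9.0, 4.0, 8.0]})

# Form 1: DataFrame against DataFrame; columns are paired by name.
frame_corr = df_a.rolling(3).corr(df_b)

# Form 2: one column (a Series) against another column.
series_corr = df_a["price"].rolling(3).corr(df_b["price"])

print(frame_corr)
print(series_corr)
```

Both forms should give the same numbers for the shared column; the first returns a DataFrame and the second returns a Series.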
When you build the DataFrame from lists of values, every column must have the same number of values. If the lengths differ, pandas raises an error and the DataFrame is not created.
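The length requirement can be seen with a short sketch. The two column lists below are deliberately mismatched (3 values against 2), and the variable name `error_message` is only for illustration.

```python
import pandas

# Column lists of different lengths: 3 values versus 2 values.
try:
    pandas.DataFrame({"quantity": [200, 455, 800], "cost": [2400, 4500]})
    error_message = None
except ValueError as exc:
    # pandas refuses to build a frame from unequal-length lists.
    error_message = str(exc)

print(error_message)
```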
Example 1: Correlate Column1 vs Column2
Let’s create a DataFrame with 3 columns and 10 rows and correlate the quantity column with the cost column over a rolling window of 2 rows (2 days, if each row represents one day).
import pandas
# Create pandas dataframe for calculating Correlation
# with 3 columns.
analytics=pandas.DataFrame({'Product':[11,22,33,44,55,66,77,88,99,110],
'quantity':[200,455,800,900,900,122,400,700,80,500],
'cost':[2400,4500,5090,600,8000,7800,1100,2233,500,1100]})
# Correlate quantity with cost column over a rolling window of 2 rows.
analytics['Correlated']=analytics['quantity'].rolling(2).corr(analytics['cost'])
print(analytics)
Output:
   Product  quantity  cost  Correlated
0       11       200  2400         NaN
1       22       455  4500         1.0
2       33       800  5090         1.0
3       44       900   600        -1.0
4       55       900  8000         NaN
5       66       122  7800         1.0
6       77       400  1100        -1.0
7       88       700  2233         1.0
8       99        80   500         1.0
9      110       500  1100         1.0
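The results are placed in the “Correlated” column. The first row is NaN because a 2-row window is not yet full. Row 4 is NaN because the quantity is 900 in both rows of its window, so the quantity has zero variance and the correlation is undefined. With a window of only 2 rows, every other value is exactly 1.0 or -1.0, since two points always lie on a straight line; the sign only tells whether the two columns moved in the same direction.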
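To illustrate how the window size shapes the result, here is a small sketch on the same quantity and cost values. The wider window of 4 rows is an arbitrary choice made for illustration, and the names `corr_2` and `corr_4` are hypothetical.

```python
import pandas

analytics = pandas.DataFrame({
    "quantity": [200, 455, 800, 900, 900, 122, 400, 700, 80, 500],
    "cost": [2400, 4500, 5090, 600, 8000, 7800, 1100, 2233, 500, 1100],
})

# Window of 2: two points always lie on a straight line.
corr_2 = analytics["quantity"].rolling(2).corr(analytics["cost"])
# Rounding hides floating-point noise; unique() lists the distinct values.
print(corr_2.round(6).dropna().unique())

# A wider window (4 is an arbitrary choice) can give values between -1 and 1.
corr_4 = analytics["quantity"].rolling(4).corr(analytics["cost"])
print(corr_4)
```

A wider window produces fewer results, because the first window - 1 rows stay NaN, but each result summarizes more observations.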
Example 2: Visualization
Let’s create a DataFrame with 3 columns and 5 rows and correlate the “Sales” vs “Product_likes”.
Use Seaborn to view the relationship in a graph and SciPy to get the Pearson correlation coefficient.
import pandas
import seaborn
from scipy import stats
# Create pandas dataframe for calculating Correlation
# with 3 columns.
analytics=pandas.DataFrame({'Product name':['tv','steel','plastic','leather','others'],
'Product_likes':[100,20,45,67,9],
'Sales':[2300,890,1400,1800,200]})
print(analytics)
print()
# See the coefficient of correlation
print(stats.pearsonr(analytics['Sales'], analytics['Product_likes']))
print()
# Now see the Correlation Sales vs Product_likes
seaborn.lmplot(x="Sales", y="Product_likes", data=analytics)
Output:
  Product name  Product_likes  Sales
0           tv            100   2300
1        steel             20    890
2      plastic             45   1400
3      leather             67   1800
4       others              9    200

(0.9704208315867275, 0.006079620327457793)
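The coefficient of about 0.97, with a p-value of about 0.006, indicates a strong positive linear relationship between Sales and Product_likes, which the regression plot also shows.
Let’s now get the rolling correlation for these two columns over a window of 3 rows.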
Code for Example 2:
analytics['Correlated']=analytics['Sales'].rolling(3).corr(analytics['Product_likes'])
print(analytics)
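Output:
  Product name  Product_likes  Sales  Correlated
0           tv            100   2300         NaN
1        steel             20    890         NaN
2      plastic             45   1400    0.998496
3      leather             67   1800    0.999461
4       others              9    200    0.989855
The first two rows are NaN because a 3-row window is not yet full. Every full window has a correlation above 0.98, so these two columns are highly correlated.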
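A rolling value can be checked by hand. The sketch below recomputes the value at row 2 from rows 0 to 2 using NumPy's `corrcoef`; the variable names `rolled`, `window`, and `manual` are chosen only for this illustration.

```python
import numpy
import pandas

analytics = pandas.DataFrame({
    "Product_likes": [100, 20, 45, 67, 9],
    "Sales": [2300, 890, 1400, 1800, 200],
})

rolled = analytics["Sales"].rolling(3).corr(analytics["Product_likes"])

# Recompute the value at row 2 from rows 0..2 with NumPy's Pearson correlation.
window = analytics.iloc[0:3]
manual = numpy.corrcoef(window["Sales"], window["Product_likes"])[0, 1]

print(rolled.iloc[2], manual)
```

Both numbers should agree up to floating-point rounding, which shows that each rolling value is simply the Pearson correlation of the rows inside that window.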
Example 3: Different DataFrames
Let’s create 2 DataFrames with 1 column each and correlate them.
import pandas
from scipy import stats
analytics1=pandas.DataFrame({ 'Sales':[2300,890,1400,1800,200,2000,340,56,78,0]})
analytics2=pandas.DataFrame({'Product_likes':[100,20,45,67,9,90,8,1,3,0]})
# See the coefficient of correlation for the above two DataFrames
print(stats.pearsonr(analytics1['Sales'], analytics2['Product_likes']))
# Correlate Sales with Product_likes DataFrame
print(analytics1['Sales'].rolling(5).corr(analytics2['Product_likes']))
Output (rolling correlation):
0 NaN
1 NaN
2 NaN
3 NaN
4 0.970421
5 0.956484
6 0.976242
7 0.990068
8 0.996854
9 0.996954
dtype: float64
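The first four rows are NaN because a 5-row window is not yet full. Every full window has a correlation above 0.95, so the two columns stay highly correlated throughout.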
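One way to look for windows where the two series drift apart is to compare the rolling correlation with a cutoff. The sketch below reuses the Sales and Product_likes values; the cutoff of 0.96 and the names `threshold` and `weak_windows` are hypothetical choices made only for illustration.

```python
import pandas

sales = pandas.Series([2300, 890, 1400, 1800, 200, 2000, 340, 56, 78, 0], name="Sales")
likes = pandas.Series([100, 20, 45, 67, 9, 90, 8, 1, 3, 0], name="Product_likes")

rolling_corr = sales.rolling(5).corr(likes)

# Hypothetical cutoff chosen only for illustration; NaN rows compare as False,
# so the incomplete leading windows are never flagged.
threshold = 0.96
weak_windows = rolling_corr[rolling_corr < threshold]
print(weak_windows)
```

With this data, only the window ending at row 5 falls below the cutoff, and even there the correlation is still strong. A suitable cutoff depends on the data and on how much divergence matters for your use case.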
Conclusion
This article covered how to compute a correlation over a rolling window on a Pandas DataFrame. Pandas combines both steps in the “DataFrame.rolling().corr()” method. We worked through three examples, including a visualization with the Seaborn module, and explained each step. You can apply the method to two columns of a single DataFrame or to columns from different DataFrames, depending on your requirements.