In Python, pandas provides the **corr()** function, which returns a correlation matrix of pairwise correlation coefficients. In this guide, we will discuss how to generate a correlation matrix from a pandas DataFrame using this function and walk through its parameters with separate examples. To visualize the correlation matrix, we will also create a heatmap.

The correlation coefficient ranges from -1 to +1. Two variables are positively correlated if the correlation coefficient is greater than 0. When the correlation coefficient is equal to 0, there is no linear correlation between the two variables. Two variables are negatively correlated if the correlation coefficient is less than 0.
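To make the sign of the coefficient concrete, here is a minimal sketch. The series `x`, `y_up`, `y_down`, and `y_sym` are made-up values for illustration and are not taken from houses.csv.

```python
import pandas

# Made-up values for illustration only; not taken from houses.csv.
x = pandas.Series([1, 2, 3, 4, 5])
y_up = 2 * x + 1        # rises as x rises
y_down = 10 - 3 * x     # falls as x rises
y_sym = (x - 3) ** 2    # symmetric around x = 3, so no linear trend

# Series.corr() uses the Pearson method by default.
r_up = x.corr(y_up)
r_down = x.corr(y_down)
r_sym = x.corr(y_sym)

print(r_up, r_down, r_sym)
```

Note that `y_sym` depends entirely on `x`, yet its coefficient is 0. A coefficient of 0 rules out a linear relationship only, not every kind of relationship.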

The **pandas.DataFrame.corr()** function computes the pairwise correlation of DataFrame columns. Missing (NA/null) values are excluded from the computation.
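The following sketch illustrates how missing values are excluded pair by pair. The DataFrame `df` and its columns `a`, `b`, and `c` are hypothetical and exist only for this illustration.

```python
import numpy
import pandas

# Hypothetical data with gaps; not taken from houses.csv.
df = pandas.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, numpy.nan],
    "b": [2.0, 4.0, numpy.nan, 8.0, 10.0],
    "c": [5.0, 3.0, 4.0, 1.0, 2.0],
})

corr_matrix = df.corr()

# For the a-b pair, only rows where both values are present are used.
complete = df[["a", "b"]].dropna()
manual_ab = complete["a"].corr(complete["b"])

print(corr_matrix)
print(manual_ab)
```

Each pair of columns is evaluated on its own set of complete rows, so different cells of the matrix may be based on different numbers of observations.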

**Syntax**

Let’s see the syntax and the parameters passed to this function.

pandas.DataFrame.corr(method='pearson', min_periods=1, numeric_only=False)

- The **method** parameter takes the correlation method. It can be ‘pearson’, ‘kendall’, or ‘spearman’. By default, ‘pearson’ is used.
- The **min_periods** parameter (default = 1) sets the minimum number of observations required per pair of columns to have a valid result. The pandas documentation lists it as available only for the ‘pearson’ and ‘spearman’ methods.
- The **numeric_only** parameter (default = False), when set to True, restricts the computation to numeric columns such as int and float.
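As a rough sketch of how these parameters change the result, the snippet below uses a hypothetical DataFrame `df` (columns `x`, `y`, `z`, and `label` are invented for illustration) rather than houses.csv.

```python
import numpy
import pandas

# Hypothetical data; not taken from houses.csv.
df = pandas.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "y": [1.0, 4.0, 9.0, 16.0, 25.0, 36.0],   # monotonic but not linear in x
    "z": [3.0, numpy.nan, numpy.nan, numpy.nan, 1.0, 2.0],
    "label": list("abcdef"),                   # non-numeric column
})

# numeric_only=True skips the non-numeric 'label' column.
pearson = df.corr(method="pearson", numeric_only=True)
spearman = df.corr(method="spearman", numeric_only=True)

# Require at least 4 complete observations per pair of columns.
strict = df.corr(min_periods=4, numeric_only=True)

print(pearson, "\n")
print(spearman, "\n")
print(strict)
```

Here the x-z pair has only three rows where both values are present, so with `min_periods=4` that cell becomes NaN instead of a coefficient computed from very little data.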

## Dataset

In this entire guide, we will use the houses.csv file that holds seven columns and six records. There are five numeric columns [‘beds’, ‘baths’, ‘size’, ‘lot_size’, ‘price’] and two non-numeric columns [‘size_units’, ‘lot_size_units’]. We won’t consider these two columns while creating the correlation matrix.

Let’s import this into the pandas DataFrame.

import pandas

# Import houses.csv file into house_df DataFrame.

house_df = pandas.read_csv('houses.csv')

print(house_df)

**Output**

Now this DataFrame holds seven columns with six records.

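## Example 1: Default Method

Let’s compute the correlation matrix for the house_df DataFrame with Pearson correlation, which is the default method. In recent pandas versions, where numeric_only defaults to False, the two non-numeric columns have to be skipped explicitly, so we pass numeric_only=True.

house_df = pandas.read_csv('houses.csv')

# Correlation matrix for the above DataFrame (numeric columns only)

print(house_df.corr(numeric_only=True))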

**Output**

A correlation matrix is generated for the five numeric columns. Each diagonal entry is 1 because every column is perfectly correlated with itself. Reading the coefficients for beds, we can say:

- The correlation between beds and baths is 0.834058, indicating a strong positive correlation.
- The correlation between beds and size is 0.799378, showing a strong positive correlation.
- The correlation between beds and lot_size is -0.170355, indicating a weak negative correlation.
- The correlation between beds and price is 0.666335, signifying a moderate positive correlation.

This way, you can compare the correlation for all the columns.
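When one column is the main point of interest, a single column of the matrix can be sorted to compare all the others against it. The sketch below uses `sample_df`, a made-up DataFrame that only borrows some column names from houses.csv. Its values are invented and will not reproduce the coefficients above.

```python
import pandas

# Made-up rows that only borrow column names from houses.csv.
sample_df = pandas.DataFrame({
    "beds": [2, 3, 3, 4, 5, 4],
    "baths": [1.0, 2.0, 2.5, 3.0, 4.0, 2.0],
    "size": [900, 1500, 1600, 2400, 3100, 2000],
    "price": [300000, 450000, 500000, 700000, 950000, 600000],
})

# Take the 'price' column of the matrix, drop its self-correlation,
# and sort so the most strongly positive coefficients come first.
price_corr = (
    sample_df.corr()["price"]
    .drop("price")
    .sort_values(ascending=False)
)

print(price_corr)
```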

## Example 2: Method Parameter

First, we will return the correlation matrix with kendall and then spearman.

house_df = pandas.read_csv('houses.csv')

# Correlation matrix - kendall (numeric columns only)

print(house_df.corr(method='kendall', numeric_only=True), "\n")

# Correlation matrix - spearman (numeric columns only)

print(house_df.corr(method='spearman', numeric_only=True), "\n")

**Output**

The correlation coefficients differ between the two matrices. For this dataset, the coefficients from the ‘spearman’ method are greater than those from the ‘kendall’ method.
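Both methods work on the ordering of the values rather than on the values themselves. As a small sketch of that idea, the Spearman coefficient can be reproduced by ranking the data first and then applying the Pearson method. The DataFrame `sample_df` below is made up for illustration.

```python
import pandas

# Made-up values for illustration only; not taken from houses.csv.
sample_df = pandas.DataFrame({
    "beds": [2, 3, 3, 4, 5, 4],
    "price": [300000, 450000, 500000, 700000, 950000, 600000],
})

spearman = sample_df.corr(method="spearman")

# Replace each value by its rank (ties get the average rank),
# then compute the ordinary Pearson correlation on those ranks.
pearson_on_ranks = sample_df.rank().corr(method="pearson")

print(spearman, "\n")
print(pearson_on_ranks)
```

The two printed matrices should agree up to floating-point rounding.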

## Example 3: Filtering Positive Correlation Coefficients

In this example, we will first filter the results that are positively correlated (greater than 0), then we will get the data that are strongly positively correlated (greater than 0.8).

house_df = pandas.read_csv('houses.csv')

# Flatten the correlation matrix into a Series indexed by column pairs

corr_matrix = house_df.corr(numeric_only=True).unstack()

# Positive correlations from the correlation matrix

print(corr_matrix[corr_matrix > 0], "\n")

# Strong positive correlations from the correlation matrix

print(corr_matrix[corr_matrix > 0.8])

**Output**

## Example 4: Filtering Negative Correlation Coefficients

Let’s filter the results that are negatively correlated (less than 0).

house_df = pandas.read_csv('houses.csv')

# Negative correlations from the correlation matrix

corr_matrix = house_df.corr(numeric_only=True).unstack()

print(corr_matrix[corr_matrix < 0], "\n")

**Output**

Only the beds and lot_size pair is negatively correlated. It is listed twice, once in each order, because the correlation matrix is symmetric.
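Because the unstacked matrix lists every pair in both orders, and the positive filter also keeps each column's correlation with itself, it can help to keep only the entries above the diagonal before filtering. This is a minimal sketch of one way to do that with a NumPy mask. The DataFrame `sample_df` is made up and only borrows column names from houses.csv.

```python
import numpy
import pandas

# Made-up rows that only borrow column names from houses.csv.
sample_df = pandas.DataFrame({
    "beds": [2, 3, 3, 4, 5, 4],
    "size": [900, 1500, 1600, 2400, 3100, 2000],
    "lot_size": [9000, 4000, 7000, 5000, 3000, 6000],
    "price": [300000, 450000, 500000, 700000, 950000, 600000],
})

corr = sample_df.corr(numeric_only=True)

# Boolean mask that is True strictly above the diagonal.
upper = numpy.triu(numpy.ones(corr.shape, dtype=bool), k=1)

# Blank out everything else, flatten, and drop the blanked cells.
pairs = corr.where(upper).unstack().dropna()

strong_positive = pairs[pairs > 0.8]
negative = pairs[pairs < 0]

print(strong_positive, "\n")
print(negative)
```

With four numeric columns this leaves six unique pairs instead of sixteen cells.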

## Example 5: Visualizing the Correlation Matrix Using Heatmap

Use the heatmap() function from the seaborn library and display the plot with pyplot from the matplotlib library.

- Pass the result of pandas.DataFrame.corr() to the heatmap.
- Set the **annot** parameter to True to display the coefficient value in each cell.
- Set the **fmt** parameter to ".1f" to format the coefficient values to one decimal place.
- Set the **cmap** parameter to the color map named “crest”.

import pandas

import seaborn

import matplotlib.pyplot

house_df = pandas.read_csv('houses.csv')

seaborn.heatmap(house_df.corr(numeric_only=True), fmt=".1f", annot=True, cmap="crest")

matplotlib.pyplot.show()

**Output**

The color scale on the side represents the correlation coefficients, and each cell also shows its coefficient.

## Example 6: Correlation Coefficient Using numpy.corrcoef()

The NumPy module supports the **corrcoef()** function, which returns the Pearson product-moment correlation coefficients. Here we pass it two variables (DataFrame columns) at a time, so it returns a 2×2 matrix for each pair.

- Return Pearson product-moment correlation coefficient for ‘beds’ and ‘baths.’
- Return Pearson product-moment correlation coefficient for ‘beds’ and ‘size.’

import numpy

import pandas

house_df = pandas.read_csv('houses.csv')

# Return Pearson product-moment correlation coefficient

print(numpy.corrcoef(house_df['beds'], house_df['baths']), "\n")

print(numpy.corrcoef(house_df['beds'], house_df['size']), "\n")

**Output**

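The correlation coefficient for beds and baths is 0.83405766, and for beds and size is 0.79937773. These are the off-diagonal entries of each returned matrix, and they match the values from Example 1.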
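If only the single coefficient is needed, it can be read from the off-diagonal position of the returned matrix, or computed directly with Series.corr(). The sketch below compares the two on `sample_df`, a made-up DataFrame used only for illustration.

```python
import numpy
import pandas

# Made-up values for illustration only; not taken from houses.csv.
sample_df = pandas.DataFrame({
    "beds": [2, 3, 3, 4, 5, 4],
    "baths": [1.0, 2.0, 2.5, 3.0, 4.0, 2.0],
})

# numpy.corrcoef() returns a 2x2 matrix for two input variables.
matrix = numpy.corrcoef(sample_df["beds"], sample_df["baths"])
r_numpy = matrix[0, 1]   # off-diagonal entry is the beds-baths coefficient

# Series.corr() returns the same Pearson coefficient as a single number.
r_pandas = sample_df["beds"].corr(sample_df["baths"])

print(matrix)
print(r_numpy, r_pandas)
```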

## Conclusion

We saw how to compute the correlation matrix from a pandas DataFrame using the **pandas.DataFrame.corr()** function. The parameters passed to this function were discussed in separate examples. It is possible to visualize the correlation between different variables using a heatmap. We also discussed how to filter positive and negative correlation coefficients with examples.