There are several Python libraries that offer various functions and modules to perform simple as well as complex tasks. “Scipy” is one such library in Python that is used for various purposes such as signal processing, image optimization, statistical calculation, etc. The “Pearsonr” corresponds to a function of this library that refers to the “Pearson correlation coefficient” and ranges from “-1” to “1”. We can determine the “Pearson correlation coefficient” utilizing the “pearsonr()” function of the “scipy.stats” module in Python.
This Python post presents a detailed guide on the SciPy Stats “pearsonr()” function using numerous examples.
What is the “scipy.stats.pearsonr()” Function in Python?
The “scipy.stats.pearsonr()” function in Python is a part of the “scipy” library, which is specifically used to calculate the “Pearson correlation coefficient” between two arrays or lists of values. “Pearson correlation coefficients” measure/calculate how reasonably two variables correlate.
Syntax
In the above syntax:
- The “x” and “y” parameters are the input arrays or lists containing the values for which the correlation coefficient needs to be calculated.
- The third parameter named “alternative” specifies a string that defines the alternative hypothesis for the test. It can be “two-sided”, “less” or “greater”.
Return Value
The “scipy.stats.pearsonr()” function returns two values:
- Pearson correlation coefficient: It ranges between “-1” and “1”, where “-1” specifies a perfect negative linear relationship, “1” indicates a perfect positive linear relationship, and “0” corresponds to no linear relationship.
- p-value: The p-value is associated with the hypothesis test for the correlation coefficient.
Example 1: Calculating the “Pearson Correlation Coefficient”
In the below example code, the “stats.pearsonr()” function is utilized to determine the Pearson correlation coefficient:
from scipy import stats
value_1 = numpy.array([65, 70, 68, 61, 72])
value_2 = numpy.array([150, 160, 155, 140, 175])
correlation, p_value = stats.pearsonr(value_1, value_2)
print(correlation)
print(p_value)
According to the above code:
- The “numpy” library and the “stats” module are imported and the two arrays are initialized.
- The “pearsonr()” function takes both the arrays as its arguments and assigns the returned values to “correlation” and “p_value”. This function calculates the “Pearson correlation coefficient” of the given two arrays.
Output
The positive floating numbers of statistics and “P-value” indicate that these two datasets have highly positive coefficients.
Example 2: Calculate the Correlation Coefficient and P-value of Two Random Arrays
Here is an example code that is used to determine the correlation coefficient and p-value:
from scipy import stats
value_1 = numpy.random.rand(10)
value_2 = numpy.random.rand(10)
correlation, p_value = stats.pearsonr(value_1, value_2)
print(f"Correlation coefficient: {correlation:.3f}")
print(f"P-value: {p_value:.3f}")
In the above code:
- The “random.rand()” method generates two random arrays with size “10”.
- The “pearsonr()” function calculates the “Correlation Coefficient” and the “P-Value”.
Output
In the above output, it can be seen that the correlation coefficient value is retrieved as “-0.106” which specifies the weak negative correlation, and the p-value value “0.770” specifies the null hypothesis as “True”.
Conclusion
The “scipy.stats.pearsonr()” function in Python is a powerful tool for calculating the “Pearson correlation coefficient” between variables. It delivers a convenient/simple way to measure the linear relationship between two arrays or a list of values. This blog presented a thorough guide on Python’s “stats.pearsonr()” function of the “scipy” library using appropriate examples.