
# Scipy Chi-Square

Python is a popular high-level programming language that supports engineering, scientific, object-oriented, and mathematical programming. It offers numerous libraries, and one of the best known is the open-source library “SciPy”. SciPy provides routines for optimization, statistics, and other numerical work that underpins many machine learning workflows. In this article, we use the chi-square functions from SciPy's “stats” module to run statistical tests on categorical data.

## Procedure

In this article, we use the chi-square functions from the SciPy library to run two tests on categorical data. We discuss how to call these functions in a Python script and apply them to worked examples.

## Syntax

Since we will carry out two different tests using the chi-square method, we discuss the syntax of two functions. Both live in the scipy.stats module and are shown here with their main parameters and default values.

Chi-square test -> chisquare(f_obs, f_exp=None, ddof=0, axis=0)

Chi-square independence test -> chi2_contingency(observed, correction=True, lambda_=None)
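As a rough illustration of the correction parameter, the sketch below runs chi2_contingency on a hypothetical 2 x 2 table (the counts are invented for this example) with and without Yates' continuity correction. SciPy applies the correction only when the table has one degree of freedom, so it matters mainly for 2 x 2 tables.

```python
import numpy as np
from scipy import stats

# Hypothetical 2 x 2 table; the counts are made up for illustration
table = np.array([[12, 5], [7, 9]])

# Default behaviour: Yates' continuity correction is applied (dof == 1)
stat_corrected, p_corrected, dof, _ = stats.chi2_contingency(table)

# Same table with the correction switched off
stat_plain, p_plain, _, _ = stats.chi2_contingency(table, correction=False)

print(dof, stat_corrected, stat_plain)
```

The corrected statistic is smaller than the uncorrected one, which makes the test slightly more conservative on small tables.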

## Return Value

Both functions return the chi-square test statistic and the p-value. chi2_contingency additionally returns the degrees of freedom and the array of expected frequencies.
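A minimal sketch of how the return values might be unpacked. The counts are hypothetical and chosen only for illustration; both results unpack like tuples.

```python
from scipy import stats

# Hypothetical observed counts for six categories
observed = [16, 18, 16, 14, 12, 12]

# chisquare yields two values: the statistic and the p-value
statistic, p_value = stats.chisquare(observed)
print(statistic, p_value)

# Hypothetical 2 x 3 contingency table
table = [[10, 20, 30], [6, 9, 17]]

# chi2_contingency yields four values
chi2, p, dof, expected = stats.chi2_contingency(table)
print(chi2, p, dof)
print(expected)
```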

## Example # 01

We conduct the first test with the function introduced in the syntax section as the chi-square test, “stats.chisquare()”. This is a goodness-of-fit test for a single categorical variable, and it belongs to the “stats” module of SciPy. We assume a null hypothesis that the categorical data has the specified (expected) frequencies, and the test checks whether the observed data is consistent with it. The function takes the observed frequencies and, optionally, the expected frequencies as its parameters; if the expected frequencies are omitted, all categories are assumed to be equally likely. As a typical rule of thumb, the test is considered unreliable when any observed or expected frequency is smaller than “five”.
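To make the test concrete: the statistic is the sum over categories of (observed - expected)^2 / expected, and the p-value comes from a chi-square distribution with (number of categories - 1) degrees of freedom. The sketch below computes both by hand for hypothetical counts and compares them with stats.chisquare. The variable names and the data are assumptions made for this illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical observed counts for four categories
observed = np.array([20, 30, 25, 25])

# Under the default null hypothesis every category is equally likely
expected = np.full(observed.shape, observed.sum() / observed.size)

# Chi-square statistic: sum of (O - E)^2 / E
manual_stat = ((observed - expected) ** 2 / expected).sum()

# p-value: upper tail of the chi-square distribution, dof = k - 1
manual_p = stats.chi2.sf(manual_stat, df=observed.size - 1)

result = stats.chisquare(observed)
print(manual_stat, result.statistic)
print(manual_p, result.pvalue)

# Rule-of-thumb check: all frequencies should be at least 5
print((observed >= 5).all() and (expected >= 5).all())
```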

We run this test in “Google Colab”, a free hosted notebook environment. It runs Python programs in the browser and already has common packages such as SciPy and NumPy installed, so no local installation is needed. After opening Colab, create a new notebook with a unique name and move to the next step, where we import the library packages needed for the chi-square test.

The chi-square test is provided by the scipy stats module, so we import “stats” from scipy into our project. Next, we define a list named “array” that holds the observed frequencies “[3, 4, 8, 10, 12]”. We then call the chi-square function with the prefix “stats” as “stats.chisquare()” and pass the list as its f_obs (observed frequencies) argument.

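After running the code in the snippet below, we get the chi-square statistic and the p-value as output. Because no expected frequencies are passed, each of the five categories is expected to occur 37 / 5 = 7.4 times, which gives a statistic of 8.0 and a p-value of about 0.092. The expected frequencies satisfy the rule of thumb of at least five, but the observed counts 3 and 4 fall below it, so the result is best read as an illustration rather than a reliable test.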

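```python
from scipy import stats

# Observed frequencies for five categories
array = [3, 4, 8, 10, 12]
# With no f_exp, all categories are assumed equally likely
print(stats.chisquare(array))
```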
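The call above leaves f_exp at its default. As a hedged sketch of the other case, the snippet below passes explicit expected frequencies and compares the p-value with a significance level. The expected counts and the 0.05 threshold are assumptions chosen for illustration. The expected counts are picked to sum to the same total (37) as the observed ones, since recent SciPy versions raise an error when the totals disagree.

```python
from scipy import stats

# Observed frequencies from the example above
array = [3, 4, 8, 10, 12]

# Hypothetical expected frequencies with the same total (37)
expected = [5, 5, 9, 9, 9]

result = stats.chisquare(array, f_exp=expected)
print(result.statistic, result.pvalue)

# Conventional significance level, assumed for this sketch
alpha = 0.05
if result.pvalue < alpha:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")
```

A p-value below the chosen level would suggest that the observed counts deviate from the expected ones by more than chance alone would explain.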

## Example # 02

SciPy offers a second test for categorical variables, the “chi-square test of independence”. This test differs from the chi-square test in the example above: it checks whether there is a significant association between two categorical variables. For this test, we use the function “chi2_contingency” from the stats module of SciPy. The null hypothesis is that the two variables are independent of each other. To work with this function, let's create a new array. To define it, we import numpy under the name “np” so that we can write “np” in the code in place of numpy.

The other package needed for the chi-square contingency test is “stats”, which we import from the scipy library. With the required libraries imported, we define a 2-dimensional array named “obs_array” using the “np.array()” method, with the elements “[[2, 2, 2], [8, 8, 8]]”. This array is the contingency table of observed frequencies, and it is the input parameter of the chi2_contingency() function.

To run the independence test on the two categorical variables, we call the function with the prefix stats as “stats.chi2_contingency(obs_array)”. We execute the following code to get the results of the independence test.

```python
import numpy as np
from scipy import stats

# 2 x 3 contingency table of observed frequencies
obs_array = np.array([[2, 2, 2], [8, 8, 8]])
print(stats.chi2_contingency(obs_array))
```
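As a cross-check, the expected frequencies under independence can be recomputed from the row and column totals (row total x column total / grand total). The sketch below does this for obs_array and compares the result with what chi2_contingency reports. The helper variable names are introduced here for illustration only.

```python
import numpy as np
from scipy import stats

obs_array = np.array([[2, 2, 2], [8, 8, 8]])

# Marginal totals of the contingency table
row_totals = obs_array.sum(axis=1)
col_totals = obs_array.sum(axis=0)
grand_total = obs_array.sum()

# Expected count per cell if the two variables are independent
expected_manual = np.outer(row_totals, col_totals) / grand_total

chi2, p, dof, expected = stats.chi2_contingency(obs_array)
print(np.allclose(expected_manual, expected))

# Degrees of freedom: (rows - 1) * (columns - 1)
print(dof == (obs_array.shape[0] - 1) * (obs_array.shape[1] - 1))
```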

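This function returns the chi-square test statistic, the p-value of the test, the degrees of freedom, and the expected frequencies for the observations that we pass to the function. In this example, both rows have the same proportions across the columns, so the expected frequencies equal the observed ones, the statistic is 0, and the p-value is 1; the data gives no evidence against independence. The output of the function is displayed in the figure below.

## Conclusion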

This article explained the chi-square tests provided by the “stats” module of the SciPy library. We ran two kinds of test, the chi-square goodness-of-fit test and the chi-square test of independence, through two examples in Python, and showed what the return values of these functions look like and what they mean.