
SciPy KS test

Python is a general-purpose programming language that is used to write many kinds of software. It is not limited to one class of tasks: the same language can handle mathematical programs, matrices and their operations, differential equations, and the training of machine learning models and artificial neural networks. SciPy is a Python library for scientific computing. Its modules supply the numerical and statistical functions that are often used alongside machine learning and deep learning work. One of the functions that SciPy offers is the “KS test”.

The KS test is the “Kolmogorov-Smirnov test”. It checks whether a sample is consistent with a given distribution by comparing the sample’s empirical cumulative distribution function (CDF) with a reference CDF. There are two ways to conduct the test, depending on the number of samples that we give it as input: one sample compared with a reference distribution, or two samples compared with each other.
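To make the idea of “comparing CDFs” concrete, here is a minimal sketch that computes the one-sample KS statistic by hand and compares it with the value that SciPy reports. The sample size, the seed, and the variable names are arbitrary choices for illustration; they are not taken from the examples in this article.

```python
import numpy as np
from scipy import stats

# Seeded generator so the sketch is repeatable (the seed value is arbitrary).
rng = np.random.default_rng(0)
sample = np.sort(rng.normal(size=50))
n = sample.size

# Reference CDF (standard normal) evaluated at each sorted observation.
cdf_vals = stats.norm.cdf(sample)

# The empirical CDF jumps from (i-1)/n to i/n at the i-th sorted point,
# so the largest gap has to be checked on both sides of every jump.
d_plus = np.max(np.arange(1, n + 1) / n - cdf_vals)
d_minus = np.max(cdf_vals - np.arange(0, n) / n)
d_manual = max(d_plus, d_minus)

# SciPy's one-sample test should report the same statistic.
d_scipy = stats.kstest(sample, stats.norm.cdf).statistic
print(d_manual, d_scipy)
```

The statistic is the largest vertical distance between the two CDFs. The p-value that kstest() returns is derived from this distance and the sample size.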

Procedure:

Both forms of the KS test are explained and demonstrated in this article. We first give some background on what the KS test does. Then, we show how to call these functions in a Python script and discuss the parameters in the argument list of each function.

Syntax:

As mentioned earlier, the KS test comes in two forms. Both do the same job, but they differ slightly in their argument lists. The first is the one-sample test, kstest(). It takes one sample of data and compares it with a reference distribution. The second is the two-sample test, ks_2samp(). It carries out the same KS test on two different data samples. The syntax for kstest() and ks_2samp() is given respectively in the following:

$ scipy.stats.kstest()

$ scipy.stats.ks_2samp()
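As a rough illustration of how the two argument lists differ, the sketch below calls kstest() with the reference distribution given first as a callable CDF and then by name with parameters, and calls ks_2samp() with two data arrays. The sample sizes, the loc/scale values, and the variable names are assumptions made up for this sketch.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # arbitrary seed
sample = rng.normal(loc=5.0, scale=2.0, size=300)

# Reference distribution given as a callable CDF (standard normal here).
res_callable = stats.kstest(sample, stats.norm.cdf)

# Reference distribution given by name, with its parameters in `args`
# (loc=5, scale=2 matches how the sample was drawn).
res_named = stats.kstest(sample, "norm", args=(5.0, 2.0))

# Two-sample form: both arguments are data arrays.
other = rng.normal(loc=5.0, scale=2.0, size=250)
res_two = stats.ks_2samp(sample, other)

print(res_callable.pvalue, res_named.pvalue, res_two.pvalue)
```

The first call compares a sample centered at 5 with the standard normal, so its p-value should be very small. The second call tests the same sample against the distribution it was actually drawn from.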

Return Value:

Both of the previously mentioned functions return the same type of result. Each returns two values: the test “statistic” and the “p-value”. The p-value is what we use to decide whether to reject the null hypothesis that the sample follows the reference distribution or, for two samples, that both come from the same distribution.
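A short sketch of how the two returned values can be read, assuming a sample of 200 standard normal values and a 0.05 threshold (both chosen here only for illustration):

```python
from scipy import stats

# An integer seed keeps the sample repeatable; the value 0 is arbitrary.
sample = stats.norm.rvs(size=200, random_state=0)
result = stats.kstest(sample, stats.norm.cdf)

# The two values can be read by attribute name...
print("statistic:", result.statistic)
print("p-value:  ", result.pvalue)

# ...or unpacked like a two-element tuple.
statistic, p_value = result

alpha = 0.05  # significance level assumed for this sketch
if p_value < alpha:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")
```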

Example 1:

Suppose we have a data sample that was generated from some distribution. With the KS test, we want to check whether this data is consistent with a particular distribution. We state a null hypothesis that the sample data comes from the normal distribution, and we test it at the 5 percent significance level (a 95 percent confidence level). The alternative hypothesis is that the sample does not come from the normal distribution. We reject the null hypothesis in favor of the alternative if the p-value that the KS test returns is less than “0.05”.

If the p-value falls below 0.05, the null hypothesis is rejected, indicating that the random sample is unlikely to have come from a normal distribution. Let’s carry out a KS test on sample data that we generate for this example. Keep in mind that the Python platform in which we write the program for this example is “Google Colab”. Open a new notebook in Colab and start writing the program.

The library that the program needs is “stats”, which is a module from the SciPy library. Import it with “from scipy import stats”; this makes the kstest() function and the distribution objects available. If you want reproducible random data, you can also import NumPy as “np” and pass a random generator to rvs() through its random_state argument, but the program in this example does not need it. Declare a variable, “a”, and assign the sample data to it by calling “stats.norm.rvs(size=200)”.

With this call, we draw random values from the standard normal distribution using the stats norm.rvs function. The size of this data is specified as “200”. The null hypothesis is that this sample is from the normal distribution. To run the test, we pass the data and the reference CDF to the kstest() function as “stats.kstest(a, stats.norm.cdf)”. The program is as follows:

from scipy import stats

a = stats.norm.rvs(size=200)

rslt = stats.kstest(a, stats.norm.cdf)

print(rslt)

The p-value of the KS test is greater than 0.05. So, we are unable to reject the hypothesis that the sample data belongs to the standard normal distribution.
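Because the sample is random, the exact numbers change on every run. The following sketch is one way to make the experiment repeatable and to see the opposite outcome as well: it seeds a NumPy generator and tests both a normal sample and a uniform sample against the standard normal CDF. The seed, the uniform sample, and the variable names are assumptions added for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # arbitrary seed for repeatability

normal_data = stats.norm.rvs(size=200, random_state=rng)
uniform_data = stats.uniform.rvs(size=200, random_state=rng)  # values in [0, 1)

results = {}
for label, data in [("normal", normal_data), ("uniform", uniform_data)]:
    # Null hypothesis in both cases: the data follow the standard normal.
    res = stats.kstest(data, stats.norm.cdf)
    results[label] = res
    verdict = "reject" if res.pvalue < 0.05 else "fail to reject"
    print(f"{label}: statistic={res.statistic:.3f}, p={res.pvalue:.3g} -> {verdict}")
```

The uniform sample lies entirely between 0 and 1, where the standard normal CDF is already above 0.5, so the gap between the two CDFs is large and the null hypothesis is rejected for that sample.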

Example 2:

Now, we conduct the other type of KS test, which is “ks_2samp”. It takes two data samples and tests whether both samples come from the same distribution. Import the “stats” module from the SciPy library and generate two data samples using “stats.norm.rvs(size=115)” and “stats.norm.rvs(size=105)”, respectively. Save them as “data1” and “data2”. This generates two samples of sizes “115” and “105” from a normal distribution. The null hypothesis is that both samples come from the same distribution, which here is the “standard normal”. To check this, pass the two samples to the function as “stats.ks_2samp(data1, data2)” and check the p-value. The program and the result are given in the following:

from scipy import stats

data1 = stats.norm.rvs(size=115)

data2 = stats.norm.rvs(size=105)

rslt = stats.ks_2samp(data1, data2)

print(rslt)

The p-value is not less than 0.05. So, we cannot reject the null hypothesis that both samples come from the same distribution. This does not prove that the null hypothesis is true; it only means that the data give no evidence against it.
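To see how the two-sample test behaves when the samples really do differ, the sketch below compares one sample against a second sample from the same distribution and against a third sample whose mean is shifted. The seed, the shift of 2.0, and the names “same” and “shifted” are assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)  # arbitrary seed for repeatability

data1 = stats.norm.rvs(size=115, random_state=rng)
same = stats.norm.rvs(size=105, random_state=rng)              # same distribution
shifted = stats.norm.rvs(loc=2.0, size=105, random_state=rng)  # mean moved to 2

res_same = stats.ks_2samp(data1, same)
res_shifted = stats.ks_2samp(data1, shifted)

print("same distribution:   ", res_same.statistic, res_same.pvalue)
print("shifted distribution:", res_shifted.statistic, res_shifted.pvalue)
```

With a shift this large, the p-value for the second comparison should fall far below 0.05, so the null hypothesis of a common distribution is rejected for that pair.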

Conclusion

We conducted the two KS tests using the functions that SciPy offers: “stats.kstest()” for one sample and “stats.ks_2samp()” for two data samples. Based on the p-values that the functions returned, we decided whether to reject the null hypothesis. In both cases, we could not reject it, so the samples are consistent with the standard normal distribution.

About the author

Kalsoom Bibi

Hello, I am a freelance writer. I usually write about Linux and other technology-related topics.