Hence, a chi-square test is an excellent way to understand and evaluate the relationship between two categorical variables. Both variables must come from the same population and be categorical, with categories such as Yes/No, Male/Female, Red/Green, and so on.
The chi-square test is useful when comparing the counts of categorized responses across two or more independent groups.
Chi-Square Test in R
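When the test is completed, the outcome is a “p” value, which you use to decide whether to reject the hypothesis of independence. The “p” value is the probability of obtaining a test statistic at least as extreme as the one observed, assuming the variables really are independent. It is not the probability that the variables are independent.
If the “p” value is greater than 0.05, the data do not provide enough evidence to reject independence, so we conclude that no association was detected. This is not the same as proving that the factors are unrelated. If the “p” value is less than 0.05, the observed counts would be unlikely if the variables were independent, so we conclude that there is a statistically significant association between the factors. A small “p” value indicates strong evidence of an association, not necessarily a strong association.
You might wonder why the threshold is 0.05 and not some other value. The 0.05 significance level is a convention that statisticians widely adopted as a practical cutoff. It has no special mathematical status, and stricter or looser levels such as 0.01 or 0.10 are also used.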
To summarize what has been said above:
H0: The variables are independent, and there is no association between them.
H1: The variables are associated with each other.
R provides the “chisq.test()” function (in the base stats package) to perform a chi-square test and evaluate whether there is a relationship between the two variables in the given data.
In R, “chisq.test()” can be called either on a contingency table (a matrix of counts) or on two vectors x and y, which it cross-tabulates internally before running the test.
This article shows how to run and interpret the chi-square test in R with the examples below.
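For reference, the comment at the top of the sketch below shows the signature of chisq.test() in R's stats package. It is followed by a minimal call on a 2 x 2 table of counts. The counts and the Gender/Answer labels are hypothetical and serve only to illustrate the call.

```r
# Signature of chisq.test() from R's base "stats" package:
# chisq.test(x, y = NULL, correct = TRUE,
#            p = rep(1/length(x), length(x)), rescale.p = FALSE,
#            simulate.p.value = FALSE, B = 2000)

# Hypothetical 2 x 2 contingency table of counts
tbl <- matrix(c(20, 15, 30, 35), nrow = 2,
              dimnames = list(Gender = c("Male", "Female"),
                              Answer = c("Yes", "No")))

# For a 2 x 2 table, correct = TRUE (the default) applies
# Yates' continuity correction
result <- chisq.test(tbl)
print(result)

# Counts expected under the null hypothesis of independence
print(result$expected)
```

The returned object has class "htest". Its statistic, parameter (degrees of freedom), and p.value components hold the numbers shown in the printed output.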
Example # 1
We start with a simple, basic example of the chi-square test.
First, we call “rm()” to remove any objects that may already exist in the workspace. Then the main code starts. We create two variables, “x_actual” and “x_predict.” Using the “c()” function, we assign a vector of actual values to “x_actual” and a vector of predicted values to “x_predict.” Next, we call “chisq.test()” with both the actual and predicted values as arguments and store the result of the chi-square test in the “chi” object. Finally, the “print()” statement prints the chi-square test result.
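The sketch below follows the steps just described. The article's original values for x_actual and x_predict are not reproduced here, so these vectors are hypothetical 0/1 labels, and the printed result will differ from the article's.

```r
# Remove any existing objects from the workspace
rm(list = ls())

# Hypothetical actual and predicted class labels (0 or 1)
x_actual  <- c(1, 0, 1, 1, 0, 1, 0, 0, 1, 0)
x_predict <- c(1, 0, 1, 0, 0, 1, 1, 0, 1, 0)

# chisq.test() converts both vectors to factors and
# cross-tabulates them into a 2 x 2 table before testing
chi <- chisq.test(x_actual, x_predict)

# Print the test result (X-squared, df, p-value)
print(chi)

# Show the contingency table that was actually tested
print(table(x_actual, x_predict))
```

With only ten observations, every expected count here is 2.5, so R also warns that the chi-squared approximation may be incorrect. Real analyses need enough data for expected counts of roughly 5 or more per cell.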
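Before we interpret the result of the chi-square test, let us introduce some terms that appear in its output.
“df” (degrees of freedom) is the number of values that are free to vary. For a contingency table with r rows and c columns, it equals (r - 1) × (c - 1).
“X-squared” is the chi-square test statistic. It sums, over all cells of the table, the squared difference between the observed and expected frequency counts divided by the expected count. Larger values indicate a bigger departure from independence.
“p-value” is the probability, assuming the null hypothesis of independence is true, of obtaining an X-squared value at least as large as the one observed.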
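The sketch below makes these three terms concrete. It computes them by hand for a hypothetical 2 x 3 table (the counts are invented for illustration) and checks them against chisq.test(). The statistic is X² = Σ (O − E)² / E, where O is an observed count and E = (row total × column total) / grand total is the count expected under independence.

```r
# Hypothetical 2 x 3 table of counts (gender by preferred color)
tbl <- matrix(c(10, 20, 15,
                15, 25, 15),
              nrow = 2, byrow = TRUE,
              dimnames = list(Gender = c("Male", "Female"),
                              Color = c("Red", "Green", "Blue")))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(tbl), colSums(tbl)) / sum(tbl)

# X-squared: sum over cells of (observed - expected)^2 / expected
x2 <- sum((tbl - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1)
dof <- (nrow(tbl) - 1) * (ncol(tbl) - 1)

# p-value: upper-tail probability of the chi-square distribution
p <- pchisq(x2, df = dof, lower.tail = FALSE)

# Compare with R's built-in test (Yates' correction only applies to 2 x 2 tables)
result <- chisq.test(tbl)
cat("manual:     X-squared =", x2, " df =", dof, " p-value =", p, "\n")
cat("chisq.test: X-squared =", result$statistic,
    " df =", result$parameter, " p-value =", result$p.value, "\n")
```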
If the p-value is smaller than the significance level (typically 0.05), we reject the NULL HYPOTHESIS and conclude that there is a statistically significant relationship between the two variables. In other words, knowing the value of one variable tells us something about the other, although this does not mean that one causes the other.
In our example, the p-value is larger than the significance level of 0.05. Therefore, we fail to reject the NULL HYPOTHESIS: the data provide no evidence of an association, and we treat the variables as independent of one another.
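The decision rule from the two paragraphs above can be written as a small helper. interpret_chisq() is a hypothetical function written only for this illustration and is not part of R. It compares the p-value stored in a chisq.test() result with a chosen significance level, where alpha defaults to the conventional 0.05.

```r
# Hypothetical helper: apply the usual decision rule to a chisq.test() result
interpret_chisq <- function(result, alpha = 0.05) {
  if (result$p.value < alpha) {
    "Reject H0: the variables appear to be associated"
  } else {
    "Fail to reject H0: no evidence of an association"
  }
}

# Hypothetical tables: one with a clear association, one with none
strong <- matrix(c(50, 5, 5, 50), nrow = 2)
none   <- matrix(c(20, 20, 20, 20), nrow = 2)

print(interpret_chisq(chisq.test(strong)))  # "Reject H0: ..."
print(interpret_chisq(chisq.test(none)))    # "Fail to reject H0: ..."
```

Passing a different alpha, for example interpret_chisq(result, alpha = 0.01), applies a stricter threshold without changing the test itself.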
Example # 2
In this example, we use a dataset built into base R and perform a chi-square test on it. The dataset is “ChickWeight.” It records the body weight of chicks (in grams) measured at different ages (“Time,” in days since birth) while they were fed one of four experimental diets (“Diet”).
We run this test to see whether there is any relationship between the chicks' diet and their weight. R's built-in “chisq.test()” function reports the test statistic, degrees of freedom, and p-value needed to decide whether the two variables are associated.
We begin by loading the dataset into R.
The output of this chi-square test shows a p-value greater than the significance level of 0.05, which suggests that the chicks' weight is independent of their diet. This may seem strange at first, since a chick's weight should depend on what it eats. One reason for caution is that weight is a numeric variable. “chisq.test()” treats every distinct weight value as a separate category, which produces a sparse table with many expected counts below 5, and R warns that the chi-squared approximation may be incorrect. The result is therefore best read as “no association detected by this test” rather than as proof that diet has no effect.
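A minimal version of this step might look like the sketch below. The article's exact code is not shown, so this reconstructs the call described in the text. ChickWeight ships with R's datasets package, so data() simply makes it visible in the workspace.

```r
# ChickWeight is part of R's built-in "datasets" package
data("ChickWeight")

# Columns: weight (grams), Time (days since birth), Chick, Diet
dim(ChickWeight)
head(ChickWeight)

# Test for an association between weight and diet.
# Both columns are converted to factors, so each distinct
# weight value becomes its own category.
diet_test <- chisq.test(ChickWeight$weight, ChickWeight$Diet)
print(diet_test)
```

Because most weight values occur only a few times, many expected counts fall below 5, and R prints a warning that the chi-squared approximation may be incorrect.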
Now, we compare the weight to another variable, “Time.” This variable records the number of days since the chick was born.
In this segment of code, we simply replace the “Diet” column with the “Time” column, because we are now comparing the chicks' age to their weight in the chi-square test.
In the resulting chi-square test, the value of “p” is very small. This indicates strong evidence of an association between the time since the chicks were born and their weight, which is consistent with chicks gaining weight as they get older. The same caveat about sparse expected counts applies to this test as well.
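Continuing the reconstruction under the same assumption (the article's exact code is not shown), only the second argument changes:

```r
data("ChickWeight")

# Replace Diet with Time (days since birth). Time is numeric,
# so chisq.test() treats each measurement day as a category.
time_test <- chisq.test(ChickWeight$weight, ChickWeight$Time)
print(time_test)
```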
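The chi-square test only says that weight and age are associated. It does not indicate the direction of the relationship. One quick way to check that chicks gain weight as they age is to look at the average weight on each measurement day, for example with tapply(). This is a supplementary descriptive check, not part of the chi-square test itself.

```r
data("ChickWeight")

# Mean weight (grams) on each measurement day; names are the days
mean_by_day <- tapply(ChickWeight$weight, ChickWeight$Time, mean)
print(round(mean_by_day, 1))
```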
Conclusion
Today's article covered the chi-square test in R. In the introduction, we explained what the chi-square test is, why it is conducted, and how it is carried out, along with the basic concepts involved. After that, we worked through two practical coding examples in RStudio on Ubuntu 20.04. The first example shows how to perform a chi-square test on user-defined variables, while the second uses the built-in “ChickWeight” data frame from base R. We hope this article helps you conduct the chi-square test in R.