Chi-Square Test
TL;DR: The chi-square test is a statistical method for analyzing categorical data to see if observed patterns match expectations or show a relationship. In this guide, you will learn the main types, assumptions, and calculation steps. You will also discover how to interpret results and explore practical applications in research and data analysis.

Analyzing relationships between categorical variables is a common challenge in statistics. Simply looking at raw counts or percentages often does not reveal whether patterns are meaningful or just due to chance. To address this, statisticians use the chi-square test, a method that assesses whether observed frequencies differ significantly from expected frequencies.

At a high level, the chi-square test works through a clear sequence of steps:

  • Compare the counts you observe with the counts you would expect
  • Calculate the chi-square value to quantify the difference
  • Check the result against a critical value or p-value
  • Decide whether the pattern is statistically significant
  • Use this decision to draw a conclusion about the relationship between variables

In this article, you will learn what the chi-square test is and why it is used. You will also see its types, assumptions, calculation steps, applications, and limitations.

What is the Chi-Square Test?

The chi-square test is a non‑parametric statistical test used for categorical data. It helps determine whether the distribution of data across categories deviates from expectations. Being non-parametric, it does not rely on assumptions about the underlying population distribution.

This test works by measuring the difference between observed and expected frequencies. A large difference suggests that the observed data do not fit the expected pattern, indicating a possible relationship between the variables being studied. The chi-square test statistic is calculated using the formula:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Here, O represents the observed frequency, E is the expected frequency, and the sum runs over all categories (or all cells of the table).
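
As a minimal sketch of the formula above, the statistic can be computed in a few lines of plain Python. The function name `chi_square_statistic` and the counts are hypothetical and chosen only for illustration; they do not come from a real dataset.

```python
def chi_square_statistic(observed, expected):
    """Sum (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for three categories (illustration only).
observed = [18, 22, 20]
expected = [20, 20, 20]

statistic = chi_square_statistic(observed, expected)
print(f"chi-square = {statistic:.2f}")  # (4 + 4 + 0) / 20 = 0.40
```

Each term is divided by its own expected count, so a deviation of 2 matters more in a category where 20 were expected than it would where 200 were expected.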

Why Use a Chi-Square Test?

Now that you know what the chi-square test is, it is helpful to understand the situations where it is most effective:

  • Test if Distributions Match the Expected Pattern

The chi-square test can assess whether the distribution of a single categorical variable aligns with expected values. For example, it can determine whether survey responses or product preferences are evenly distributed across categories or deviate significantly from an even split.

  • Test for Association Between Categorical Variables

It also helps examine whether two categorical variables are related. For instance, it can reveal whether gender is associated with product choice by comparing observed frequencies with those expected under independence. A significant result indicates an association, not that one variable causes the other.

Did You Know? Inferential statistics enable analysts to test hypotheses and draw conclusions about populations based on sample data, using techniques like t-tests, chi-square tests, and analysis of variance (ANOVA). (Source: PMC)

Types of Chi-Square Tests

You have seen what a chi-square test is used for. Now let’s explore the main types and how each is applied in analyzing categorical data:

  • Chi-Square Goodness‑of‑Fit

This test checks whether a single categorical variable fits a theoretical distribution. It compares the observed frequencies with the expected frequencies and determines if the differences are significant. 

  • Chi-Square Test of Independence

This type is used to determine whether two categorical variables are related or independent. For example, it can test whether gender is associated with product preference by comparing observed data with what would be expected if the variables were unrelated.

  • Chi-Square Test of Homogeneity

The homogeneity test compares distributions across different populations. It helps determine whether separate groups share the same distribution for a categorical variable, which is useful in data analysis from multiple sources or regions.

Assumptions of the Chi-Square Test

When you calculate chi-square, there are several assumptions to keep in mind to ensure valid results. Let’s look at the main requirements:

  • Observations Must Be Independent

Each observation should come from a separate subject or case. No single data point should influence another, as dependence can distort the test results.

  • Expected Frequency Usually ≥5 Per Cell

As a general rule of thumb, every cell in the contingency table should have an expected frequency of at least 5. Smaller expected counts can make the chi-square approximation less accurate.

  • Variables Must Be Categorical

Chi-square tests are designed for categorical data. Both variables should represent categories or groups rather than continuous measurements.
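
The expected-frequency rule of thumb can be checked before running the test. The sketch below assumes a contingency table stored as a list of rows and uses the standard expected count under independence (row total × column total / grand total). The helper names and the table are hypothetical.

```python
def expected_counts(table):
    """Expected cell counts under independence for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def small_expected_cells(table, minimum=5):
    """Return (row, column, expected) for every cell with an expected count below `minimum`."""
    return [
        (i, j, e)
        for i, row in enumerate(expected_counts(table))
        for j, e in enumerate(row)
        if e < minimum
    ]


# Hypothetical sparse 2x2 table (illustration only).
sparse_table = [[2, 8], [3, 27]]
flagged = small_expected_cells(sparse_table)
for row, col, value in flagged:
    print(f"cell ({row}, {col}) has expected count {value:.2f}, below 5")
```

Note that the check applies to expected counts, not observed ones: a cell with a small observed count is acceptable as long as its expected count is large enough.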

Step‑by‑Step Chi-Square Calculation (With Example)

By now, you have seen what the chi-square test is, its main types, and the assumptions behind it. Let’s now understand how the calculation works through two simple chi-square test examples.

Example 1: Chi-Square Goodness-of-Fit Test

Suppose a store expects customer purchases to be evenly distributed across four product categories. With 100 purchases in the sample, the expected count for each category is 100 / 4 = 25.

Here is the sample dataset:

Product Category | Observed (O) | Expected (E)
Electronics      | 20           | 25
Clothing         | 30           | 25
Grocery          | 25           | 25
Home Goods       | 25           | 25

Now calculate the chi-square value for each category using:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

For each category:

  • Electronics: $(20 - 25)^2 / 25 = 1$
  • Clothing: $(30 - 25)^2 / 25 = 1$
  • Grocery: $(25 - 25)^2 / 25 = 0$
  • Home Goods: $(25 - 25)^2 / 25 = 0$

Add them together:

$$\chi^2 = 1 + 1 + 0 + 0 = 2$$

This chi-square value shows how much the observed counts differ from the expected distribution.
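
The goodness-of-fit calculation can be reproduced with a short Python sketch. The counts come from the example. The variable names are illustrative, and the comparison assumes the usual goodness-of-fit rule that the degrees of freedom equal the number of categories minus one, with the standard critical value 7.81 for 3 degrees of freedom at α = 0.05.

```python
# Observed purchases per category, taken from the example.
observed = {"Electronics": 20, "Clothing": 30, "Grocery": 25, "Home Goods": 25}

# Under an even split, every category expects the same share of the total.
total = sum(observed.values())
expected_per_category = total / len(observed)

# Per-category contribution (O - E)^2 / E.
contributions = {
    name: (count - expected_per_category) ** 2 / expected_per_category
    for name, count in observed.items()
}
chi_square = sum(contributions.values())

# Assumption: df = categories - 1, and 7.81 is the critical value for df = 3, alpha = 0.05.
degrees_of_freedom = len(observed) - 1
CRITICAL_VALUE_DF3 = 7.81
reject_null = chi_square > CRITICAL_VALUE_DF3

print(f"chi-square = {chi_square}, df = {degrees_of_freedom}, reject H0: {reject_null}")
```

Because 2 is well below 7.81, the data give no reason to doubt that purchases are evenly distributed across the four categories.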

Example 2: Chi-Square Test of Independence

Now suppose you want to check whether buying preference is related to gender.

Here is the sample dataset:

Gender       | Buys Product A | Buys Product B | Row Total
Male         | 30             | 20             | 50
Female       | 20             | 30             | 50
Column Total | 50             | 50             | 100

First, calculate the expected count for each cell using:

$$E = \frac{\text{Row Total} \times \text{Column Total}}{\text{Grand Total}}$$

So for each cell:

  • Male, Product A: $(50 \times 50) / 100 = 25$
  • Male, Product B: $(50 \times 50) / 100 = 25$
  • Female, Product A: $(50 \times 50) / 100 = 25$
  • Female, Product B: $(50 \times 50) / 100 = 25$

Now apply the chi-square formula to each cell:

  • Male, Product A: $(30 - 25)^2 / 25 = 1$
  • Male, Product B: $(20 - 25)^2 / 25 = 1$
  • Female, Product A: $(20 - 25)^2 / 25 = 1$
  • Female, Product B: $(30 - 25)^2 / 25 = 1$

Add them together:

$$\chi^2 = 1 + 1 + 1 + 1 = 4$$

This value tells you how much the observed frequencies differ from the frequencies you would expect if gender and buying preference were independent.
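
The same steps can be sketched in Python for the 2x2 table from the example. The variable names are illustrative. The comparison assumes the usual rule that the degrees of freedom equal (rows − 1) × (columns − 1), with the standard critical value 3.84 for 1 degree of freedom at α = 0.05. No continuity correction is applied, to match the hand calculation.

```python
# Rows: Male, Female. Columns: Product A, Product B (counts from the example).
table = [[30, 20], [20, 30]]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand_total = sum(row_totals)

# Expected count per cell: row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_square = sum(
    (table[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(len(table))
    for j in range(len(table[0]))
)

# Assumption: df = (rows - 1) * (cols - 1), and 3.84 is the critical value for df = 1.
degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
CRITICAL_VALUE_DF1 = 3.84
reject_null = chi_square > CRITICAL_VALUE_DF1

print(f"chi-square = {chi_square}, df = {degrees_of_freedom}, reject H0: {reject_null}")
```

Since 4 exceeds 3.84, the association between gender and product choice is significant at the 0.05 level in this sample. Some statistical software applies Yates' continuity correction to 2x2 tables by default (SciPy's `chi2_contingency` is one example), which yields a smaller statistic than the uncorrected value computed here.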

Interpretation of Results

Once you calculate the $\chi^2$ statistic, compare it with the critical value for your chosen significance level or check the corresponding p-value. This helps you decide whether the observed difference or association is statistically significant.

  • If p-value < 0.05: reject the null hypothesis
  • If p-value ≥ 0.05: fail to reject the null hypothesis

In simple terms, rejecting the null hypothesis means there is a statistically significant difference or association, while failing to reject it means there is not enough evidence to say the difference or association is significant.
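
To make the p-value rule concrete, here is a sketch that converts a chi-square statistic into a p-value using only Python's standard library. It relies on the closed-form upper-tail probability of the chi-square distribution for whole-number degrees of freedom. The function name `chi_square_p_value` is hypothetical, and in practice a statistics library would normally handle this step.

```python
import math


def chi_square_p_value(statistic, df):
    """Upper-tail probability P(X >= statistic) for a chi-square variable with integer df."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if statistic <= 0:
        return 1.0
    df = int(df)
    half = statistic / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        total, term = 0.0, 1.0
        for i in range(df // 2):
            total += term
            term *= half / (i + 1)
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus one correction term per extra pair of degrees of freedom.
    tail = math.erfc(math.sqrt(half))
    for i in range(1, (df - 1) // 2 + 1):
        tail += math.exp(-half) * half ** (i - 0.5) / math.gamma(i + 0.5)
    return tail


# Illustrative input: a statistic of 4.0 with 1 degree of freedom.
ALPHA = 0.05
p_value = chi_square_p_value(4.0, 1)
decision = "reject H0" if p_value < ALPHA else "fail to reject H0"
print(f"p-value = {p_value:.4f} -> {decision}")  # p-value is about 0.0455
```

The p-value and critical-value approaches always agree: a statistic above the critical value at a given significance level has a p-value below that level.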

Chi-Square Distribution

The chi-square distribution is used to interpret the results of the chi-square test. It is right-skewed, meaning it has a long tail on the right side, and its shape depends on the degrees of freedom (df). As the degrees of freedom increase, the distribution becomes more symmetric and spreads out over a wider range of values.

Critical values in the chi-square distribution help decide whether to reject the null hypothesis. For a given significance level (commonly 0.05), a test statistic above the critical value means that a difference this large would be unlikely if the null hypothesis were true.

Here’s a short reference table showing the critical χ² values at α = 0.05 for a few degrees of freedom:

Degrees of Freedom (df) | χ² Critical Value (α = 0.05) | Decision Rule
1                       | 3.84                         | χ² > 3.84 → reject H₀
2                       | 5.99                         | χ² > 5.99 → reject H₀
3                       | 7.81                         | χ² > 7.81 → reject H₀

The degrees of freedom indicate how many values in the data can vary independently, which affects the distribution and the critical value.
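
As a small sketch, the degrees of freedom for the two most common cases can be computed from the shape of the data. The helper names are hypothetical. The goodness-of-fit rule assumes that no distribution parameters were estimated from the sample, and the contingency-table rule applies to tests of independence and homogeneity.

```python
def df_goodness_of_fit(num_categories):
    """Degrees of freedom for a goodness-of-fit test: categories - 1."""
    return num_categories - 1


def df_contingency(num_rows, num_cols):
    """Degrees of freedom for an r x c contingency table: (r - 1) * (c - 1)."""
    return (num_rows - 1) * (num_cols - 1)


print(df_goodness_of_fit(4))  # four categories -> 3
print(df_contingency(2, 2))   # 2x2 table -> 1
print(df_contingency(3, 4))   # 3x4 table -> 6
```

The subtraction reflects the fixed totals: once all but one count in a row or column is known, the last one is determined.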

Application of the Chi-Square Test

From the chi-square examples above, you have seen how the test works and what it measures. Now let's look at common applications of the chi-square test:

  • Market Research

Chi-square tests are useful in analyzing customer surveys to determine if preferences differ across regions, age groups, or other categories. Companies can identify patterns in consumer behavior and make data-driven marketing decisions.

  • Medical Studies

In healthcare, the chi-square test is often used to compare outcomes between treatment and control groups. It helps determine whether differences in patient recovery rates are due to chance or reflect a real effect. For example, suppose 60 patients recover with one treatment and 45 with another. Given the size of each group, the test can show whether this difference is statistically significant or plausibly due to chance.

  • Social Science Research

Researchers use chi-square tests to determine whether there’s a link between categories, such as education level and voting, or gender and program participation. It helps make sense of survey data and spot trends in society.

Limitations of the Chi-Square Test

When using the chi-square test, it is important to understand its limitations to ensure reliable interpretation. Here are the main constraints to consider.

  • Not Suitable for Small Expected Counts

The chi-square test works best when each cell has an expected count of at least 5. With smaller expected counts, the chi-square approximation becomes unreliable and the results might not be accurate. In that case, Fisher’s Exact Test is a safer option.

  • Only for Categorical Data

Chi-square only works with data in separate categories. Continuous numbers need to be grouped, which can hide some details or patterns. Carefully making these groups helps keep the results meaningful.

  • Sensitive to Sample Size

The sample size makes a big difference in chi-square tests. With a large sample, even small differences can look significant. With a small sample, real links might be missed. It’s important to keep sample size in mind when interpreting the results.
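
The effect of sample size can be illustrated with a short sketch. Both tables below are hypothetical and have the same 52% / 48% split. Only the number of observations differs. The function name is illustrative, and 3.84 is the standard critical value for 1 degree of freedom at α = 0.05.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical tables with identical proportions (illustration only).
small_sample = [[26, 24], [24, 26]]              # 100 observations
large_sample = [[2600, 2400], [2400, 2600]]      # 10,000 observations

CRITICAL_VALUE_DF1 = 3.84
small_statistic = chi_square_independence(small_sample)
large_statistic = chi_square_independence(large_sample)

print(f"n = 100:    chi-square = {small_statistic:.2f}")   # 0.16, not significant
print(f"n = 10000:  chi-square = {large_statistic:.2f}")   # 16.00, significant
```

Multiplying every count by 100 multiplies the statistic by 100, even though the strength of the pattern is unchanged. This is why a significant result from a very large sample should be read alongside the actual size of the difference.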

Conclusion 

The chi-square test is a practical statistical method for analyzing categorical data and assessing whether observed patterns are meaningful or merely due to chance. Whether you are testing how well data fits an expected distribution or examining the relationship between variables, it gives you a structured way to make evidence-based conclusions. By understanding its types, assumptions, calculation steps, applications, and limitations, you can use the chi-square test more confidently in research, business, healthcare, and social science analysis.

Key Takeaways

  • Chi-square helps identify patterns in categorical data and shows whether results differ from what you expect
  • There are different types, like goodness-of-fit, independence, and homogeneity, depending on the data
  • The step-by-step process, from counts to the test value, keeps the analysis clear and easy to follow
  • Knowing where it works well and where it doesn’t helps you use it correctly and interpret results properly

FAQs

1. What are expected and observed frequencies with the Chi-Square test?

Observed frequencies are the actual counts in your data. Expected frequencies are the counts you would see if there were no difference, pattern, or relationship, that is, if the null hypothesis were true. The test compares the two sets of counts to measure how far the data depart from that expectation. The bigger the difference between observed and expected frequencies, the greater the chi-square statistic.

2. What is the distinction between the chi-square test of independence and goodness-of-fit?

The chi-square goodness-of-fit test assesses whether a single categorical variable is distributed as expected. For example, it can check whether customer preferences are evenly distributed across product categories. The chi-square test of independence, on the other hand, tests whether two categorical variables, such as gender and product choice, are related. Simply put, goodness-of-fit is concerned with a single variable, whereas independence is concerned with a two-variable relationship.

3. What does the Chi-Square test mean by a contingency table?

A contingency table shows the joint frequency distribution of two categorical variables. It arranges the counts in rows and columns so you can compare how the categories of one variable are distributed across the categories of the other. In a chi-square test of independence, the row and column totals of the contingency table are used to calculate the expected count in each cell, and the test then determines whether the variables are independent.

4. What are the Chi-Square test alternatives?

The chi-square test is not the best choice when its assumptions are not met, particularly when expected frequencies are very small. Fisher's Exact Test is a common alternative for small samples, especially with 2x2 tables. In other situations, you can use tests such as McNemar's test for paired observations or the G-test, depending on the structure of the data and the research question. The appropriate alternative depends on sample size, table design, and whether the observations are independent.
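
For a sense of how Fisher's Exact Test handles a small 2x2 table, here is a minimal standard-library sketch. It assumes the common two-sided definition, which sums the hypergeometric probabilities of all tables (with the same row and column totals) that are no more likely than the observed one. The function name and the table are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(table):
    """Two-sided Fisher exact p-value for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x, with all totals fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance so ties with the observed probability are included.
    return sum(
        prob(x) for x in range(lowest, highest + 1) if prob(x) <= p_observed * (1 + 1e-7)
    )


# Hypothetical small table where every expected count is 2, below the rule of thumb of 5.
small_table = [[3, 1], [1, 3]]
p_value = fisher_exact_two_sided(small_table)
print(f"Fisher exact p-value = {p_value:.4f}")  # 34/70, about 0.4857
```

Because the p-value is computed exactly from the possible tables rather than from a large-sample approximation, it stays valid when expected counts are small.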

About the Author

Avijeet Biswal

Avijeet is a Senior Research Analyst at Simplilearn. Passionate about Data Analytics, Machine Learning, and Deep Learning, Avijeet is also interested in politics, cricket, and football.
