Chi-Squared Tests

Chi-squared tests are statistical tests for categorical data. They assess either the association between two categorical variables or the goodness-of-fit of observed data to an expected distribution. In both cases, the test compares the observed frequencies with the frequencies expected under the null hypothesis and asks whether the difference is larger than chance alone would plausibly produce.

Types of Chi-Squared Tests:

  1. Chi-Squared Test of Independence:
  • Objective: To determine whether there is a significant association between two categorical variables.
  • Data Requirement: The data are collected in a contingency table.
  • Null Hypothesis ((H_0)): The variables are independent.
  • Alternative Hypothesis ((H_1) or (H_a)): The variables are not independent.
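  • Test Statistic: ( \chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} )
  • (O_{ij}) is the observed frequency in cell ((i, j)), and (E_{ij}) is the expected frequency in cell ((i, j)) under independence, computed as ( E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{N} ), where (N) is the total number of observations.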
  2. Chi-Squared Test of Goodness-of-Fit:
  • Objective: To assess whether observed categorical data follow a specified theoretical distribution.
  • Data Requirement: The observed and expected frequencies for different categories.
  • Null Hypothesis ((H_0)): The observed data fit the expected distribution.
  • Alternative Hypothesis ((H_1) or (H_a)): The observed data do not fit the expected distribution.
  • Test Statistic: ( \chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i} )
  • (O_i) is the observed frequency in category (i), and (E_i) is the expected frequency in category (i).
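
The two statistics above can be sketched in plain Python. This is a minimal illustration rather than a reference implementation: the function names `gof_statistic`, `expected_counts`, and `independence_statistic` are made up for this sketch, the example counts are hypothetical, and the expected counts for the independence test are assumed to follow the standard rule (row total × column total) / grand total.

```python
def gof_statistic(observed, expected):
    """Chi-squared goodness-of-fit statistic: sum of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def independence_statistic(table):
    """Chi-squared statistic for a contingency table given as a list of rows."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical counts, chosen only to exercise the functions.
print(gof_statistic([18, 22, 20], [20, 20, 20]))
print(independence_statistic([[10, 20], [20, 10]]))
```

Both functions return only the statistic; deciding whether it is significant still requires the degrees of freedom and a critical value or p-value.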

Steps in Conducting a Chi-Squared Test:

  1. Formulate Hypotheses:
  • State the null hypothesis ((H_0)) and the alternative hypothesis ((H_1) or (H_a)).
  2. Choose Significance Level ((\alpha)):
  • Decide on the significance level, the probability of a Type I error (rejecting a true null hypothesis) that you are willing to accept; 0.05 is a common choice.
  3. Collect Data:
  • Gather data and organize it into a contingency table or a set of observed and expected frequencies.
  4. Calculate Test Statistic:
  • Use the appropriate formula to compute the chi-squared statistic.
  5. Determine Critical Region:
  • Identify the critical region based on the significance level and degrees of freedom: ((r - 1)(c - 1)) for a test of independence on a table with (r) rows and (c) columns, and (k - 1) for a goodness-of-fit test with (k) categories and no parameters estimated from the data.
  6. Make Decision:
  • If the calculated chi-squared statistic falls into the critical region (that is, it exceeds the critical value), reject the null hypothesis; otherwise, fail to reject the null hypothesis.
  7. Draw Conclusion:
  • Summarize the results in the context of the original question, stating whether the data provide evidence against the null hypothesis.
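
As a rough sketch, the steps above might look like this in Python for a goodness-of-fit test. Everything specific here is an assumption made for illustration: the null hypothesis (a fair six-sided die), the invented counts from 60 rolls, the helper name `chi_squared_gof_test`, and a significance level of 0.05. The critical values are the standard tabulated ones for (\alpha = 0.05), and the degrees of freedom are taken as the number of categories minus one, which assumes no parameters were estimated from the data.

```python
# Upper-tail critical values of the chi-squared distribution at alpha = 0.05,
# copied from a standard table (degrees of freedom -> critical value).
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def chi_squared_gof_test(observed, expected):
    """Run a goodness-of-fit test at alpha = 0.05 and return its pieces."""
    # Calculate the test statistic.
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Determine the critical region: statistic > critical value, df = k - 1.
    df = len(observed) - 1
    critical = CRITICAL_05[df]
    # Make the decision.
    reject = statistic > critical
    return statistic, df, critical, reject


# Hypotheses, significance level, data:
# H0 is "the die is fair", alpha = 0.05, and the counts below are
# hypothetical results of 60 rolls (invented, not real data).
observed = [5, 8, 9, 8, 10, 20]
expected = [10] * 6

statistic, df, critical, reject = chi_squared_gof_test(observed, expected)
print(f"chi2 = {statistic:.2f}, df = {df}, critical value = {critical}")
print("reject H0" if reject else "fail to reject H0")
```

With these made-up counts the statistic exceeds the critical value, so the sketch rejects the hypothesis of a fair die. Hard-coding a small table keeps the example dependency-free; a statistics library would normally supply the critical value or a p-value for any degrees of freedom.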


Example:

Suppose you have data on the smoking habits of individuals (smoker or non-smoker) and their incidence of lung cancer (yes or no). You organize the data into a 2×2 contingency table and want to test whether smoking habits and lung cancer incidence are associated. The null hypothesis ((H_0)) is that smoking habits and lung cancer are independent; the alternative hypothesis ((H_1) or (H_a)) is that they are not independent.

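You calculate the chi-squared statistic from the observed and expected frequencies and compare it to the critical value for 1 degree of freedom (a 2×2 table has ((2 - 1)(2 - 1) = 1)) at your chosen significance level. If the statistic exceeds the critical value, you reject the null hypothesis of independence; otherwise, you fail to reject it.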
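
To make the smoking example concrete, here is a small numerical sketch. The counts are hypothetical, invented purely for illustration, and are not real epidemiological data. The sketch assumes (\alpha = 0.05), uses no continuity correction, and computes the p-value with a shortcut that is valid only for 1 degree of freedom.

```python
import math

# Hypothetical counts, invented purely for illustration (not real data).
# Rows: smoker, non-smoker. Columns: lung cancer yes, lung cancer no.
table = [[30, 70],
         [10, 90]]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Expected counts under H0 (independence): row total * column total / N.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-squared statistic summed over all four cells.
chi2 = sum(
    (table[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(2)
    for j in range(2)
)

df = (2 - 1) * (2 - 1)   # (rows - 1) * (columns - 1) = 1
critical = 3.841         # tabulated critical value for df = 1, alpha = 0.05

# Upper-tail probability of a chi-squared variable with df = 1,
# using the identity P(Z^2 > x) = erfc(sqrt(x / 2)) for standard normal Z.
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"chi2 = {chi2:.2f}, df = {df}, p = {p_value:.5f}")
print("reject H0" if chi2 > critical else "fail to reject H0")
```

For these invented counts the statistic is well above the critical value, so independence would be rejected. A rejection indicates association in the table, not that smoking causes lung cancer.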

Chi-squared tests provide valuable insights into the relationships between categorical variables and are commonly used in epidemiology, market research, genetics, and other fields where categorical data analysis is necessary.