The test looks at:
The t-test is fairly robust to minor violations of normality, especially with larger sample sizes (typically n > 30).
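The formula for the Welch t-test (also called Welch’s unequal variances t-test) is:
t = (x̄₁ - x̄₂) / √(s₁²/n₁ + s₂²/n₂)
Where x̄₁ and x̄₂ are the two sample means, s₁² and s₂² are the two sample variances, and n₁ and n₂ are the two sample sizes.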
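To make the formula concrete, here is a minimal sketch that computes the Welch statistic by hand and checks it against R's built-in `t.test()`. The vectors `group_a` and `group_b` are made-up toy values, not World Happiness Report data. The degrees of freedom use the Welch-Satterthwaite approximation, which the text above does not spell out but which is what `t.test()` applies when variances are not assumed equal.

```r
# Hypothetical toy data (NOT from the World Happiness Report)
group_a <- c(6.9, 7.2, 6.5, 7.0, 6.8, 7.4)
group_b <- c(4.1, 4.6, 3.9, 4.4, 5.0)

n1 <- length(group_a); n2 <- length(group_b)   # sample sizes
m1 <- mean(group_a);   m2 <- mean(group_b)     # sample means
v1 <- var(group_a);    v2 <- var(group_b)      # sample variances (s^2)

# Welch t statistic: difference in means over its standard error
se       <- sqrt(v1 / n1 + v2 / n2)
t_manual <- (m1 - m2) / se

# Welch-Satterthwaite approximation of the degrees of freedom
df_manual <- (v1 / n1 + v2 / n2)^2 /
  ((v1 / n1)^2 / (n1 - 1) + (v2 / n2)^2 / (n2 - 1))

# Two-sided p-value from the t distribution
p_manual <- 2 * pt(-abs(t_manual), df = df_manual)

# Built-in version; var.equal = FALSE (Welch) is the default
welch <- t.test(group_a, group_b)

all.equal(unname(welch$statistic), t_manual)
all.equal(unname(welch$parameter), df_manual)
all.equal(welch$p.value, p_manual)
```

Each `all.equal()` call should return `TRUE`, which suggests the hand computation and `t.test()` agree on the statistic, the degrees of freedom, and the p-value.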
We will be using data from the World Happiness Report for this case study.
Download the data and set it up in your coding environment.
If you have already set the data up, then you can reuse the project you had going.
\(H_0\) (Null Hypothesis): There is no true difference in mean happiness score between Western Europe and South Asia; any observed difference is due to natural variability inherent in the population.
\(H_1\) (Alternative Hypothesis): There is a true difference in mean happiness score between Western Europe and South Asia (the observed difference is not due to natural variability alone).
two_regions <- happiness |>
  filter(regional_indicator %in% c("Western Europe", "South Asia"))
t.test(ladder_score ~ regional_indicator, data = two_regions)
Welch Two Sample t-test
data: ladder_score by regional_indicator
t = -13.546, df = 37.636, p-value = 4.762e-16
alternative hypothesis: true difference in means between group South Asia and group Western Europe is not equal to 0
95 percent confidence interval:
-3.037035 -2.247082
sample estimates:
mean in group South Asia mean in group Western Europe
                4.247478                     6.889537
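Since the p-value (4.762e-16) is far below 0.05, we reject the null hypothesis: the difference in mean happiness score between Western Europe and South Asia is statistically significant.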
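The decision rule can also be written out in code. The sketch below uses a made-up data frame `toy` (hypothetical column names `region` and `score`, not the happiness data) to show how the pieces of a `t.test()` result can be pulled out and compared against a significance level `alpha`, assumed here to be the conventional 0.05.

```r
# Hypothetical toy data; the values are made up for illustration only
toy <- data.frame(
  region = rep(c("Region A", "Region B"), times = c(6, 5)),
  score  = c(6.9, 7.2, 6.5, 7.0, 6.8, 7.4, 4.1, 4.6, 3.9, 4.4, 5.0)
)

result <- t.test(score ~ region, data = toy)  # Welch test by default

alpha <- 0.05      # conventional significance level (an assumption)

result$p.value     # p-value of the test
result$conf.int    # 95% confidence interval for the difference in means
result$estimate    # the two group means

# Compare the p-value with alpha to reach a decision
decision <- if (result$p.value < alpha) "reject H0" else "fail to reject H0"
decision
```

Reading the result this way keeps the p-value, the confidence interval, and the group means side by side, so the conclusion does not rest on the p-value alone.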
\(H_0\) (Null Hypothesis): There is no true difference in mean happiness score between the years 2020 and 2024; any observed difference is due to natural variability inherent in the population.
\(H_1\) (Alternative Hypothesis): There is a true difference in mean happiness score between the years 2020 and 2024 (the observed difference is not due to natural variability alone).
two_years <- happiness |>
  filter(year %in% c(2020, 2024))
t.test(ladder_score ~ year, data = two_years)
Welch Two Sample t-test
data: ladder_score by year
t = -0.31048, df = 288.86, p-value = 0.7564
alternative hypothesis: true difference in means between group 2020 and group 2024 is not equal to 0
95 percent confidence interval:
-0.3023198 0.2199362
sample estimates:
mean in group 2020 mean in group 2024
          5.486380           5.527572
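Since the p-value (0.7564) is greater than 0.05, we fail to reject the null hypothesis: the data do not provide evidence of a difference in mean happiness score between 2020 and 2024.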
“Absence of evidence is not evidence of absence” means that the fact that we haven’t found evidence for something doesn’t prove that it doesn’t exist.
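A small simulation can illustrate this point. The sketch below is built entirely on made-up settings: two normal populations whose means really do differ by 0.3 (standard deviation 1), sampled with either 10 or 200 observations per group. The helper `reject_rate()` is a hypothetical function written for this example; it estimates how often a Welch t-test rejects \(H_0\) at the 0.05 level.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Share of simulated Welch t-tests that reject H0 at the 0.05 level when the
# true means differ by `true_diff` (all settings here are hypothetical).
reject_rate <- function(n, true_diff = 0.3, sims = 2000) {
  p_values <- replicate(sims, {
    a <- rnorm(n, mean = 0, sd = 1)
    b <- rnorm(n, mean = true_diff, sd = 1)
    t.test(a, b)$p.value
  })
  mean(p_values < 0.05)
}

small_n <- reject_rate(n = 10)    # few observations per group
large_n <- reject_rate(n = 200)   # many observations per group

c(small_n = small_n, large_n = large_n)
```

With the small samples, most tests should fail to reject \(H_0\) even though a real difference exists by construction. A non-significant result can therefore reflect too little data rather than no effect.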
Where:
ANalysis Of VAriance (ANOVA):
The core concept is comparing:
ANOVA is also fairly robust to minor violations of normality, especially with larger sample sizes (typically n > 30).
\(H_0\) (Null Hypothesis): The variables region and happiness score are independent. Any difference in mean scores across regions is due to natural variability inherent in the population.
\(H_1\) (Alternative Hypothesis): The variables region and happiness score are not independent. The difference in mean scores across regions is not due to natural variability alone.
One-way ANOVA: Examines the effect of one independent variable on the response (dependent) variable.
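As a rough sketch of a one-way ANOVA in R, the example below fits `aov()` to a made-up data frame `toy` (hypothetical columns `region` and `score`, not the happiness data). It then recomputes the F statistic by hand as the between-group mean square divided by the within-group mean square, which is the standard definition of the one-way F statistic.

```r
# Hypothetical toy data; the values are made up for illustration only
toy <- data.frame(
  region = rep(c("Region A", "Region B", "Region C"), each = 4),
  score  = c(6.9, 7.2, 6.5, 7.0,
             4.1, 4.6, 3.9, 4.4,
             5.5, 5.9, 5.2, 5.6)
)

# One-way ANOVA: one independent variable (region), one response (score)
fit <- aov(score ~ region, data = toy)
anova_table <- summary(fit)[[1]]
anova_table

# The same F statistic by hand
grand_mean  <- mean(toy$score)
group_means <- tapply(toy$score, toy$region, mean)
group_sizes <- tapply(toy$score, toy$region, length)
k <- length(group_means)   # number of groups
n <- nrow(toy)             # total number of observations

# Variability of the group means around the grand mean
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
# Variability of the observations around their own group mean
ss_within  <- sum((toy$score - group_means[as.character(toy$region)])^2)

f_manual <- (ss_between / (k - 1)) / (ss_within / (n - k))

all.equal(anova_table[["F value"]][1], f_manual)
```

The final `all.equal()` call should return `TRUE`, indicating that the hand-computed ratio matches the F value reported by `aov()`.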