How not to lie with statistics

Small Sample Size or Sample Bias

  • Beware of inaccurate generalizations
  • Small sample size might not represent the entire population
  • Sampling choice might favor certain groups

Data Context

  • Timeframe of the data – “trends”, “recently” – when? what period of time?
  • Who collected the data? What was the purpose of data collection?

Data Quality

  • Data collection bias – how was the question worded? “Which one of these do you prefer?” vs. “Do you think X is better?”
  • Data processing bias – selecting data that supports hypothesis
  • Wrangling bias or errors – how were variables coded?

Practice

Download the Fashion Dataset, inspect it, and answer the following questions:

  • Who collected the data?
  • How was the data collected?
  • When was the data collected?
  • Who is represented in the data? (what is the population this was sampled from?)
  • Are there any outliers? What do they mean?

Practice

Who is represented in the data?

Age Group n
18–24 132
25–34 5
35–44 1
45+ 1
Under 18 11
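One way to read the table above is to turn the counts into shares of the whole sample. The snippet below is a minimal sketch that uses only the counts shown in the table; the variable names are illustrative.

```python
# Counts copied from the Age Group table above.
age_counts = {"18–24": 132, "25–34": 5, "35–44": 1, "45+": 1, "Under 18": 11}

total = sum(age_counts.values())

# Share of the sample that falls in each age group.
shares = {group: n / total for group, n in age_counts.items()}

for group, share in shares.items():
    print(f"{group:<9} n={age_counts[group]:>3}  {share:.1%}")
```

With these counts, the 18–24 group makes up 88% of the 150 respondents, so any conclusion about the other age groups rests on very few people.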

Practice

Who is represented in the data?

Gender n
Female 67
Male 83

Practice

Who is represented in the data?

Profession n
Freelancer 2
Office Worker 4
Other 4
Student 140

Practice

Who is represented in the data?

Age Group Profession Female Male
18–24 Student 61 67
25–34 Office Worker 1 1
25–34 Other 2 0
35–44 Other 1 0
Under 18 Student 2 9
18–24 Freelancer 0 1
18–24 Office Worker 0 2
18–24 Other 0 1
25–34 Student 0 1
45+ Freelancer 0 1

Descriptive Stats

  • Measures of centrality (mean, median, mode) might differ, showing different aspects of the data
  • Measures of variability (e.g. standard deviation) can give us information that measures of centrality alone cannot
  • Question to ask: Why were the measures displayed chosen?
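The points above can be illustrated with a short sketch using Python's standard `statistics` module. All numbers below are invented for illustration and do not come from the Fashion Dataset.

```python
import statistics

# Hypothetical skewed data: one extreme value pulls the mean away from the median.
skewed = [2, 2, 3, 3, 3, 4, 60]
print("mean  :", statistics.mean(skewed))
print("median:", statistics.median(skewed))
print("mode  :", statistics.mode(skewed))

# Two hypothetical sets of ratings with the same mean and median but very different spread.
flat = [5, 5, 5, 5, 5, 5]
spread = [1, 1, 5, 5, 9, 9]
for name, data in [("flat", flat), ("spread", spread)]:
    print(name, "mean:", statistics.mean(data), "stdev:", round(statistics.stdev(data), 2))
```

In the first list the mean is far above the median and mode, so reporting only the mean would overstate a "typical" value. In the second pair, the means match and only the standard deviation shows that the groups differ.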

Descriptive Stats – practice

What are the problems with the table below? How would you fix it?

Age Group Clothing style reflect your personality? (1-10)
18–24 7
25–34 8
35–44 4
45+ 0
Under 18 8

Inferential Stats

  • What tests and models were run?
  • How many tests and models were run?

The more statistical tests are run, the greater the probability of finding false positives (Type I errors) just by chance.

Inferential Stats

Remember that with the standard p-value threshold (alpha) of 0.05, we accept a 5% chance of finding a significant result when there isn’t one. If we run 20 independent tests, the probability of getting no false positives across all 20 tests is only about 36%, so the probability of at least one false positive is about 64%.

  • P(getting a false positive for one test) = 0.05
  • P(not getting a false positive for one test) = 1 - 0.05 = 0.95
  • P(no false positives for all 20 tests) = \(0.95^{20}\) = 0.36
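The arithmetic above can be checked with a few lines of Python. This is a minimal sketch that assumes, as the slide does, that the tests are independent.

```python
alpha = 0.05  # per-test false positive rate (from the slide)
m = 20        # number of independent tests (from the slide)

# Probability that none of the m tests produces a false positive.
p_no_false_positive = (1 - alpha) ** m

# Complement: probability that at least one test produces a false positive.
p_at_least_one = 1 - p_no_false_positive

print(f"P(no false positives in {m} tests) = {p_no_false_positive:.2f}")
print(f"P(at least one false positive)    = {p_at_least_one:.2f}")
```

Changing `m` shows how quickly the chance of at least one false positive grows as more tests are run.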

Inferential Stats – issues

  • P-hacking: Running many tests until finding a statistically significant result
  • Cherry-Picking: Selecting and reporting only the results that support the hypothesis
  • HARKing: Hypothesizing After Results are Known
  • Data dredging: Exploring data without pre-specified hypotheses
  • Fishing Expeditions: Examining associations between different combinations of variables with the hope of finding something that is statistically significant

Inferential Stats – fixes

  • Bonferroni correction – calculate a new α (alpha) by dividing the original α by the number of tests run (n); more conservative
  • Benjamini-Hochberg – order all p-values from smallest to largest, calculate the critical value for each p-value as (rank / number of tests) × α, find the largest p-value that is at or below its critical value, and treat it and all smaller p-values as significant; less conservative
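Both corrections can be sketched in plain Python. The function names and the p-values below are hypothetical, chosen only to show the two procedures side by side. The Benjamini-Hochberg critical value used here is the usual (rank / number of tests) × α, with "at or below" as the comparison.

```python
def bonferroni(p_values, alpha=0.05):
    """Flag each test as significant if its p-value is at or below alpha / n."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Flag tests as significant using the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Largest rank whose p-value is at or below its critical value.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank

    # Everything ranked at or below max_rank is declared significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            significant[idx] = True
    return significant


# Made-up p-values from eight hypothetical tests.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]

print("Bonferroni        :", sum(bonferroni(p_values)), "significant")
print("Benjamini-Hochberg:", sum(benjamini_hochberg(p_values)), "significant")
```

On these made-up values Bonferroni keeps one result and Benjamini-Hochberg keeps two, which matches the slide's point that Bonferroni is the more conservative of the two.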

Visualizations

  • Different scales for comparison across different plots
  • Different baselines
  • Stretching or shrinking the scale to minimize or highlight changes
  • Percentages do not indicate actual counts or raw numbers
  • Percentages of percentages (nested percentages)
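The last two bullets can be made concrete with a small sketch. All numbers are hypothetical and chosen only to show how the same data can be described with very different percentages.

```python
import math

# Hypothetical counts: a subgroup of a sample, and a subset of that subgroup.
total = 150      # everyone surveyed
subgroup = 15    # e.g. respondents who gave some answer
nested = 6       # e.g. members of the subgroup who also gave a second answer

share_of_subgroup = nested / subgroup  # the "percentage of a percentage"
share_of_total = nested / total        # the same people as a share of everyone

print(f"{share_of_subgroup:.0%} of the subgroup is only {share_of_total:.0%} of the sample ({nested} people)")

# Hypothetical rates: percentage-point change vs. relative change.
old_rate, new_rate = 0.02, 0.03
point_change = (new_rate - old_rate) * 100          # in percentage points
relative_change = (new_rate - old_rate) / old_rate  # as a fraction of the old rate

print(f"{point_change:.0f} percentage point increase = {relative_change:.0%} relative increase")
```

A headline of "40%" or "50% increase" is technically correct in both cases, but the raw counts (6 people, or a rise of one percentage point) tell a much smaller story.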

Visualizations – find the issues

Results

  • Effect size vs. significance
  • Correlation is not Causation
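The difference between effect size and significance can be sketched with a simple two-sample z-test written in plain Python. The means, standard deviation, and sample sizes are hypothetical, and the test assumes equal group sizes and a known common standard deviation.

```python
import math


def two_sample_z(mean_diff, sd, n_per_group):
    """Two-sided z-test for a difference in means (equal n, known common sd)."""
    standard_error = sd * math.sqrt(2 / n_per_group)
    z = mean_diff / standard_error
    # Two-sided p-value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Hypothetical tiny difference between two groups.
mean_diff = 0.1
sd = 10.0
cohens_d = mean_diff / sd  # standardized effect size

_, p_small_sample = two_sample_z(mean_diff, sd, n_per_group=100)
_, p_large_sample = two_sample_z(mean_diff, sd, n_per_group=1_000_000)

print(f"Cohen's d = {cohens_d:.2f}")
print(f"p with n=100 per group:       {p_small_sample:.3f}")
print(f"p with n=1,000,000 per group: {p_large_sample:.2e}")
```

The effect size is identical in both cases and very small. Only the sample size changes, yet the result goes from clearly non-significant to highly significant, which is why a p-value alone says little about whether an effect matters.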

Results – find the issues

A multinomial logistic regression was run with age group as the response variable and footwear as the predictor variable. Here are the conditional probabilities:

Footwear 18–24 25–34 35–44 Under 18
Boots 0.92 0.08 0.00 0.00
Heels/Loafers 0.83 0.17 0.00 0.00
Other 0.88 0.00 0.00 0.12
Sandals/Flats 0.80 0.07 0.07 0.07
Sneakers 0.90 0.02 0.00 0.08

Practice

What are the issues with these analysis results from the 2020 US presidential elections?

Practice

What are the issues with these analysis results?

Spotify Visualization

The Meta dynasty

An Analysis of Deaths in U.S. National Parks

Linear Scale – concentration

The pH scale is logarithmic: an increase or decrease of one integer value changes the concentration tenfold

Log Scale – pH values

The pH scale is logarithmic: an increase or decrease of one integer value changes the concentration tenfold
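The tenfold relationship follows from the standard definition pH = -log10([H+]). The sketch below converts pH values back to hydrogen ion concentrations; the function name is illustrative.

```python
import math


def hydrogen_concentration(ph):
    """Hydrogen ion concentration in mol/L, from pH = -log10([H+])."""
    return 10 ** (-ph)


# Equal steps on the pH (log) scale are tenfold steps in concentration.
for ph in range(1, 8):
    print(f"pH {ph}: [H+] = {hydrogen_concentration(ph):.0e} mol/L")

ratio = hydrogen_concentration(3) / hydrogen_concentration(4)
print(f"pH 3 is about {ratio:.0f} times more concentrated than pH 4")
```

Plotted on a linear concentration axis, almost all of these values would sit near zero. On a log axis they are evenly spaced, which is easier to read but hides how large the differences really are.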

Log scales

The Public May Not Understand Logarithmic Graphs Used to Depict COVID-19

Final advice

  • Be skeptical about other people’s results
  • Be honest about your results