Module 15 Data Case Study 4
The learning objectives of this module are:
- Apply different parts of the data science workflow, from data import to data visualization, to a new data set
- Build an R script with the data wrangling and an R Markdown file with the data analysis report
15.1 Data
For this module, we are using 2020 U.S. presidential election data from Kaggle. More specifically, we want the president_county_candidate.csv data file.
15.2 Data Wrangling
We need to apply the following steps:
- Read the data into R
- Explore the data
- Add (with mutate) any other variables we find relevant
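The steps above can be sketched as follows. This is only a minimal example under a few assumptions: so that it runs without the Kaggle download, it writes three rows copied from the output below to a temporary file and reads that instead. In practice we would point `read_csv()` at our local copy of president_county_candidate.csv. The names `csv_path` and `county_votes` and the `is_major_party` column are illustrative, not part of the data set.

```r
library(readr)
library(dplyr)

# Stand-in for president_county_candidate.csv: three rows copied from the
# glimpse() output, written to a temporary file so the example is runnable.
csv_path <- tempfile(fileext = ".csv")
write_lines(
  c(
    "state,county,candidate,party,total_votes,won",
    "Delaware,Kent County,Joe Biden,DEM,44552,TRUE",
    "Delaware,Kent County,Donald Trump,REP,41009,FALSE",
    "Delaware,Kent County,Jo Jorgensen,LIB,1044,FALSE"
  ),
  csv_path
)

# Read the data into R (swap csv_path for the path to the real file)
county_votes <- read_csv(
  csv_path,
  col_types = cols(
    state = col_character(),
    county = col_character(),
    candidate = col_character(),
    party = col_character(),
    total_votes = col_double(),
    won = col_logical()
  )
)

# Explore the data
glimpse(county_votes)

# Add a variable with mutate (is_major_party is a hypothetical example)
county_votes <- county_votes %>%
  mutate(is_major_party = party %in% c("DEM", "REP"))
```

Spelling out `col_types` is optional, but it makes the expected column types explicit and keeps `read_csv()` from guessing.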
## Rows: 31,139
## Columns: 6
## $ state <chr> "Delaware", "Delaware", "Delaware", "Delaware", "Delaware"…
## $ county <chr> "Kent County", "Kent County", "Kent County", "Kent County"…
## $ candidate <chr> "Joe Biden", "Donald Trump", "Jo Jorgensen", "Howie Hawkin…
## $ party <chr> "DEM", "REP", "LIB", "GRN", "DEM", "REP", "LIB", "GRN", "R…
## $ total_votes <dbl> 44552, 41009, 1044, 420, 195034, 88364, 2953, 1282, 71230,…
## $ won <lgl> TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE…
How many total votes did each candidate receive?
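One way to answer this is to group by candidate and sum the county-level counts. The sketch below uses a small made-up table in the same shape as the county data (the state, county, and candidate names and all numbers are placeholders, not real results), so it only illustrates the pattern.

```r
library(dplyr)

# Made-up rows in the same shape as the county-level data
county_votes <- tribble(
  ~state,    ~county,    ~candidate,    ~total_votes,
  "State X", "County 1", "Candidate A", 100,
  "State X", "County 1", "Candidate B",  80,
  "State X", "County 2", "Candidate A",  50,
  "State X", "County 2", "Candidate B",  90,
  "State Y", "County 3", "Candidate A",  30,
  "State Y", "County 3", "Candidate B",  20
)

# Sum the county counts for each candidate, largest total first
votes_by_candidate <- county_votes %>%
  group_by(candidate) %>%
  summarize(total_votes = sum(total_votes)) %>%
  arrange(desc(total_votes))

votes_by_candidate
```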
We can also get the number of electoral votes per state. The best table I found is from Wikipedia.
## Rows: 51
## Columns: 2
## $ state <chr> "Alabama", "Alaska", "Arizona", "Arkansas", "Californi…
## $ electoral_votes <dbl> 9, 3, 11, 6, 55, 9, 7, 3, 3, 29, 16, 4, 4, 20, 11, 6, …
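How many electoral votes did each candidate win? First, we need to work out who won each state. The first output below shows the total votes cast in each state; the second shows each state's winner together with their share of the popular vote.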
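A possible way to do this, sketched on made-up data (all names and numbers below are placeholders): first sum the votes per state, then sum the votes per candidate within each state, keep the candidate with the most votes, and divide by the state total. The object names `state_totals` and `state_winners` are assumptions; the column names follow the output shown in this section.

```r
library(dplyr)

# Made-up rows in the same shape as the county-level data
county_votes <- tribble(
  ~state,    ~county,    ~candidate,    ~total_votes,
  "State X", "County 1", "Candidate A", 100,
  "State X", "County 1", "Candidate B",  80,
  "State X", "County 2", "Candidate A",  50,
  "State X", "County 2", "Candidate B",  90,
  "State Y", "County 3", "Candidate A",  30,
  "State Y", "County 3", "Candidate B",  20
)

# Total votes cast in each state
state_totals <- county_votes %>%
  group_by(state) %>%
  summarize(total_votes = sum(total_votes))

# Votes per candidate within each state, then keep the top candidate.
# .groups = "drop_last" leaves the result grouped by state, so slice_max()
# picks one winner per state.
state_winners <- county_votes %>%
  group_by(state, candidate) %>%
  summarize(popular_vote = sum(total_votes), .groups = "drop_last") %>%
  slice_max(popular_vote, n = 1) %>%
  left_join(state_totals, by = "state") %>%
  mutate(perc_pop_vote = popular_vote / total_votes)

glimpse(state_winners)
```

Note that `slice_max()` keeps ties by default, so a state with an exact tie would return more than one row.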
## Rows: 51
## Columns: 2
## $ state <chr> "Alabama", "Alaska", "Arizona", "Arkansas", "California", …
## $ total_votes <dbl> 2309900, 334789, 3384972, 1211793, 16822143, 3254844, 1821…
## Rows: 51
## Columns: 5
## Groups: state [51]
## $ state <chr> "Alabama", "Alaska", "Arizona", "Arkansas", "California"…
## $ candidate <chr> "Donald Trump", "Donald Trump", "Joe Biden", "Donald Tru…
## $ popular_vote <dbl> 1434159, 179080, 1671491, 758183, 10734181, 1803419, 107…
## $ total_votes <dbl> 2309900, 334789, 3384972, 1211793, 16822143, 3254844, 18…
## $ perc_pop_vote <dbl> 0.6208749, 0.5349041, 0.4937976, 0.6256704, 0.6380983, 0…
Now we can add the electoral votes to the winner-by-state data.
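A join on `state` is one way to do this. The sketch below rebuilds only the first three rows of the two tables shown earlier (values copied from the output in this section), so the totals it prints are not the real election result. The object names are assumptions. It also treats every state as winner-take-all, which is a simplification for Maine and Nebraska, where electoral votes can be split.

```r
library(dplyr)

# First three rows of the winner-by-state output
state_winners <- tribble(
  ~state,    ~candidate,     ~popular_vote, ~total_votes,
  "Alabama", "Donald Trump", 1434159,       2309900,
  "Alaska",  "Donald Trump",  179080,        334789,
  "Arizona", "Joe Biden",    1671491,       3384972
) %>%
  mutate(perc_pop_vote = popular_vote / total_votes)

# First three rows of the electoral votes table
electoral <- tribble(
  ~state,    ~electoral_votes,
  "Alabama",  9,
  "Alaska",   3,
  "Arizona", 11
)

# Add electoral votes to each state's winner
winners_with_ev <- state_winners %>%
  left_join(electoral, by = "state")

# Electoral votes per candidate (for these three states only)
ev_by_candidate <- winners_with_ev %>%
  group_by(candidate) %>%
  summarize(electoral_votes = sum(electoral_votes))

ev_by_candidate
```

If a state name is spelled differently in the two tables, `left_join()` leaves `NA` in `electoral_votes` for that row, so it is worth checking for missing values after the join.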
## Rows: 51
## Columns: 6
## Groups: state [51]
## $ state <chr> "Alabama", "Alaska", "Arizona", "Arkansas", "Californi…
## $ candidate <chr> "Donald Trump", "Donald Trump", "Joe Biden", "Donald T…
## $ popular_vote <dbl> 1434159, 179080, 1671491, 758183, 10734181, 1803419, 1…
## $ total_votes <dbl> 2309900, 334789, 3384972, 1211793, 16822143, 3254844, …
## $ perc_pop_vote <dbl> 0.6208749, 0.5349041, 0.4937976, 0.6256704, 0.6380983,…
## $ electoral_votes <dbl> 9, 3, 11, 6, 55, 9, 7, 3, 3, 29, 16, 4, 4, 20, 11, 6, …
Plot it!
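One possible plot is a bar chart of electoral votes per candidate. This is just a sketch: it uses three rows copied from the output above, so the bars do not show the full result, and the choice of chart, labels, and theme is an assumption rather than the plot intended by the module.

```r
library(ggplot2)

# Three rows copied from the winner-by-state data with electoral votes
winners_with_ev <- data.frame(
  state = c("Alabama", "Alaska", "Arizona"),
  candidate = c("Donald Trump", "Donald Trump", "Joe Biden"),
  electoral_votes = c(9, 3, 11)
)

# geom_col() stacks the states, so each bar is a candidate's total
ev_plot <- ggplot(
  winners_with_ev,
  aes(x = candidate, y = electoral_votes, fill = candidate)
) +
  geom_col() +
  labs(x = NULL, y = "Electoral votes") +
  theme_minimal() +
  theme(legend.position = "none")

print(ev_plot)
```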
15.3 DATA CHALLENGE 09
Accept data challenge 09 assignment