Download data on movies and their ratings, and set up your analysis environment.
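A minimal sketch of the setup step, assuming the download is a CSV file with the columns `averageRating`, `type`, and `year` used in the models later on. The file name `movies.csv` is a placeholder, not the actual name of the download, and the fallback data frame holds made-up values so that the snippet runs without the file.

```r
# Hypothetical path: replace with wherever you saved the downloaded file.
path <- "movies.csv"

if (file.exists(path)) {
  # stringsAsFactors = TRUE makes `type` a factor, which lm() needs
  # in order to treat it as a categorical predictor.
  movies <- read.csv(path, stringsAsFactors = TRUE)
} else {
  # Fallback with made-up values, for illustration only.
  movies <- data.frame(
    type          = factor(c("movie", "short", "tvEpisode", "movie")),
    year          = c(1995, 2003, 2011, 2018),
    averageRating = c(6.1, 6.8, 7.4, 5.9)
  )
}

# Check column names and types before any analysis.
str(movies)
```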
Step 1: Histogram
What can we say about the distribution of the rating variable?
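One way to draw the histogram in base R is sketched below. The `ratings` vector holds made-up values so the snippet runs on its own; with the real data it would be `movies$averageRating`.

```r
# Made-up ratings, for illustration only.
ratings <- c(2.1, 4.5, 5.8, 6.2, 6.4, 6.9, 7.1, 7.3, 7.8, 8.0, 8.4, 9.1)

# One bin per rating point.
h <- hist(ratings,
          breaks = seq(1, 10, by = 1),
          main   = "Distribution of averageRating",
          xlab   = "averageRating")

# Numeric summary to go with the picture.
summary(ratings)

# A mean below the median hints at a longer left tail (left skew).
c(mean = mean(ratings), median = median(ratings))
```

Comparing the mean with the median is a quick check on skew that complements reading the shape off the plot.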
Boxplot
What questions can we answer?
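A sketch of a grouped boxplot in base R, which compares the rating distribution across levels of `type`. The `movies_toy` data frame is made up for illustration; with the real data, pass `data = movies` instead.

```r
# Made-up data, three titles per type, for illustration only.
movies_toy <- data.frame(
  type = factor(rep(c("movie", "short", "tvEpisode"), each = 3)),
  averageRating = c(5.5, 6.0, 6.8,
                    6.1, 6.9, 7.2,
                    7.0, 7.4, 8.1)
)

# One box per level of `type`.
b <- boxplot(averageRating ~ type, data = movies_toy,
             xlab = "type", ylab = "averageRating")

# Row 3 of b$stats holds the median of each box.
b$stats[3, ]

# The same medians computed directly.
tapply(movies_toy$averageRating, movies_toy$type, median)
```

The boxplot compares medians and spread between groups, which motivates the regression on `type` that follows.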
What is the effect of type on rating?
Call:
lm(formula = averageRating ~ type, data = movies)

Residuals:
    Min      1Q  Median      3Q     Max
-6.3704 -0.6462  0.1296  0.7687  3.8538

Coefficients:
                 Estimate Std. Error  t value Pr(>|t|)
(Intercept)      6.146239   0.002288 2685.794  < 2e-16 ***
typeshort        0.663780   0.003954  167.889  < 2e-16 ***
typetvEpisode    1.224139   0.002915  419.962  < 2e-16 ***
typetvMiniSeries 0.952108   0.012474   76.328  < 2e-16 ***
typetvMovie      0.417639   0.006182   67.558  < 2e-16 ***
typetvSeries     0.685069   0.005765  118.826  < 2e-16 ***
typetvShort      0.653040   0.026420   24.718  < 2e-16 ***
typetvSpecial    0.766563   0.013785   55.608  < 2e-16 ***
typevideo        0.361448   0.006195   58.348  < 2e-16 ***
typevideoGame    0.616535   0.105955    5.819 5.93e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.24 on 1081851 degrees of freedom
Multiple R-squared:  0.1455,	Adjusted R-squared:  0.1455
F-statistic: 2.047e+04 on 9 and 1081851 DF,  p-value: < 2.2e-16
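To read the coefficients: with a single categorical predictor, the intercept is the mean rating of the reference level (the one level of `type` that has no row in the output), and each `type...` coefficient is the difference between that level's mean and the reference mean. The sketch below only redoes this arithmetic with the estimates printed above; the label `reference` is a stand-in, since the output does not name the reference level.

```r
# Estimates copied from the summary output above.
intercept <- 6.146239
offsets <- c(short = 0.663780, tvEpisode = 1.224139,
             tvMiniSeries = 0.952108, tvMovie = 0.417639,
             tvSeries = 0.685069, tvShort = 0.653040,
             tvSpecial = 0.766563, video = 0.361448,
             videoGame = 0.616535)

# Fitted mean rating per type: the intercept plus that type's offset.
# "reference" stands in for the level that the output leaves unnamed.
group_means <- c(reference = intercept, intercept + offsets)

# Types ordered from highest to lowest fitted mean.
round(sort(group_means, decreasing = TRUE), 2)
```

For example, the fitted mean for `tvEpisode` is 6.146239 + 1.224139 = 7.370378, the highest of all types, while the reference level has the lowest. The R-squared of 0.1455 means `type` accounts for about 15% of the variance in ratings. These are differences in group averages, which does not by itself establish that type causes the difference.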
Scatter plot
What is the effect of year on rating?
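A sketch of the scatter plot with a fitted regression line in base R. The `movies_toy` values are made up for illustration; with the real data, use `data = movies`. Semi-transparent points are a suggestion here because, with more than a million rows, solid points would overplot heavily.

```r
# Made-up data, for illustration only.
movies_toy <- data.frame(
  year          = c(1950, 1960, 1970, 1980, 1990, 2000, 2010, 2020),
  averageRating = c(6.0, 6.4, 6.1, 6.6, 6.5, 6.9, 6.8, 7.2)
)

# Semi-transparent points reduce overplotting on large data sets.
plot(averageRating ~ year, data = movies_toy,
     pch = 16, col = rgb(0, 0, 0, 0.4),
     xlab = "year", ylab = "averageRating")

# Fit the simple linear model and overlay its line on the scatter plot.
fit <- lm(averageRating ~ year, data = movies_toy)
abline(fit, col = "red")

# Intercept and slope of the fitted line.
coef(fit)
```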
Call:
lm(formula = averageRating ~ year, data = movies)

Residuals:
    Min      1Q  Median      3Q     Max
-6.0645 -0.7201  0.1972  0.9077  3.6386

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.121e+01  1.157e-01  -96.84   <2e-16 ***
year         9.024e-03  5.781e-05  156.08   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.326 on 1081859 degrees of freedom
Multiple R-squared:  0.02202,	Adjusted R-squared:  0.02202
F-statistic: 2.436e+04 on 1 and 1081859 DF,  p-value: < 2.2e-16
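To read this output: the slope is the expected change in rating per additional year, and the intercept is the fitted rating at year 0, far outside the data, so it has no practical meaning on its own. The sketch below only redoes the arithmetic with the printed estimates; `predict_rating` is a helper name introduced here for illustration.

```r
# Estimates copied from the summary output above.
intercept <- -11.21      # -1.121e+01
slope     <- 0.009024    #  9.024e-03

# Hypothetical helper: fitted rating for a given year.
predict_rating <- function(year) intercept + slope * year

# Fitted ratings for 1950 and 2000: about 6.39 and 6.84.
predict_rating(c(1950, 2000))

# Expected change per decade: about 0.09 rating points.
slope * 10
```

The slope is statistically significant but small: roughly 0.09 points per decade. With an R-squared of 0.02202, year accounts for only about 2% of the variance in ratings.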