Case Study – Logistic Regression

Data

Download and extract these data on animals at an animal shelter in California.

Set up your coding environment.

library(tidyverse)
library(effects)

animal_shelter <- read_csv("data/animal_shelter.csv")

What factors affect an animal being adopted?

Response Variable

We recode the response variable as 1 (adopted) and 0 (not adopted):

animal_shelter <- animal_shelter |>
  mutate(response = if_else(adopted == "yes", 1, 0))
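A quick way to confirm that the recoding behaves as intended is to cross-tabulate the old and new columns. The sketch below uses a small made-up tibble (the `toy` data and its values are hypothetical, not taken from the shelter file) and assumes `adopted` holds the strings "yes" and "no".

```r
library(dplyr)

# Hypothetical toy data standing in for the shelter file
toy <- tibble(adopted = c("yes", "no", "yes", NA, "no"))

toy <- toy |>
  mutate(response = if_else(adopted == "yes", 1, 0))

# Each value of `adopted` should map to exactly one value of `response`;
# missing values stay missing rather than being coded as 0
check <- toy |> count(adopted, response)
check
```

Any value other than "yes" (for example a differently capitalized "Yes") would be coded as 0, so a table like this is worth a glance before modeling.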

Descriptive Statistics

animal_shelter |>
  group_by(animal_type) |>
  # the mean of a 0/1 variable is a proportion, so scale by 100 for a percentage
  summarize(percent_adopted = 100 * mean(response)) |>
  arrange(desc(percent_adopted))

Logistic Regression

model <- glm(response ~ animal_type,
             family = binomial,
             data = animal_shelter)
summary(model)

Call:
glm(formula = response ~ animal_type, family = binomial, data = animal_shelter)

Coefficients:
                  Estimate Std. Error z value Pr(>|z|)    
(Intercept)       -0.80673    0.04769 -16.917  < 2e-16 ***
animal_typecat     1.15353    0.05067  22.766  < 2e-16 ***
animal_typedog     1.77260    0.05286  33.531  < 2e-16 ***
animal_typeother   0.33478    0.06758   4.954 7.28e-07 ***
animal_typerabbit  1.95438    0.11282  17.322  < 2e-16 ***
animal_typewild   -0.80701    0.08627  -9.354  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 40092  on 29497  degrees of freedom
Residual deviance: 37232  on 29492  degrees of freedom
AIC: 37244

Number of Fisher Scoring iterations: 4
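The coefficients are on the log-odds scale: the intercept is the log-odds of adoption for the reference animal type (the level that does not appear in the table), and each other row is a difference from that reference. A minimal sketch of converting them to probabilities and odds ratios follows; the numbers are copied from the output above, and the `coefs` vector and its names are only illustrative.

```r
# Coefficients copied from the summary output above (log-odds scale)
coefs <- c(
  intercept = -0.80673,
  cat       =  1.15353,
  dog       =  1.77260,
  other     =  0.33478,
  rabbit    =  1.95438,
  wild      = -0.80701
)

# Log-odds for each listed type = intercept + that type's coefficient
log_odds <- c(reference = unname(coefs["intercept"]),
              coefs[-1] + coefs["intercept"])

# plogis() is the inverse logit: it maps log-odds to probabilities
prob <- plogis(log_odds)
round(prob, 3)

# exp() of a coefficient gives the odds ratio relative to the reference level
odds_ratio <- exp(coefs[-1])
round(odds_ratio, 2)
```

With these numbers the reference type comes out near 0.31 and dogs near 0.72, so the positive dog coefficient corresponds to a much higher estimated chance of adoption.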

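The summary does not report an \(R^2\), so we calculate a pseudo-\(R^2\) manually as one minus the ratio of the residual deviance to the null deviance: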

1 - model$deviance/model$null.deviance
[1] 0.07132207
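For a 0/1 response, this deviance ratio should coincide with McFadden's pseudo-\(R^2\), which is defined from the log-likelihoods of the fitted and intercept-only models. The sketch below checks that on simulated data; `sim`, `group`, and `y` are hypothetical stand-ins, not the shelter data.

```r
set.seed(1)

# Simulated stand-in data: a binary outcome whose probability depends on group
sim <- data.frame(group = sample(c("a", "b", "c"), 500, replace = TRUE))
sim$y <- rbinom(500, size = 1, prob = c(a = 0.2, b = 0.5, c = 0.8)[sim$group])

fit  <- glm(y ~ group, family = binomial, data = sim)
null <- glm(y ~ 1,     family = binomial, data = sim)

# Deviance-based version, as computed in the text
r2_deviance <- 1 - fit$deviance / fit$null.deviance

# McFadden's version, from the log-likelihoods of the fitted and null models
r2_mcfadden <- 1 - as.numeric(logLik(fit)) / as.numeric(logLik(null))

all.equal(r2_deviance, r2_mcfadden)
```

Values of this measure tend to be much lower than a linear-model \(R^2\), so a figure around 0.07 indicates that animal type alone explains only a modest share of the variation in adoption.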

We can also get the effects, that is, the predicted probability of adoption for each animal type:

effect("animal_type", model) |>
  data.frame() |>
  arrange(-fit)
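With a single categorical predictor, the fitted probabilities should reproduce the raw adoption proportions in each group, which links this table back to the descriptive statistics. A minimal sketch of that check, using base R `predict()` rather than `effect()` and simulated data (`sim`, `group`, and `y` are hypothetical), might look like this:

```r
set.seed(2)

# Simulated stand-in data (hypothetical; not the shelter data)
sim <- data.frame(group = sample(c("a", "b", "c"), 600, replace = TRUE))
sim$y <- rbinom(600, size = 1, prob = c(a = 0.3, b = 0.6, c = 0.1)[sim$group])

fit <- glm(y ~ group, family = binomial, data = sim)

# Predicted probability for each level, on the response (probability) scale
levels_df <- data.frame(group = c("a", "b", "c"))
levels_df$fit <- as.numeric(predict(fit, newdata = levels_df, type = "response"))

# Raw proportion of 1s within each group
levels_df$observed <- as.numeric(tapply(sim$y, sim$group, mean)[levels_df$group])

# Sort from highest to lowest predicted probability
levels_df[order(-levels_df$fit), ]
```

Once further predictors are added to the model, the two columns would generally no longer match, and the model-based estimates become the more useful summary.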