Case Studies

Case Study 1

Data

We will be working with the Road Accident Survival Dataset.

Data Cleaning

Make sure the data are in a format that can be modeled.
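A minimal sketch of what "a format that can be modeled" might look like in practice, assuming pandas is available. The toy rows and column names (age, gender, seatbelt_used, survived) are hypothetical stand-ins, not the actual columns of the dataset. Dropping rows with missing values is only one option; imputation is another.

```python
import pandas as pd

# Hypothetical toy data; the real dataset's columns may differ.
raw = pd.DataFrame({
    "age": [34, None, 51, 27, 45],
    "gender": ["Male", "Female", "Female", "Male", "Female"],
    "seatbelt_used": ["Yes", "No", "Yes", None, "No"],
    "survived": [1, 0, 1, 1, 0],
})

# Drop rows that contain missing values.
clean = raw.dropna().copy()

# Turn text categories into 0/1 indicator columns so every feature is numeric.
clean = pd.get_dummies(
    clean, columns=["gender", "seatbelt_used"], drop_first=True, dtype=int
)

print(clean.dtypes)
```

After this step every column is numeric and free of missing values, which is what most scikit-learn estimators expect.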

Modeling

  • What is the target?
  • What are the features?
  • Is the goal inference or machine learning (prediction)?
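The questions above can be sketched in code for the machine learning case: pick a target column, treat the remaining columns as features, hold out a test set, and scale the features. The data and column names here are made up for illustration, and the 75/25 split is an arbitrary choice.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical numeric data standing in for the cleaned dataset.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.integers(18, 80, size=200),
    "speed_of_impact": rng.normal(60, 15, size=200),
    "survived": rng.integers(0, 2, size=200),
})

# Target: the column we want to predict. Features: everything else.
y = df["survived"]
X = df.drop(columns="survived")

# Hold out a test set; stratify keeps the class balance similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit the scaler on the training data only, then apply it to both parts.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.shape, X_test_scaled.shape)
```

Fitting the scaler on the training data alone keeps information from the test set out of the model.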

Hyperparameters

  • algorithms have a number of hyperparameters: settings chosen before training rather than learned from the data
  • hyperparameters include solvers, regularization parameters, learning rate, etc.
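As an illustration, scikit-learn estimators expose their hyperparameters through get_params(). The sketch below assumes a LogisticRegression model, which these slides do not name explicitly, and prints a few of its settings.

```python
from sklearn.linear_model import LogisticRegression

# Assumed model for illustration; any scikit-learn estimator works the same way.
model = LogisticRegression()

# get_params() returns a dictionary of hyperparameter names and current values.
params = model.get_params()

for name in ("solver", "C", "max_iter"):
    print(name, "=", params[name])
```

Here solver selects the optimization algorithm and C is the inverse regularization strength, so smaller values mean stronger regularization.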

Hyperparameter Tuning

  • systematically search for the hyperparameter settings that give the best cross-validated performance
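One way to picture a systematic search is to list every combination a grid search would try. The sketch below uses scikit-learn's ParameterGrid with an illustrative grid; the specific values of C and the choice of solvers are assumptions made for the example.

```python
from sklearn.model_selection import ParameterGrid

# Illustrative grid: 3 values of C times 2 solvers.
param_grid = {"C": [0.1, 1.0, 10.0], "solver": ["lbfgs", "liblinear"]}

# ParameterGrid enumerates every combination of the listed options.
combinations = list(ParameterGrid(param_grid))
for combo in combinations:
    print(combo)

# With k-fold cross-validation, each combination is fit once per fold.
n_folds = 10
n_cv_fits = len(combinations) * n_folds
print(n_cv_fits, "cross-validation fits")
```

The number of fits grows multiplicatively with each added hyperparameter, which is worth keeping in mind when choosing the grid.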

GridSearchCV documentation

from sklearn.model_selection import GridSearchCV

Hyperparameter Tuning with GridSearchCV

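First we create a parameter grid listing the options to try. Not every solver supports every penalty, so we use one dictionary per group of compatible options (the solver names are those of scikit-learn's LogisticRegression).

param_grid = [
    # liblinear and saga support both the l1 and l2 penalties
    {"solver": ["liblinear", "saga"], "penalty": ["l1", "l2"]},
    # the remaining solvers support only the l2 penalty
    {"solver": ["lbfgs", "newton-cg", "newton-cholesky", "sag"], "penalty": ["l2"]},
]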

After creating the model, we create our grid search object and fit it to the training data.

grid_search = GridSearchCV(model, param_grid, cv=10)
grid_search.fit(X_train_scaled, y_train)

We can access the best model and evaluate it on the test set.

best_model = grid_search.best_estimator_
best_params = grid_search.best_params_

# the model was fit on scaled features, so the test features must be scaled the same way
accuracy = best_model.score(X_test_scaled, y_test)

print(best_params)
print(accuracy)
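The steps above can be put together in one runnable sketch. It makes several assumptions: synthetic data from make_classification stands in for the real dataset, the model is a LogisticRegression, and the grid tunes C and solver rather than the exact options shown earlier.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption; not the course dataset).
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y
)

# Scale using statistics from the training data only.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Assumed model and an illustrative grid of compatible options.
model = LogisticRegression(max_iter=1000)
param_grid = {"C": [0.01, 0.1, 1, 10], "solver": ["lbfgs", "liblinear"]}

# 10-fold cross-validated search over every combination in the grid.
grid_search = GridSearchCV(model, param_grid, cv=10)
grid_search.fit(X_train_scaled, y_train)

best_model = grid_search.best_estimator_
best_params = grid_search.best_params_
accuracy = best_model.score(X_test_scaled, y_test)

print(best_params)
print(grid_search.best_score_)  # mean cross-validated accuracy of the best setting
print(accuracy)                 # accuracy on the held-out test set
```

The cross-validated score is computed on the training data only, so the test accuracy gives a separate check on how well the chosen settings generalize.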

Case Study 2

Data

Fatality in car crashes

  • No need to clean the data; it is ready to go
  • Submit your predictions for the test.csv data to Gradescope as a file named my_predictions.csv, and make sure the file has a column named prediction
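A minimal sketch of producing the submission file, assuming pandas and a fitted scikit-learn classifier. The small data frames, the feature names, and the target column name fatal are hypothetical; in practice the training and test data would be read from the provided CSV files with pd.read_csv. Only the file name my_predictions.csv and the column name prediction come from the instructions above.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-ins for the provided training and test data.
train = pd.DataFrame({
    "speed": [30, 45, 60, 80, 95, 110],
    "age": [25, 40, 33, 52, 61, 29],
    "fatal": [0, 0, 0, 1, 1, 1],
})
test = pd.DataFrame({"speed": [50, 100, 70], "age": [30, 45, 38]})

features = ["speed", "age"]

# Assumed model; any fitted classifier with predict() would do.
model = LogisticRegression(max_iter=1000)
model.fit(train[features], train["fatal"])

# One row per test observation, in a column named "prediction".
submission = pd.DataFrame({"prediction": model.predict(test[features])})
submission.to_csv("my_predictions.csv", index=False)
```

Passing index=False keeps the row index out of the file, so prediction is the only column written.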