Recommender Systems

A recommender system is an information filtering system that predicts and suggests items a user might be interested in, based on:
- their preferences
- past behavior
- similarities to other users

Recommender Systems – history

Elaine Rich (1979) – Grundy

- Users answered specific questions about their book preferences
- The system classified users into preference classes, or “stereotypes”
- The system recommended books that users might like based on their stereotype membership

Netflix Prize (2006-2009)

Training data set: 100,480,507 ratings that 480,189 users gave to 17,770 movies
Each record: user (integer ID), movie (integer ID), date of grade, grade (integer from 1 to 5)

Recommender Systems

Collaborative filtering: makes recommendations based on the user’s interests and information collected from many other users (collaborating)
- User-based: “Users similar to you enjoyed these items” (similarity usually comes from past ratings or behavior; demographic similarity is a related variant)
- Item-based: “Users who liked this item also liked these others”
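
The two views above can be sketched on a tiny ratings table. This is a minimal illustration, not a production method: the users, items, and ratings are made up, the helper names (`cosine`, `most_similar_users`, `most_similar_items`) are hypothetical, and cosine similarity is only one of several similarity measures that could be used.

```python
from math import sqrt

# Hypothetical toy ratings: user -> {item: rating on a 1-5 scale}.
ratings = {
    "ann": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 4, "B": 5},
    "cat": {"A": 1, "C": 5, "D": 4},
    "dan": {"B": 4, "C": 2, "D": 1},
}

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[k] * v[k] for k in shared)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

def most_similar_users(target, ratings, k=2):
    """User-based view: rank the other users by how similarly they rate."""
    scores = [(cosine(ratings[target], vec), other)
              for other, vec in ratings.items() if other != target]
    return [name for _, name in sorted(scores, reverse=True)[:k]]

def item_vectors(ratings):
    """Transpose the table to item -> {user: rating}."""
    items = {}
    for user, user_ratings in ratings.items():
        for item, rating in user_ratings.items():
            items.setdefault(item, {})[user] = rating
    return items

def most_similar_items(target, ratings, k=2):
    """Item-based view: rank the other items by who rated them and how."""
    items = item_vectors(ratings)
    scores = [(cosine(items[target], vec), other)
              for other, vec in items.items() if other != target]
    return [name for _, name in sorted(scores, reverse=True)[:k]]

print(most_similar_users("bob", ratings))
print(most_similar_items("A", ratings))
```

In the user-based case, items rated highly by the nearest users but not yet seen by the target user become candidates; in the item-based case, the neighbors of an item the user liked become candidates.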

What type of data would we need for this?

What algorithms could we use?

Recommender Systems

Content-based filtering: uses implicit knowledge about user preferences and recommends items whose attributes are similar to those of items the user has previously liked
Knowledge-based systems: use explicit knowledge about user preferences and item properties
Hybrid approaches: combine multiple recommendation techniques
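
A minimal sketch of content-based filtering, assuming each item is described by a set of tags. The item names, tags, and helper functions (`build_profile`, `score`, `recommend`) are hypothetical and chosen only for illustration; real systems often use richer features such as text embeddings.

```python
from collections import Counter
from math import sqrt

# Hypothetical item attributes: item -> set of descriptive tags.
item_tags = {
    "book_a": {"sci-fi", "classic", "politics"},
    "book_b": {"sci-fi", "cyberpunk"},
    "book_c": {"romance", "classic"},
    "book_d": {"sci-fi", "classic"},
}

def build_profile(liked, item_tags):
    """Sum the tags of the liked items into a user profile (tag -> count)."""
    profile = Counter()
    for item in liked:
        profile.update(item_tags[item])
    return profile

def score(profile, tags):
    """Cosine similarity between the profile and an item's binary tag vector."""
    dot = sum(profile[t] for t in tags)  # Counter returns 0 for unseen tags
    norm_p = sqrt(sum(c * c for c in profile.values()))
    norm_t = sqrt(len(tags))
    return dot / (norm_p * norm_t) if norm_p and norm_t else 0.0

def recommend(liked, item_tags, k=2):
    """Rank items the user has not liked yet by similarity to the profile."""
    profile = build_profile(liked, item_tags)
    candidates = [(score(profile, tags), item)
                  for item, tags in item_tags.items() if item not in liked]
    return [item for _, item in sorted(candidates, reverse=True)[:k]]

print(recommend(["book_a"], item_tags))
```

Unlike collaborative filtering, this sketch needs no data from other users, only the item attributes and one user's history.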

Data

Explicit data collection, where users:

- Rate items on a sliding scale
- Rank a collection of items from favorite to least favorite
- Choose which of two items is better
- Create a list of items they like

Implicit data collection: history of items viewed, purchased, or clicked on by users (on social media, also the history of friends and connections)
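
As a rough sketch, both kinds of data can be stored as user-item tables. The rows, the event names, and the `EVENT_WEIGHT` values below are assumptions made for illustration; how much a view should count relative to a purchase is a modeling choice, not a standard.

```python
from collections import defaultdict

# Hypothetical explicit feedback: (user, item, rating on a 1-5 scale).
explicit = [("u1", "book_a", 5), ("u1", "book_b", 2), ("u2", "book_a", 4)]

# Hypothetical implicit feedback: (user, item, event) rows from an activity log.
implicit = [
    ("u1", "book_c", "view"),
    ("u1", "book_c", "purchase"),
    ("u2", "book_b", "click"),
    ("u2", "book_b", "view"),
    ("u2", "book_c", "view"),
]

# Assumed weights per event type (illustrative only).
EVENT_WEIGHT = {"view": 1.0, "click": 2.0, "purchase": 5.0}

def explicit_matrix(rows):
    """user -> {item: rating}; a later rating overwrites an earlier one."""
    matrix = defaultdict(dict)
    for user, item, rating in rows:
        matrix[user][item] = rating
    return dict(matrix)

def implicit_matrix(rows, weights=EVENT_WEIGHT):
    """user -> {item: summed event weight}, a rough strength-of-interest score."""
    matrix = defaultdict(lambda: defaultdict(float))
    for user, item, event in rows:
        matrix[user][item] += weights[event]
    return {user: dict(items) for user, items in matrix.items()}

print(explicit_matrix(explicit))
print(implicit_matrix(implicit))
```

Note the difference in meaning: an explicit rating of 2 signals dislike, while a low implicit score only signals little observed interaction.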

Possible Algorithms

Unsupervised learning: cluster users or items
Regression: build a model to predict ratings
Reinforcement learning: get feedback from the user on whether a recommendation is good and adjust the model accordingly (can help with cold starts)
Recurrent Neural Networks (RNNs): process sequential user history
Convolutional Neural Networks (CNNs): extract features from images
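
To make the "build a model to predict ratings" idea concrete, here is a small sketch using matrix factorization trained by stochastic gradient descent. This technique is not named in the list above; it is swapped in as one common way to predict ratings. The data, the function names (`train_mf`, `predict`, `rmse`), and the hyperparameter values are all assumptions for illustration.

```python
import random
from math import sqrt

# Hypothetical toy data: (user, item, rating on a 1-5 scale).
ratings = [
    ("u1", "a", 5), ("u1", "b", 4), ("u1", "c", 1),
    ("u2", "a", 4), ("u2", "b", 5),
    ("u3", "a", 1), ("u3", "c", 5), ("u3", "d", 4),
    ("u4", "b", 4), ("u4", "c", 2), ("u4", "d", 1),
]

def train_mf(ratings, n_factors=2, lr=0.05, reg=0.02, epochs=500, seed=0):
    """Fit rating ~ global_mean + p_user . q_item with SGD and L2 regularization."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    mean = sum(r for _, _, r in ratings) / len(ratings)
    # Small random starting values for the latent factors.
    P = {u: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for u in users}
    Q = {i: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for i in items}
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - (mean + sum(p * q for p, q in zip(P[u], Q[i])))
            for f in range(n_factors):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return mean, P, Q

def predict(model, user, item):
    """Predicted rating for a (user, item) pair seen during training."""
    mean, P, Q = model
    return mean + sum(p * q for p, q in zip(P[user], Q[item]))

def rmse(model, ratings):
    """Root mean squared error of the model on a list of ratings."""
    return sqrt(sum((r - predict(model, u, i)) ** 2 for u, i, r in ratings) / len(ratings))

untrained = train_mf(ratings, epochs=0)
trained = train_mf(ratings)
print(rmse(untrained, ratings), rmse(trained, ratings))
print(predict(trained, "u2", "c"))  # a pair that was never rated
```

The error here is measured on the training data only, so it shows that the model fits, not that it generalizes; a held-out set is needed for a fair evaluation.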

Steps

Data Collection: gather user data, item data, and/or interaction data
Preprocessing: wrangle the data for analysis
Model Training: train a machine learning model on the data
Evaluation: evaluate the performance of the model
Prediction: use the model to make recommendations
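
The five steps above can be sketched end to end with a deliberately simple model. Everything here is an assumption for illustration: the raw rows, the cleaning rule, and the choice of an item-mean baseline as the "model". A real project would replace step 3 with one of the algorithms listed earlier.

```python
import random
from collections import defaultdict
from math import sqrt

# 1. Data collection: hypothetical (user, item, rating) rows, some of them bad.
raw = [
    ("u1", "a", 5), ("u1", "b", 3), ("u2", "a", 4), ("u2", "c", 2),
    ("u3", "b", 4), ("u3", "c", 1), ("u3", "d", 5),
    ("u2", "d", None),  # missing rating
    ("u1", "c", 9),     # outside the 1-5 scale
]

# 2. Preprocessing: drop missing or out-of-range ratings, then split.
def preprocess(rows):
    return [(u, i, float(r)) for u, i, r in rows if r is not None and 1 <= r <= 5]

def split(rows, test_fraction=0.25, seed=0):
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]

# 3. Model training: item-mean baseline with a global-mean fallback.
def train(rows):
    totals = defaultdict(lambda: [0.0, 0])
    for _, item, r in rows:
        totals[item][0] += r
        totals[item][1] += 1
    global_mean = sum(r for _, _, r in rows) / len(rows)
    item_mean = {item: s / n for item, (s, n) in totals.items()}
    return global_mean, item_mean

def predict(model, item):
    global_mean, item_mean = model
    return item_mean.get(item, global_mean)

# 4. Evaluation: root mean squared error on held-out rows.
def rmse(model, rows):
    return sqrt(sum((r - predict(model, i)) ** 2 for _, i, r in rows) / len(rows))

# 5. Prediction: top-n items the user has not rated yet.
def recommend(model, rows, user, n=2):
    seen = {i for u, i, _ in rows if u == user}
    _, item_mean = model
    ranked = sorted(((m, i) for i, m in item_mean.items() if i not in seen), reverse=True)
    return [i for _, i in ranked[:n]]

clean = preprocess(raw)
train_rows, test_rows = split(clean)
model = train(train_rows)
print("held-out RMSE:", rmse(model, test_rows))
full_model = train(clean)
print(recommend(full_model, clean, "u1"))
```

Because the baseline ignores who the user is, every user gets the same ranking apart from the items they have already rated; that makes it a useful reference point when evaluating personalized models.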

Case Study

Download the Preprocessed_data.csv file from this collection of book ratings and set up your working environment.

What algorithms can we run on this data set?
What data wrangling do we need to do?

More resources

TensorFlow Recommenders