Recommender Systems

  • Information filtering system that predicts and suggests items that a user might be interested in based on:
    • their preferences
    • past behavior
    • similarities to other users

Recommender Systems – history

Elaine Rich (1979) – Grundy

  • Users answered specific questions about their book preferences
  • System classified users into classes of preferences, or “stereotypes”
  • System provided recommendations for books users might like based on their stereotype membership

Recommender Systems – history

Netflix Prize (2006-2009)

  • Training data set of 100,480,507 ratings that 480,189 users gave to 17,770 movies
  • Each record: user (integer ID), movie (integer ID), date of rating, rating (integer from 1 to 5)

Recommender Systems

  • Collaborative filtering: makes recommendations based on the user’s interests and information collected from many other users (collaborating)
    • User-based: “Users similar to you enjoyed these items” (similarity is usually computed from rating or interaction history)
    • Item-based: “Users who liked this item also liked these others”
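A minimal sketch of the item-based idea, using only the Python standard library. The ratings, user names, and function names are made up for illustration; real systems typically use sparse matrices and a tuned similarity measure. Each unseen item is scored by a similarity-weighted average of the ratings the user gave to other items.

```python
from math import sqrt

# Toy explicit ratings (hypothetical data): user -> {item: rating on a 1-5 scale}
ratings = {
    "ana":   {"A": 5, "B": 4, "C": 1},
    "ben":   {"A": 4, "B": 5},
    "chloe": {"A": 1, "C": 5, "D": 4},
    "dev":   {"B": 4, "C": 2, "D": 1},
}

def item_vectors(ratings):
    """Invert user -> item ratings into item -> {user: rating}."""
    items = {}
    for user, user_ratings in ratings.items():
        for item, r in user_ratings.items():
            items.setdefault(item, {})[user] = r
    return items

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[k] * v[k] for k in shared)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

def recommend(user, ratings, top_n=2):
    """Rank items the user has not rated by a similarity-weighted average
    of the ratings the user gave to the items they have rated."""
    items = item_vectors(ratings)
    seen = ratings[user]
    scores = {}
    for candidate in items:
        if candidate in seen:
            continue
        num = den = 0.0
        for item, r in seen.items():
            sim = cosine(items[candidate], items[item])
            num += sim * r
            den += sim
        if den > 0:
            scores[candidate] = num / den
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend("ben", ratings))
```

The user-based variant follows the same pattern with the roles swapped: compare users by their rating vectors, then score items by what the most similar users rated highly.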

What type of data would we need for this?

What algorithms could we use?

Recommender Systems

  • Content-based filtering: Uses implicit knowledge about user preferences (inferred from past behavior) to recommend items with attributes similar to those the user has previously liked
  • Knowledge-based systems: Use explicit knowledge about user preferences and item properties
  • Hybrid approaches: Combine multiple recommendation techniques
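Content-based filtering can be sketched as follows, assuming each item is described by a set of tags. The titles, tags, and function names are illustrative assumptions: a user profile is built by counting the tags of liked items, and candidates are ranked by cosine similarity to that profile.

```python
from math import sqrt

# Hypothetical item attributes: item -> set of descriptive tags
item_tags = {
    "Dune": {"sci-fi", "classic", "politics"},
    "Neuromancer": {"sci-fi", "cyberpunk"},
    "Foundation": {"sci-fi", "classic"},
    "Emma": {"romance", "classic"},
    "Pride and Prejudice": {"romance", "classic"},
}

def build_profile(liked, item_tags):
    """Count how often each tag appears among the items the user liked."""
    profile = {}
    for item in liked:
        for tag in item_tags[item]:
            profile[tag] = profile.get(tag, 0) + 1
    return profile

def score(profile, tags):
    """Cosine similarity between the tag-count profile and a binary tag set."""
    dot = sum(profile.get(t, 0) for t in tags)
    norm_p = sqrt(sum(c * c for c in profile.values()))
    norm_t = sqrt(len(tags))
    return dot / (norm_p * norm_t) if norm_p and norm_t else 0.0

def recommend(liked, item_tags, top_n=2):
    """Rank items the user has not liked yet by similarity to the profile."""
    profile = build_profile(liked, item_tags)
    candidates = [i for i in item_tags if i not in liked]
    return sorted(candidates,
                  key=lambda i: score(profile, item_tags[i]),
                  reverse=True)[:top_n]

print(recommend(["Dune", "Neuromancer"], item_tags, top_n=1))
```

Note that no other users appear anywhere in this sketch: only item attributes and one user's history are needed, which is the main contrast with collaborative filtering.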

Data

Explicit data collection, where users:

  • Rate items on a sliding scale
  • Rank a collection of items from favorite to least favorite
  • Choose which of two items is better
  • Create a list of items they like
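Explicit ratings are often stored as (user, item, rating) triples and then pivoted into a user-item matrix. A small sketch with invented data and a hypothetical helper name; `None` marks items a user has not rated, which in practice is most of the matrix.

```python
# Hypothetical explicit ratings as (user, item, rating) triples
triples = [("u1", "A", 5), ("u1", "B", 3), ("u2", "A", 4), ("u2", "C", 2)]

def to_matrix(triples):
    """Pivot rating triples into a dense user x item matrix (None = not rated)."""
    users = sorted({u for u, _, _ in triples})
    items = sorted({i for _, i, _ in triples})
    lookup = {(u, i): r for u, i, r in triples}
    matrix = [[lookup.get((u, i)) for i in items] for u in users]
    return users, items, matrix

users, items, matrix = to_matrix(triples)
for user, row in zip(users, matrix):
    print(user, row)
```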

Data

Implicit data collection: history of items viewed, purchased, or clicked on by users (on social media, also the history of their friends/connections)
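Implicit signals have no rating attached, so a common first step is to aggregate raw events into a per-user interest score. The sketch below assumes a hypothetical event log and hand-picked event weights; the weights are an illustrative assumption, not recommended values.

```python
from collections import Counter, defaultdict

# Hypothetical event log: (user, item, event type)
events = [
    ("u1", "book1", "view"),
    ("u1", "book1", "purchase"),
    ("u1", "book2", "view"),
    ("u2", "book2", "click"),
    ("u2", "book2", "view"),
    ("u2", "book3", "purchase"),
]

# Assumed weights: how strongly each event type signals interest
EVENT_WEIGHTS = {"view": 1.0, "click": 2.0, "purchase": 5.0}

def interaction_scores(events, weights=EVENT_WEIGHTS):
    """Sum event weights per (user, item); unknown event types count as 0."""
    scores = defaultdict(Counter)
    for user, item, kind in events:
        scores[user][item] += weights.get(kind, 0.0)
    return {user: dict(items) for user, items in scores.items()}

print(interaction_scores(events))
```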

Possible Algorithms

  • Unsupervised learning: cluster users or items
  • Regression: build a model to predict ratings
  • Reinforcement learning: collect user feedback on whether each recommendation was good and adjust the model accordingly (useful for cold starts)
  • Recurrent Neural Networks (RNNs): process sequential user history
  • Convolutional Neural Networks (CNNs): extract features from images
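As one example from the list above, the unsupervised option can be sketched with a plain k-means over user rating vectors. The data is invented and dense (every user rated every item), which real rating data rarely is; centroids start at the first k points purely to keep the example deterministic, whereas library implementations usually use random or k-means++ initialization.

```python
# Hypothetical dense rating vectors: user -> ratings for three items
user_vectors = {
    "u1": [5, 4, 1],
    "u2": [1, 2, 5],
    "u3": [4, 5, 1],
    "u4": [2, 1, 5],
}

def kmeans(points, k, iterations=10):
    """Plain k-means; centroids start at the first k points for determinism."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        for idx, p in enumerate(points):
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            assignment[idx] = dists.index(min(dists))
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

assignment, centroids = kmeans(list(user_vectors.values()), k=2)
for user, cluster in zip(user_vectors, assignment):
    print(user, "-> cluster", cluster)
```

Once users are grouped, a simple recommendation rule is to suggest the items rated highest by the other members of a user's cluster.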

Steps

  • Data Collection: gather user data, item data, and/or interaction data
  • Preprocessing: wrangle the data for analysis
  • Model Training: train a machine learning model on the data
  • Evaluation: evaluate the performance of the model
  • Prediction: use the model to make recommendations
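The five steps can be walked through end to end on toy data. This is a sketch under stated assumptions: the interactions are invented, the "model" is just the mean rating per item with a global-mean fallback, and the hold-out split takes the last rows rather than a random sample.

```python
from math import sqrt

# 1. Data collection: hypothetical (user, item, rating) interactions
raw = [
    ("u1", "A", 5), ("u2", "A", 4), ("u3", "A", 5),
    ("u1", "B", 2), ("u2", "B", 1), ("u3", "B", None),  # missing rating
    ("u4", "A", 4), ("u4", "B", 2),
]

# 2. Preprocessing: drop rows with missing ratings, hold out the last two rows
clean = [row for row in raw if row[2] is not None]
train, test = clean[:-2], clean[-2:]

# 3. Model training: mean rating per item, plus a global mean as fallback
def fit(rows):
    totals = {}
    for _, item, r in rows:
        s, n = totals.get(item, (0, 0))
        totals[item] = (s + r, n + 1)
    item_means = {item: s / n for item, (s, n) in totals.items()}
    global_mean = sum(r for _, _, r in rows) / len(rows)
    return item_means, global_mean

def predict(model, item):
    item_means, global_mean = model
    return item_means.get(item, global_mean)

# 4. Evaluation: root mean squared error on the held-out rows
def rmse(model, rows):
    return sqrt(sum((predict(model, i) - r) ** 2 for _, i, r in rows) / len(rows))

model = fit(train)
error = rmse(model, test)

# 5. Prediction: recommend the item with the highest predicted rating
best_item = max(model[0], key=lambda i: predict(model, i))
print(f"RMSE = {error:.3f}, recommend {best_item}")
```

A baseline like this ignores who the user is, so it is mainly useful as a reference point: a personalized model should beat its RMSE on the same held-out data.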

Case Study

Download the Preprocessed_data.csv file from this collection of book ratings and set up your working environment.

  • What algorithms can we run on this data set?
  • What data wrangling do we need to do?
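A possible first look at the data, sketched with the standard `csv` module. The column names `user_id`, `isbn`, and `rating` are assumptions, not taken from the file; check the header of Preprocessed_data.csv and adjust the parameters. The function counts users, items, and usable ratings, and reports how full the user-item matrix is.

```python
import csv
import io

def summarize(csv_file, user_col="user_id", item_col="isbn", rating_col="rating"):
    """Count users, items, and parseable ratings in an open CSV file.

    Column names are hypothetical defaults; pass the real ones if they differ.
    """
    users, items = set(), set()
    n_ratings = skipped = 0
    for row in csv.DictReader(csv_file):
        try:
            float(row[rating_col])
        except (TypeError, ValueError):
            skipped += 1  # empty or non-numeric rating
            continue
        users.add(row[user_col])
        items.add(row[item_col])
        n_ratings += 1
    cells = len(users) * len(items)
    return {
        "users": len(users),
        "items": len(items),
        "ratings": n_ratings,
        "skipped": skipped,
        "density": n_ratings / cells if cells else 0.0,
    }

# Tiny in-memory stand-in for the real file (invented rows)
sample = io.StringIO("user_id,isbn,rating\n1,111,5\n1,222,0\n2,111,8\n3,333,\n")
print(summarize(sample))
```

For the real file, open it with `open("Preprocessed_data.csv", newline="", encoding="utf-8")` and pass the file object to `summarize`. A very low density would point toward methods that cope well with sparse data, and the skipped count hints at how much cleaning is needed.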

More resources

TensorFlow Recommenders