Advanced Methods in Data Science

Machine Learning

What is it? (discuss)

What is Machine Learning?

  • subfield of computer science + statistics
  • building algorithms/models based on data (i.e., a collection of examples of some phenomenon)

Types of Machine Learning

  • supervised (labeled data)
  • semi-supervised (both labeled and unlabeled data)
  • unsupervised (unlabeled data)
  • reinforcement (reward system, with the goal of learning a policy)

All of these require data. One important step is feature engineering (how to transform raw data into features; more on this later)

Depending on the target type, a supervised problem is either classification (categorical target) or regression (numeric target)

Supervised Learning

  • Labeled data – input (features), output (target, response)

Can you think of examples of supervised learning?

Supervised Learning

  • Example: spam vs. not spam

What features can we use?
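
As one possible starting point for the discussion, the sketch below turns a raw message into a few numeric features. The word list in SPAM_WORDS, the function name extract_features, and the particular features are illustrative assumptions, not a recommended feature set.

```python
import re

# Hypothetical trigger words, chosen only for illustration.
SPAM_WORDS = {"free", "winner", "prize", "urgent"}


def extract_features(message: str) -> dict:
    """Turn a raw message into a small dictionary of numeric features."""
    lowered = message.lower()
    words = re.findall(r"[a-z']+", lowered)
    letters = [c for c in message if c.isalpha()]
    upper = sum(1 for c in letters if c.isupper())
    return {
        "num_words": len(words),
        "num_spam_words": sum(1 for w in words if w in SPAM_WORDS),
        "num_exclamations": message.count("!"),
        # Fraction of letters that are uppercase (0.0 for a message with no letters).
        "frac_uppercase": upper / len(letters) if letters else 0.0,
        "has_link": int("http://" in lowered or "https://" in lowered),
    }


features = extract_features(
    "URGENT! You are a WINNER, claim your free prize at http://example.com"
)
print(features)
```

Each message becomes one row of numbers, which is the input format most supervised learning algorithms expect.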

Supervised Learning Algorithms

  • Support Vector Machines (SVM)
  • Decision Tree/Random Forest
  • Logistic Regression
  • Linear Regression
  • Naive Bayes
  • K-Nearest Neighbors
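
To make one of the algorithms above concrete, here is a minimal sketch of K-Nearest Neighbors classification using only the Python standard library. The toy data, the two features, and the function name knn_predict are made up for illustration; in practice a library implementation would normally be used.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training example.
    distances = [
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    ]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy training data (made up): [num_spam_words, num_exclamations]
train_X = [[0, 0], [1, 0], [0, 1], [4, 3], [5, 2], [3, 4]]
train_y = ["not spam", "not spam", "not spam", "spam", "spam", "spam"]

prediction_a = knn_predict(train_X, train_y, [4, 4])
prediction_b = knn_predict(train_X, train_y, [0, 0])
print(prediction_a, "|", prediction_b)
```

The labels in train_y are what make this supervised: the prediction is a vote over known answers.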

How to decide?

Unsupervised Learning

  • Clustering (K-Means)
  • Principal Component Analysis (Dimensionality Reduction)
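
A minimal sketch of K-Means clustering (Lloyd's algorithm) follows, assuming the caller supplies the starting centroids so that the result is reproducible. The points and the function name kmeans are illustrative; note that no labels are given to the algorithm.

```python
import math


def kmeans(points, centroids, n_iter=10):
    """Alternate between assigning points to centroids and recomputing centroids."""
    centroids = [list(c) for c in centroids]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, centroids


# Toy data (made up): two visually separate groups of 2-D points.
points = [[1.0, 1.0], [1.5, 2.0], [1.0, 1.5], [8.0, 8.0], [9.0, 8.5], [8.5, 9.0]]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels, centroids)
```

The cluster indices carry no meaning by themselves; interpreting what each cluster represents is left to the analyst.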

Reinforcement Learning

  • Recommendation Systems (users give feedback on whether a recommendation was good or not)
  • Automated Robots
  • Autonomous Driving
  • Natural Language Processing – text prediction, translation
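
The recommendation example can be sketched as an epsilon-greedy bandit, one of the simplest reward-driven settings (there is no state, so the "policy" is just which item to recommend). The like probabilities, the simulated users, and the function name run_epsilon_greedy are assumptions made up for illustration.

```python
import random


def run_epsilon_greedy(true_like_prob, n_rounds=5000, epsilon=0.1, seed=0):
    """Recommend items, observe like/dislike feedback, and learn average reward per item."""
    rng = random.Random(seed)
    n_items = len(true_like_prob)
    counts = [0] * n_items    # how often each item was recommended
    values = [0.0] * n_items  # running average reward (estimated like rate)
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            item = rng.randrange(n_items)  # explore: try a random item
        else:
            item = max(range(n_items), key=lambda i: values[i])  # exploit best estimate
        # Simulated user feedback: 1 = good recommendation, 0 = not good.
        reward = 1 if rng.random() < true_like_prob[item] else 0
        counts[item] += 1
        # Incremental update of the running average.
        values[item] += (reward - values[item]) / counts[item]
    return counts, values


# Hypothetical like probabilities for three items; unknown to the learner.
counts, values = run_epsilon_greedy([0.2, 0.5, 0.8])
print(counts, values)
```

The learner never sees a labeled "correct" item; it only sees rewards for the recommendations it actually made, which is the key difference from supervised learning.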