Advanced Methods in Data Science

Machine Learning

What is it? (discuss)

What is Machine Learning?

subfield of computer science + statistics
building algorithms/models from data (i.e., a collection of examples of some phenomenon)

Types of Machine Learning

supervised (labeled data)
semi-supervised (both labeled and unlabeled data)
unsupervised (unlabeled data)
reinforcement (reward system, with the goal of learning a policy)

All of these require data – one important step is feature engineering (how to transform raw data into features – more on this later)

Supervised tasks are either classification (categorical target) or regression (numeric target), depending on the target type

Supervised Learning

Labeled data – input (features), output (target, response)
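To make the input/output split concrete, here is a minimal sketch with made-up numbers. The variable names (`X`, `y_class`, `y_reg`) and the helper `split_features_target` are illustrative assumptions, not part of the slides.

```python
# Hypothetical toy data: each row of X holds one example's features,
# and the target lists hold the matching label for each example.
X = [
    [5.1, 3.5],  # features of example 0
    [6.2, 2.9],  # features of example 1
    [4.9, 3.0],  # features of example 2
]
y_class = ["a", "b", "a"]  # categorical target -> classification
y_reg = [1.4, 4.3, 1.4]    # numeric target -> regression


def split_features_target(rows):
    """Split rows of the form [f1, ..., fn, target] into (X, y)."""
    features = [row[:-1] for row in rows]
    target = [row[-1] for row in rows]
    return features, target


# Usage: a labeled table where the last column is the target.
table = [[1, 2, "spam"], [3, 4, "not spam"]]
features, target = split_features_target(table)
```

The same features can feed either task; only the type of the target changes.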

Can you think of examples of supervised learning?

Supervised Learning

Example: spam vs. not spam

What features can we use?
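As one possible starting point for the discussion, the sketch below turns a raw message into a few numeric features. The feature names, the word list `SPAM_WORDS`, and the function `extract_features` are hypothetical choices for illustration; real spam filters typically use far richer features.

```python
# Hypothetical list of words that might hint at spam.
SPAM_WORDS = {"free", "winner", "prize", "urgent"}


def extract_features(message):
    """Map a raw text message to a small dictionary of numeric features."""
    lowered = message.lower()
    words = lowered.split()
    letters = [c for c in message if c.isalpha()]
    upper = sum(1 for c in letters if c.isupper())
    return {
        # Count of exclamation marks in the message.
        "num_exclamations": message.count("!"),
        # Fraction of letters that are uppercase (0.0 if there are no letters).
        "upper_ratio": upper / len(letters) if letters else 0.0,
        # 1 if the message appears to contain a web link, else 0.
        "has_link": int("http://" in lowered or "https://" in lowered),
        # Number of words (punctuation stripped) found in SPAM_WORDS.
        "num_spam_words": sum(1 for w in words if w.strip(".,!?:;") in SPAM_WORDS),
    }


example = extract_features("FREE prize!!! Click http://example.com now")
```

Each message becomes one row of features; paired with a spam / not spam label, it is one labeled example.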

Supervised Learning Algorithms

Support Vector Machines (SVM)
Decision Tree/Random Forest
Logistic Regression
Linear Regression
Naive Bayes
K-Nearest Neighbors

How to decide?
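One common answer (among several) is to compare candidate models on data they were not trained on. The sketch below assumes a tiny made-up dataset and compares a majority-class baseline with a 1-nearest-neighbor classifier by hold-out accuracy; all names and numbers are illustrative assumptions.

```python
from collections import Counter


def majority_baseline(train_X, train_y):
    """Return a predictor that always outputs the most frequent training label."""
    most_common = Counter(train_y).most_common(1)[0][0]
    return lambda x: most_common


def one_nearest_neighbor(train_X, train_y):
    """Return a predictor that copies the label of the closest training point."""
    def predict(x):
        dists = [sum((a - b) ** 2 for a, b in zip(x, row)) for row in train_X]
        return train_y[dists.index(min(dists))]
    return predict


def accuracy(predict, X, y):
    """Fraction of examples for which the prediction matches the true label."""
    return sum(predict(x) == t for x, t in zip(X, y)) / len(y)


# Hypothetical toy data, already split into training and held-out test sets.
train_X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [1.0, 1.1], [0.9, 1.0]]
train_y = ["not spam", "not spam", "not spam", "spam", "spam"]
test_X = [[0.1, 0.1], [1.1, 0.9], [0.95, 1.05], [0.0, 0.2]]
test_y = ["not spam", "spam", "spam", "not spam"]

candidates = {"baseline": majority_baseline, "1-NN": one_nearest_neighbor}
scores = {
    name: accuracy(fit(train_X, train_y), test_X, test_y)
    for name, fit in candidates.items()
}
best = max(scores, key=scores.get)  # name of the candidate with highest accuracy
```

Accuracy is only one criterion; interpretability, training cost, and dataset size can also drive the choice.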

Unsupervised Learning

Clustering (K-Means)
Dimensionality Reduction (Principal Component Analysis)
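To illustrate clustering without labels, here is a minimal sketch of the standard K-Means loop (assign each point to its nearest centroid, then move each centroid to the mean of its points). The function names, the fixed iteration count, and the requirement that the caller pass initial centroids are simplifying assumptions; library implementations handle initialization and convergence more carefully.

```python
def nearest_centroid(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])),
    )


def kmeans(points, initial_centroids, n_iters=10):
    """Run a fixed number of K-Means iterations; return (centroids, labels)."""
    centroids = [list(c) for c in initial_centroids]
    for _ in range(n_iters):
        # Assignment step: attach every point to its nearest centroid.
        labels = [nearest_centroid(pt, centroids) for pt in points]
        # Update step: move each centroid to the mean of its assigned points.
        for k in range(len(centroids)):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:  # leave an empty cluster's centroid where it is
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    # Final assignment so the labels match the returned centroids.
    labels = [nearest_centroid(pt, centroids) for pt in points]
    return centroids, labels


# Hypothetical unlabeled data with two visible groups.
points = [[1.0, 1.0], [1.2, 0.8], [0.8, 1.2], [5.0, 5.0], [5.2, 4.8], [4.8, 5.2]]
centroids, labels = kmeans(points, initial_centroids=[[0.0, 0.0], [6.0, 6.0]])
```

No target column is used anywhere: the groups come only from distances between the feature vectors.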

Reinforcement Learning

Recommendation Systems (users give feedback on whether a recommendation was good or not)
Automated Robots
Autonomous Driving
Natural Language Processing – text prediction, translation
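The recommendation example can be sketched as a multi-armed bandit, a simplified reinforcement-learning setting in which the "policy" is just which item to recommend. The code below uses an epsilon-greedy strategy, which is one simple choice and not necessarily what production systems use; the like-probabilities that simulate user feedback are made up.

```python
import random


def epsilon_greedy(like_probs, n_rounds=1000, epsilon=0.1, seed=0):
    """Simulate recommending items and learning from like/dislike feedback.

    like_probs: hypothetical probability that a user likes each item.
    Returns (counts, values): times each item was shown, and the
    estimated reward (mean observed feedback) per item.
    """
    rng = random.Random(seed)
    n_items = len(like_probs)
    counts = [0] * n_items
    values = [0.0] * n_items  # running mean reward per item
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            item = rng.randrange(n_items)  # explore: pick a random item
        else:
            item = max(range(n_items), key=lambda i: values[i])  # exploit
        # Simulated user feedback: reward 1 for "good", 0 for "not good".
        reward = 1 if rng.random() < like_probs[item] else 0
        counts[item] += 1
        # Incremental update of the running mean for the chosen item.
        values[item] += (reward - values[item]) / counts[item]
    return counts, values


counts, values = epsilon_greedy([0.2, 0.5, 0.8], n_rounds=500)
```

The `epsilon` parameter trades off trying new items (exploration) against repeating the item that currently looks best (exploitation).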