Day 3 - Supervised learning - Distance-based methods

Posted on May 31, 2017 by Govind Gopakumar

Lecture slides in PDF form

Announcements

Recap

Mathematics

MLE modelling

How to assume a model, work out the loss/reward function, optimize it, and arrive at a final model.
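As a quick worked instance of that recipe (the Gaussian mean; a standard textbook example, not necessarily the one from lecture): assume $x_1, \dots, x_N$ are drawn i.i.d. from $\mathcal{N}(\mu, \sigma^2)$ with $\sigma$ known. The loss is the negative log-likelihood,

$$
\mathcal{L}(\mu) = -\sum_{i=1}^{N} \log p(x_i \mid \mu) = \frac{1}{2\sigma^2} \sum_{i=1}^{N} (x_i - \mu)^2 + \text{const},
$$

and setting $d\mathcal{L}/d\mu = 0$ gives $\hat{\mu}_{\text{MLE}} = \frac{1}{N}\sum_{i=1}^{N} x_i$, the sample mean.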

Proximity-Based Methods

What is the end goal?

Supervised learning

What’s the easiest way to assign a label/value?

Naive method for classification?

Formal “names”

Our first classifier

Distance-based classifier

Given Input

What is the objective now?

Distance from means - I

Overview of model

Coming up with our “decision function”
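A minimal sketch of the model in code (function and variable names are mine, not from the slides): fit one mean per class, then label a new point by whichever mean is closer.

```python
import numpy as np

def fit_class_means(X, y):
    """One mean vector per class, from training data X (N x d) and labels y."""
    return {label: X[y == label].mean(axis=0) for label in np.unique(y)}

def predict(x, means):
    """Assign x to the class whose mean is nearest in Euclidean distance."""
    return min(means, key=lambda label: np.linalg.norm(x - means[label]))

# Toy usage: two well-separated 2-D classes.
X = np.array([[0.0, 0.0], [0.5, 0.2], [5.0, 5.0], [5.2, 4.8]])
y = np.array([0, 0, 1, 1])
print(predict(np.array([0.3, 0.1]), fit_class_means(X, y)))  # -> 0
```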

Distance from means - II

As similarity to training data

What does this mean?
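To make the similarity reading concrete (binary case, classes $+$ and $-$), expand the squared distances:

$$
f(x) = \|x - \mu_-\|^2 - \|x - \mu_+\|^2 = 2\langle \mu_+ - \mu_-, x\rangle + \|\mu_-\|^2 - \|\mu_+\|^2,
$$

and predict $+$ when $f(x) > 0$. Since $\mu_+ = \frac{1}{N_+}\sum_{i:\, y_i = +1} x_i$, the term $\langle \mu_+, x\rangle$ is just the average inner-product similarity between $x$ and the positive training points (likewise for $-$): the classifier votes for the class whose examples are, on average, most similar to $x$. Note also that $f$ is linear in $x$, so the boundary is the hyperplane bisecting the segment between the two means.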

Distance from means - III

Geometry of the decision function

Drawbacks and strengths?

Distance from means - IV

Extending this

K-nearest neighbors

KNN - I

Overview of model

Geometry of the decision function
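A bare-bones sketch of the prediction rule (helper names are mine): measure the distance from the query point to every training point and take a majority vote among the $k$ closest.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Label x by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)  # distance to every training point
    nearest = np.argsort(dists)[:k]              # indices of the k closest
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy usage: the query's 3 nearest neighbours are all class 0.
X = np.array([[0.0, 0.0], [0.5, 0.2], [0.2, 0.4], [5.0, 5.0], [5.2, 4.8]])
y = np.array([0, 0, 0, 1, 1])
print(knn_predict(X, y, np.array([0.3, 0.1])))  # -> 0
```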

KNN - II

Drawbacks and strengths?

Things to consider for this model

KNN - III

What is the optimal K?

Extensions to KNN
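One standard answer: treat $K$ as a hyperparameter and pick it by cross-validation. A sketch using scikit-learn (the iris data here is only a stand-in):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Score each candidate K by 5-fold cross-validated accuracy.
scores = {
    k: cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    for k in range(1, 16, 2)  # odd values of K avoid ties in binary voting
}
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```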

Partition based methods

Why do we require better methods?

Geometry of the problem

Model implementation

Solution: partitioning?

Asking questions of the data

Let’s classify oranges!

Natural human thought?

Decision Trees - I

Model overview

Geometry of the problem
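One way to picture the model in code (the structure below is illustrative, not from the slides): each internal node asks a question about one feature, each branch is an answer, and each leaf stores a prediction, so the tree carves feature space into axis-aligned regions.

```python
from dataclasses import dataclass
from typing import Dict, Union

@dataclass
class Leaf:
    label: str                                 # prediction stored in this region

@dataclass
class Node:
    feature: str                               # the question asked at this node
    children: Dict[str, Union["Node", Leaf]]   # one subtree per possible answer

def predict(tree, x):
    """Follow the answers down the tree until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.children[x[tree.feature]]
    return tree.label

# A tiny hand-built tree for the orange example.
tree = Node("colour", {
    "orange": Node("size", {"small": Leaf("orange"), "large": Leaf("grapefruit")}),
    "other": Leaf("not an orange"),
})
print(predict(tree, {"colour": "orange", "size": "small"}))  # -> orange
```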

Decision Trees - II

How do we ask the right questions?

How useful is a feature for us?

Decision Trees - III

Entropy to measure utility

How does this help us?
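The standard definitions: for a set of examples $S$ and a feature $A$,

$$
H(S) = -\sum_{c} p_c \log_2 p_c, \qquad IG(S, A) = H(S) - \sum_{v \in \text{values}(A)} \frac{|S_v|}{|S|} H(S_v),
$$

where $p_c$ is the fraction of $S$ with label $c$ and $S_v$ is the subset of $S$ where $A = v$. A pure node has $H = 0$, a 50/50 binary node has $H = 1$ bit, and the feature with the largest $IG$ removes the most uncertainty about the label.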

Decision Trees - IV

Playing Tennis

Decision Trees - V

Computing IG for features

Values
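Assuming the classic 14-example play-tennis table (9 Yes / 5 No, as in Mitchell's textbook), the computation for the Wind feature runs:

$$
H(S) = -\tfrac{9}{14}\log_2\tfrac{9}{14} - \tfrac{5}{14}\log_2\tfrac{5}{14} \approx 0.940,
$$

$$
H(S_{\text{weak}}) \approx 0.811 \;(6\text{ Yes}/2\text{ No}), \qquad H(S_{\text{strong}}) = 1.0 \;(3\text{ Yes}/3\text{ No}),
$$

$$
IG(S, \text{Wind}) = 0.940 - \tfrac{8}{14}(0.811) - \tfrac{6}{14}(1.0) \approx 0.048.
$$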

Decision Trees - VI

IG for all features

Choosing features?
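On that same table the gains work out to roughly 0.246 for Outlook, 0.151 for Humidity, 0.048 for Wind, and 0.029 for Temperature, so Outlook gives the purest split and becomes the root; the procedure then recurses on each branch with the remaining features.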

Decision Trees - VII

For real-valued features?

Extending this
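For a real-valued feature, the usual trick is to sort its values and consider thresholds at midpoints between consecutive distinct values, scoring each candidate split "$x \le t$" by its information gain. A sketch (helper names are mine):

```python
import numpy as np

def entropy(y):
    """Shannon entropy (bits) of a label array."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def best_threshold(x, y):
    """Best 'x <= t' split of a real-valued feature by information gain."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    best_t, best_gain = None, 0.0
    for i in range(1, len(x)):
        if x[i] == x[i - 1]:
            continue                      # no boundary between equal values
        t = (x[i] + x[i - 1]) / 2         # midpoint between consecutive values
        left, right = y[:i], y[i:]
        gain = entropy(y) - (len(left) * entropy(left)
                             + len(right) * entropy(right)) / len(y)
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain
```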

Conclusion

Concluding Remarks

Takeaways

Announcements

References