If you have any doubts about the notes below, drop a mail to the Kung Fu Panda and we will get back to you very soon.

- marking data to belong to one out of multiple known classes.
- like marking an email to be spam/not spam.
- given a set of tumour test results for a patient, marking the tumour as malignant or benign.
- given a set of players, their records, team schedules, performance etc., predicting which team will win the league.

- similarity between two entities is defined by the distance between them.
- an entity is classified based on the K entities closest to it.
- knn works when a concept is hard to define formally, but you know it when you see it.
- knn does not work well with noisy data or when class boundaries are not easy to define.
- euclidean distance is used most often, though manhattan distance is an alternative.
- k nearest neighbors is a lazy learner, because no learning (model building) happens up front.
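The idea above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation; the function names and toy data are made up for this example.

```python
from collections import Counter
import math

def euclidean(a, b):
    # straight-line distance between two feature vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train, labels, query, k=3):
    # rank all training points by distance to the query
    order = sorted(range(len(train)), key=lambda i: euclidean(train[i], query))
    # majority vote among the k closest neighbours
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# toy data: [sweetness, crunchiness]
train = [[9, 1], [8, 2], [2, 9], [3, 8]]
labels = ["fruit", "fruit", "vegetable", "vegetable"]
print(knn_predict(train, labels, [8, 3], k=3))  # → fruit
```

Note that "training" is just storing the data; all the work happens at prediction time, which is exactly why KNN is called a lazy learner.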

- classifying food as protein/vegetable/sugar etc. based on sweetness/crunchiness.
- classifying a movie into a genre depending on the no of kicks, no of kisses etc.

- simple and easy to understand.
- makes no assumptions about the data.
- quick training (because there is hardly any training or learning)

- no model is produced, so no understanding emerges of how the features relate to the result.
- need to find out a K which works for our use case.
- slow classification (because there is no model/learning, every prediction scans the data)
- works great on numeric data; nominal features need a lot of extra work because they convert to multiple features.

- Remember that
- Bias => ‘how much on an average are the predicted values different from the actual value.’
- Variance => ‘how different will the predictions of the model be if different samples are taken from the same population’

- large K => reduces variance caused by noisy data => introduces bias
- very large K => always predicts majority class => no variance, huge bias.
- very small K => small errors (noise) in the training data can sway predictions => high variance, low bias.
- one approach is to choose K as square root of N where N is the number of examples.
- other approach is to choose large K and give more weightage to the closer neighbors and less to far neighbors.
- another option is to choose best K by testing for your use case.
- but again it depends on the data; training data may suggest one value of K, but the best value on unseen data may be different.

- numeric features with larger values/ranges dominate (because they dominate the distance calculation)
- so, we need to normalize the numeric features to make their importance comparable.
- to normalize, we subtract the min values and divide by range of X.
- Xnew = (X-Xmin)/(Xmax-Xmin)
- the resulting values are between 0 and 1.
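The min-max formula above is a one-liner per value; a small sketch (the function name is made up for this example):

```python
def min_max_normalize(values):
    # rescale a numeric feature into [0, 1]: Xnew = (X - Xmin) / (Xmax - Xmin)
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant feature: no range, map to 0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

print(min_max_normalize([10, 20, 30, 50]))  # → [0.0, 0.25, 0.5, 1.0]
```

Normalize each feature column separately, and apply the training set's min and max to any new query point so it lands on the same scale.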

- In KNN there is no learning and no model; it needs all the training data at prediction time, so it is very slow.
- This is because KNN always needs the stored data to find the best match; no formula can be deduced from the data and kept in its place.