Day 4 of 100 Days of AI
Today, I went through a classifier lab on the intro ML course. There were several bits I didn’t quite understand, but GPT helped me get over the basics. For example, I will need to review my notes on the Jaccard Index and F1-score (evaluation metrics for classifier models), and on normalisation, where you rescale your features (for example, to a mean of 0 and a standard deviation of 1) without changing the shape of their distribution. This puts features on comparable scales, so that one with large values doesn’t dominate when you calculate distances between points, a critical step when making classification predictions with distance-based methods.
On the latter point, I’ve included some charting code in the GitHub repo here (see image below), which helped me understand the normalisation concept. GPT wrote the charting code, and I made some minor tweaks.
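To make the normalisation idea concrete, here is a minimal sketch in plain Python (it is not the charting code from the repo). It assumes z-score standardisation, one common form of normalisation (min-max scaling to the range 0 to 1 is another), and uses a small hypothetical dataset of customer ages and incomes. The `standardise` function name and the data are made up for illustration.

```python
import math
import statistics


def standardise(column):
    """Z-score standardisation: rescale values to mean 0 and (population) std dev 1.

    This is a linear transformation, so the ordering and the shape of the
    distribution are preserved; only its location and scale change.
    """
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    return [(x - mean) / std for x in column]


# Hypothetical data: age in years and annual income in dollars for four customers.
ages = [25, 35, 45, 55]
incomes = [30_000, 32_000, 90_000, 95_000]

raw = list(zip(ages, incomes))
scaled = list(zip(standardise(ages), standardise(incomes)))

# Before scaling, the distance between customers 0 and 1 is almost entirely
# driven by the $2,000 income gap; the 10-year age gap barely registers.
print("raw distance:   ", math.dist(raw[0], raw[1]))

# After scaling, both features are measured in standard deviations, and the
# age gap now accounts for most of the distance between these two customers.
print("scaled distance:", math.dist(scaled[0], scaled[1]))
```

Min-max scaling would make the same point; what matters for distance-based classifiers is that no feature dominates just because of its units.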
Key takeaways:
- Classification is a supervised machine learning approach.
- It predicts which discrete class (category) an item belongs to.
- Classifiers can be used for spam detection, document classification, speech recognition, or even to predict whether a given customer will churn based on a variety of characteristics.
- Classification algorithms include k-nearest neighbours (which I’ve put on GitHub here), decision trees, and logistic regression (which, instead of putting an item directly into a class, gives you the probability that it belongs to a particular class; applying a threshold such as 0.5 then turns that probability into a class label).
- The k-nearest neighbours algorithm was fun to learn about, and the intuition behind it is simpler than I expected. For a given item you want to predict, look at the k training points closest to it (k is a number you choose), and predict the category that is most common among those neighbours. The same idea can predict a number instead of a category (k-nearest neighbours regression) by taking the neighbours’ mean or median value, e.g. predicting a house price based on location, square footage, etc.
- Classification algorithms can be evaluated with a number of metrics, such as the Jaccard Index, the F1-score, or Log Loss. I didn’t cover these in detail, but I did enough to get the very basics.
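The k-nearest neighbours intuition fits in a few lines of plain Python. This is a from-scratch sketch, not the version in the GitHub repo or any library's API. It assumes Euclidean distance, features already on comparable scales, and tiny hypothetical datasets; the function names are made up. `knn_classify` takes a majority vote among the k nearest neighbours, and `knn_regress` averages their values, which is the house-price style variant.

```python
import math
from collections import Counter
from statistics import fmean


def k_nearest(train_points, query, k):
    """Return the indices of the k training points closest to `query` (Euclidean distance)."""
    order = sorted(range(len(train_points)), key=lambda i: math.dist(train_points[i], query))
    return order[:k]


def knn_classify(train_points, train_labels, query, k=3):
    """Predict the most common class among the k nearest neighbours.

    Counter.most_common keeps insertion order among equal counts, so a tied
    vote goes to the class whose member was closest to the query.
    """
    idx = k_nearest(train_points, query, k)
    votes = Counter(train_labels[i] for i in idx)
    return votes.most_common(1)[0][0]


def knn_regress(train_points, train_values, query, k=3):
    """Predict a number as the mean of the k nearest neighbours' values."""
    idx = k_nearest(train_points, query, k)
    return fmean(train_values[i] for i in idx)


# Hypothetical 2-D customer data with two well-separated groups.
points = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6)]
labels = ["stay", "stay", "stay", "churn", "churn", "churn"]
print(knn_classify(points, labels, (1.5, 1.5), k=3))
print(knn_classify(points, labels, (6.5, 6.5), k=3))

# Hypothetical house data: one feature (square feet), price in thousands.
sizes = [(500,), (800,), (1000,), (1500,), (2000,)]
prices = [100, 150, 180, 260, 330]
print(knn_regress(sizes, prices, (900,), k=2))
```

Choosing an odd k for a two-class problem avoids most tied votes; in practice you would try several values of k and keep the one that scores best on held-out data.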
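On the logistic regression point, here is a minimal sketch of how an already-fitted model turns features into a probability: a weighted sum passed through the sigmoid function, then compared with a threshold to get a class. The weights, bias, and churn features below are hypothetical numbers chosen for illustration, not fitted to real data, and the fitting step itself (typically minimising log loss) is not shown.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1 / (1 + math.exp(-z))


def predict_proba(features, weights, bias):
    """Probability of the positive class: sigmoid of a weighted sum of the features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def predict_class(features, weights, bias, threshold=0.5):
    """Turn the probability into a hard 0/1 label using a threshold."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0


# Hypothetical model for "will this customer churn?" with features
# (support calls last month, tenure in months). Not fitted to real data.
WEIGHTS = [0.8, -0.05]
BIAS = -1.0

for customer in [(3, 10), (0, 24)]:
    p = predict_proba(customer, WEIGHTS, BIAS)
    print(customer, f"P(churn) = {p:.2f}", "-> class", predict_class(customer, WEIGHTS, BIAS))
```

Moving the threshold away from 0.5 trades false positives against false negatives, which is one reason metrics like the F1-score are useful.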
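For reference when reviewing the evaluation metrics, here is a minimal plain-Python sketch of all three for a binary classifier. It assumes 0/1 labels with 1 as the positive class, and it uses the positive-class form of the Jaccard Index, TP / (TP + FP + FN), which is one common definition (scikit-learn's `jaccard_score` uses it for binary labels); other definitions exist, so it is worth checking against the course notes. The labels and probabilities are made up.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives and false negatives for one class."""
    tp = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
    return tp, fp, fn


def jaccard_index(y_true, y_pred, positive=1):
    """Overlap between predicted and actual positives: TP / (TP + FP + FN)."""
    tp, fp, fn = confusion_counts(y_true, y_pred, positive)
    denom = tp + fp + fn
    return tp / denom if denom else 0.0


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall."""
    tp, fp, fn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood of the true 0/1 labels given predicted P(y = 1).

    Probabilities are clipped to [eps, 1 - eps] so log(0) never occurs.
    """
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total += t * math.log(p) + (1 - t) * math.log(1 - p)
    return -total / len(y_true)


# Hypothetical labels and predicted probabilities; y_pred is p_pred thresholded at 0.5.
y_true = [1, 1, 0, 0, 1, 0]
p_pred = [0.9, 0.4, 0.2, 0.6, 0.8, 0.1]
y_pred = [1 if p >= 0.5 else 0 for p in p_pred]

print("Jaccard: ", jaccard_index(y_true, y_pred))  # 2 TP / (2 TP + 1 FP + 1 FN) = 0.5
print("F1:      ", f1_score(y_true, y_pred))
print("Log loss:", log_loss(y_true, p_pred))
```

Jaccard and F1 only look at the hard predictions, while log loss uses the probabilities, so it also penalises a model for being confidently wrong.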