Day 5 of 100 Days of AI
Today I learnt about cross validation (scikit-learn has Python helper functions for this). This is where you split your data into several subsets, then iteratively train on some of them and test on the rest, rotating through different combinations to assess how well a model performs on unseen data.
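As a rough illustration of those helper functions, here is a minimal sketch using scikit-learn's `cross_val_score`. The iris dataset, the logistic regression model, and the choice of 5 folds are all assumptions picked only to make the example runnable; any estimator and dataset would do.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Illustrative dataset and model (assumptions, not a recommendation).
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# cv=5 asks for 5 rounds of train/test; each round yields one score.
scores = cross_val_score(model, X, y, cv=5)

print("score per fold:", scores)
print("mean score:", scores.mean())
```

The mean of the per-fold scores is the usual single-number summary of how the model might do on data it has not seen.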
Two common methods of cross validation are k-fold cross validation and leave-one-out cross validation.
K-fold involves splitting the data into k roughly equal parts (folds). Each fold takes one turn as the test set while the other k-1 folds are used for training, so you end up training and testing k times.
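To see the rotation in k-fold concretely, here is a small sketch with scikit-learn's `KFold`. The 10-sample toy array and k = 5 are assumptions chosen just to keep the output short.

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy data: 10 samples with one feature each (illustrative only).
X = np.arange(10).reshape(-1, 1)

# k = 5 is an arbitrary choice for this sketch.
kf = KFold(n_splits=5)

folds = []
for train_idx, test_idx in kf.split(X):
    # Each iteration holds out a different fold for testing.
    folds.append((train_idx, test_idx))
    print("train:", train_idx, "test:", test_idx)
```

Every sample index shows up in exactly one test fold, and in the training set of all the other rounds.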
With leave-one-out cross validation, you select one data point for testing and use the rest for training. You then move to another data point for testing and use the remainder for training, and so forth, until every data point has been used for testing exactly once. In effect, it is k-fold with k equal to the number of data points.
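Leave-one-out can be sketched the same way with scikit-learn's `LeaveOneOut`. The 4-sample toy array is an assumption to keep things small; with real data the number of rounds equals the number of data points, which can get slow.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut

# Toy data: 4 samples with one feature each (illustrative only).
X = np.arange(4).reshape(-1, 1)

loo = LeaveOneOut()

# One split per data point: a single test sample, the rest for training.
splits = list(loo.split(X))
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```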
I’ll return to these concepts when I write some more code next week.
P.S. One thing I’m realising as I dig into the basics of machine learning is that choosing the techniques that may produce the best models is a mix of art and science. There’s a bunch of trial and error, even though it’s a deeply rigorous and mathematical field.