Day 56 of 100 Days of AI

Going through support vector machines today, I realised I needed to refine my understanding of bias and variance. So I went to the machine learning GPT tutor to better understand how the terms are used in machine learning specifically. The tutor also provided some cool charts, shown below.

On to the concepts…

Bias is the degree to which a model’s predictions deviate from the true values because the model has made overly simplistic assumptions about the data.

For example, if we created a model that predicted a student’s grade based only on what they ate for breakfast, this would be a highly biased (and very wrong) model.

High bias leads to underfitting: the model fails to represent the underlying patterns of the data, as shown in the far-left chart in the image below.
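To make this concrete for myself, here's a quick Python sketch (my own toy example with scikit-learn, not from the tutor): fitting a straight line to clearly curved data. The model's error stays high even on the data it was trained on, which is the hallmark of underfitting.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Toy data: a clearly non-linear (quadratic) relationship plus a little noise.
rng = np.random.default_rng(0)
X = np.linspace(0, 10, 100).reshape(-1, 1)
y = 0.5 * X.ravel() ** 2 + rng.normal(0, 2, size=100)

# A straight line is too simple for quadratic data: high bias.
model = LinearRegression().fit(X, y)
train_error = mean_squared_error(y, model.predict(X))
print(f"Training MSE of the straight-line model: {train_error:.1f}")
# The error is large even on the training data itself,
# which is the signature of underfitting.
```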

Variance refers to how sensitive a model is to changes in the training data. High variance models overfit the training data, capturing even the noise and spurious patterns specific to one particular training set. This means that high variance models don't generalise well to new data.

For example, suppose we trained a model to predict student grades using shoe size, rainfall, classroom temperature, lunch timing, left- or right-handedness, favourite video game, and so forth. This model might fit the training data well, but it would perform poorly on new data.
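Again, a small sketch helped me see this (toy data and a deliberately over-complicated model; the details are my own, not the tutor's). A degree-15 polynomial fits one noisy sample of the data very closely, then does much worse on a second sample drawn from the same underlying pattern:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)

def make_data(n=30):
    # Same underlying pattern, different random noise each call,
    # standing in for two different training sets.
    X = np.sort(rng.uniform(0, 10, n)).reshape(-1, 1)
    y = np.sin(X.ravel()) + rng.normal(0, 0.3, n)
    return X, y

X_train, y_train = make_data()
X_new, y_new = make_data()

# A degree-15 polynomial has enough parameters to chase the noise.
model = make_pipeline(PolynomialFeatures(degree=15), LinearRegression())
model.fit(X_train, y_train)

print(f"Train MSE:    {mean_squared_error(y_train, model.predict(X_train)):.3f}")
print(f"New-data MSE: {mean_squared_error(y_new, model.predict(X_new)):.3f}")
# Much lower error on the training set than on fresh data:
# the model memorised noise rather than the underlying pattern.
```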

Notice that high bias (a simplistic model) comes with low variance (i.e. the model isn't sensitive to small changes in the training data), while low bias (a complicated model with lots of parameters) comes with high variance (i.e. more sensitivity to changes in the training data).
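To round this off, I sketched a quick complexity sweep (again my own toy setup): increase the polynomial degree and watch cross-validated error. If the usual picture holds, the error falls as bias drops, then rises again as variance takes over:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.uniform(0, 10, 60).reshape(-1, 1)
y = np.sin(X.ravel()) + rng.normal(0, 0.3, 60)

# Sweep model complexity: low degree = high bias, high degree = high variance.
for degree in [1, 3, 5, 10, 15]:
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    scores = cross_val_score(model, X, y, cv=5,
                             scoring="neg_mean_squared_error")
    print(f"degree={degree:2d}  CV MSE={-scores.mean():.3f}")
# Cross-validated error should fall as the model gains flexibility,
# then climb again once it starts fitting noise; the sweet spot is in between.
```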

In machine learning, it's crucial to steer clear of both high bias (underfitting) and high variance (overfitting). Either extreme hampers model performance, so the goal is to strike a balance between the two.