Day 43 of 100 Days of AI

Logistic regression thresholds.

A few weeks back I built several logistic regression models without quite appreciating the impact of the classification threshold you set. Reading a few pages of a chapter on assessing ML model performance closed that gap today.

It turns out that if you lower the classification threshold, more cases get flagged as positive, which increases your chances of correctly identifying the actual positive cases, the True Positives. This comes at the cost of more False Positives. In some situations, though, that trade-off is exactly what you want.
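Here is a minimal sketch of the effect, using scikit-learn on synthetic data I made up (not the book's example), comparing the default 0.5 threshold with a lower one:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

# Synthetic, imbalanced binary classification data (assumed setup for illustration).
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
probs = model.predict_proba(X_test)[:, 1]  # predicted probability of the positive class

# Compare confusion matrices at the default threshold and a lower one.
for threshold in (0.5, 0.2):
    preds = (probs >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_test, preds).ravel()
    recall = tp / (tp + fn)
    print(f"threshold={threshold}: TP={tp}, FP={fp}, recall={recall:.2f}")
```

At the lower threshold you should see both the True Positives (and therefore recall) and the False Positives go up.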

The book shares the example of Ebola screening. In that situation you would rather have a lower classification threshold and increase your True Positive Rate (i.e. recall) at the expense of more False Positives, because the priority is to catch as many Ebola cases as possible. A less morbid example is venture capital: if you had a startup success prediction model for early-stage companies, you would be better off with a lower threshold, since that limits the chances of missing out on a potential outlier success.
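One way to make "lower the threshold" concrete in a recall-driven setting like the Ebola example is to pick the highest threshold that still hits a target recall. A rough sketch (the function name and target are my own, not from the book), reusing the `probs` and `y_test` from the snippet above:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def threshold_for_recall(y_true, scores, target_recall=0.95):
    """Return the highest threshold whose recall is still >= target_recall."""
    precision, recall, thresholds = precision_recall_curve(y_true, scores)
    ok = np.where(recall[:-1] >= target_recall)[0]  # recall[:-1] aligns with thresholds
    return thresholds[ok[-1]]  # thresholds ascend, so take the largest qualifying one

# e.g. threshold_for_recall(y_test, probs, 0.95)
```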

I’ll continue with this thought and more reading tomorrow.