Day 44 of 100 Days of AI

Today, I went over ROC graphs. The acronym ROC stands for receiver operating characteristic. The concept comes from signal detection theory and dates back to World War II, when radar operators wanted to measure how good they were at detecting enemy aircraft (“True Positives”) without mistaking birds or other noise for aircraft (“False Positives”). So they designed a chart that plots the true positive rate against the false positive rate under different thresholds (I blogged about thresholds yesterday).
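To make the “different thresholds” idea concrete, here is a minimal pure-Python sketch of how each ROC point could be computed. The labels and scores are made-up toy values (think of the scores as logistic-regression probabilities), and I'm assuming a score counts as a positive prediction when it is greater than or equal to the threshold. The helper names `rates_at_threshold` and `roc_points` are my own, not from any library.

```python
# Hypothetical toy data: 1 = positive (e.g. "enemy aircraft"), 0 = negative (noise).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
# Hypothetical model scores, e.g. predicted probabilities from a logistic regression.
scores = [0.95, 0.80, 0.65, 0.40, 0.70, 0.30, 0.20, 0.10]


def rates_at_threshold(labels, scores, threshold):
    """Return (FPR, TPR) when every score >= threshold is predicted positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    return fp / negatives, tp / positives


def roc_points(labels, scores):
    """One (threshold, FPR, TPR) point per distinct score, plus one above every score."""
    # float("inf") predicts nothing as positive, which gives the (0, 0) corner.
    thresholds = [float("inf")] + sorted(set(scores), reverse=True)
    return [(t, *rates_at_threshold(labels, scores, t)) for t in thresholds]


for t, fpr, tpr in roc_points(labels, scores):
    print(f"threshold={t:<5} FPR={fpr:.2f} TPR={tpr:.2f}")
```

Lowering the threshold can only add predicted positives, so both rates never decrease as you walk down the list. That is why an ROC curve climbs from (0, 0) to (1, 1).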

Here’s an example ROC curve from the YouTube channel StatQuest.

How do we interpret these curves? Each point shows the true positive rate (y-axis) and false positive rate (x-axis) produced by one threshold, for instance one cutoff on the predicted probabilities of a logistic regression. In the above chart, if we wanted the threshold with the fewest false positives, we would pick the point with a false positive rate of 0 and the highest true positive rate (i.e. the highest point on the y-axis where the x-value is 0).
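Reading off “the highest point where the x-value is 0” boils down to a filter and a max. A minimal sketch, assuming the curve is stored as a list of (threshold, FPR, TPR) tuples; the numbers below are made-up toy values, not read off the StatQuest chart, and `best_with_no_false_positives` is a hypothetical helper name.

```python
# Hypothetical ROC points for a toy classifier: (threshold, FPR, TPR).
points = [
    (float("inf"), 0.00, 0.00),
    (0.95, 0.00, 0.25),
    (0.80, 0.00, 0.50),
    (0.70, 0.25, 0.50),
    (0.65, 0.25, 0.75),
    (0.40, 0.25, 1.00),
    (0.30, 0.50, 1.00),
    (0.20, 0.75, 1.00),
    (0.10, 1.00, 1.00),
]


def best_with_no_false_positives(points):
    """Among points with FPR == 0, return the one with the highest TPR."""
    zero_fp = [p for p in points if p[1] == 0.0]
    return max(zero_fp, key=lambda p: p[2])


threshold, fpr, tpr = best_with_no_false_positives(points)
print(threshold, fpr, tpr)  # 0.8 0.0 0.5
```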

Similarly, if we wanted the highest true positive rate overall but could tolerate some false positives, we would take the point highlighted in red below.
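The red point is picked visually from the chart, but one way to turn “highest true positive rate while tolerating some false positives” into code is to cap the false positive rate at a budget and take the highest true positive rate under that cap, breaking ties toward the lower false positive rate. This is an assumption about how to formalize the choice, not necessarily how the red point was chosen; `best_tpr_within_fpr` and its `max_fpr` parameter are hypothetical, and the points are made-up toy values stored as (threshold, FPR, TPR) tuples.

```python
# Hypothetical ROC points for a toy classifier: (threshold, FPR, TPR).
points = [
    (float("inf"), 0.00, 0.00), (0.95, 0.00, 0.25), (0.80, 0.00, 0.50),
    (0.70, 0.25, 0.50), (0.65, 0.25, 0.75), (0.40, 0.25, 1.00),
    (0.30, 0.50, 1.00), (0.20, 0.75, 1.00), (0.10, 1.00, 1.00),
]


def best_tpr_within_fpr(points, max_fpr):
    """Highest TPR among points with FPR <= max_fpr; ties go to the lower FPR."""
    allowed = [p for p in points if p[1] <= max_fpr]
    # Sort key: larger TPR first, then smaller FPR (hence the negation).
    return max(allowed, key=lambda p: (p[2], -p[1]))


print(best_tpr_within_fpr(points, max_fpr=1.0))  # (0.4, 0.25, 1.0)
print(best_tpr_within_fpr(points, max_fpr=0.0))  # (0.8, 0.0, 0.5)
```

With `max_fpr=1.0` (any amount of false positives tolerated) this returns the lowest-FPR point that reaches a true positive rate of 1; with `max_fpr=0.0` it reduces to the zero-false-positive rule from the previous paragraph.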

Side note: I’m still working through this book and I’m eager to get back to code. However, it’s important to at least run through the fundamentals again before getting back to online courses and other fun applications of ML.