Day 11 of 100 Days of AI

Logistic regression continued.

In the lab portion of the intro to ML course today, I went through an exercise of running a logistic regression analysis on fictional customer data. I’ve put the code on GitHub here.

The model is structured as follows:

logit(p) = -0.2675 + (-0.1526 * tenure) + (-0.0791 * age) + (-0.0721 * address) + (-0.0196 * income) + (0.0519 * ed) + (-0.0950 * employ) + (0.1601 * equip)
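
To turn that logit into an actual churn probability, you pass it through the sigmoid function. Here’s a minimal sketch of that step, using the coefficients above and an already-normalized feature vector I made up for illustration:

```python
import numpy as np

# Coefficients from the fitted model, in the order:
# tenure, age, address, income, ed, employ, equip
intercept = -0.2675
coefs = np.array([-0.1526, -0.0791, -0.0721, -0.0196, 0.0519, -0.0950, 0.1601])

# Hypothetical, already-normalized feature values for one customer
x = np.array([0.5, -1.2, 0.3, 0.1, 1.0, -0.4, 0.8])

logit = intercept + np.dot(coefs, x)
p_churn = 1 / (1 + np.exp(-logit))  # sigmoid maps the logit to a probability
print(f"P(churn) = {p_churn:.3f}")
```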

A visual representation of the impact of the coefficients on churn is summarized in this chart.

And here’s the performance of the model, illustrated with a confusion matrix.

The basic steps to produce the model were as follows:

  1. Load the dataset from a CSV file.
  2. Select the columns we want to work with: the features (tenure, age, address, income, education, employment, and equipment) and the target we’re predicting, churn status.
  3. Preprocess the data. We did just two bits of preprocessing here: (a) make sure the churn column contains integers and (b) normalize the feature set.
  4. Split the dataset into training and testing sets.
  5. Train a logistic regression model using the training data.
  6. Make predictions on the test data.
  7. Evaluate the performance of the model using a confusion matrix, classification report, and log loss.
  8. I also added a bar graph that charts the coefficients so we can see which features have the greatest impact on churn (a sketch of all these steps follows the list).
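
A minimal sketch of those eight steps with scikit-learn looks roughly like this (assuming a CSV called churn_data.csv with columns named after the coefficients above; the details differ slightly in the actual notebook):

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, classification_report, log_loss

# 1. Load the dataset (file name assumed for this sketch)
df = pd.read_csv("churn_data.csv")

# 2. Select the feature columns and the target
features = ["tenure", "age", "address", "income", "ed", "employ", "equip"]
X = df[features].values
y = df["churn"].astype(int).values  # 3a. make sure churn is an integer

# 3b. Normalize the feature set
X = StandardScaler().fit_transform(X)

# 4. Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=4
)

# 5. Train a logistic regression model
model = LogisticRegression().fit(X_train, y_train)

# 6. Make predictions on the test data
y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)

# 7. Evaluate with a confusion matrix, classification report, and log loss
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
print("Log loss:", log_loss(y_test, y_prob))

# 8. Bar graph of the coefficients to see which features matter most
plt.bar(features, model.coef_[0])
plt.ylabel("Coefficient")
plt.title("Feature impact on churn")
plt.tight_layout()
plt.show()
```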

I still find it incredible that, if you can write a bit of code, you can build a simple machine learning model in just a few lines, per the example below.