Day 11 of 100 Days of AI
Logistic regression continued.
In the lab portion of the intro to ML course today, I worked through an exercise that runs a logistic regression analysis on fictional customer data to predict churn. I’ve put the code on GitHub here.
The fitted model, where p is the probability that a customer churns, is:
logit(p) = -0.2675 + (-0.1526 * tenure) + (-0.0791 * age) + (-0.0721 * address) + (-0.0196 * income) + (0.0519 * ed) + (-0.0950 * employ) + (0.1601 * equip)
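To get a churn probability from this equation, you pass the logit through the logistic (sigmoid) function, p = 1 / (1 + e^(-logit)). Below is a minimal sketch that plugs in the coefficients above. The example customer is hypothetical: every normalized feature is set to 0. Under the assumption that normalization means standardization, that is an "average" customer, and the logit reduces to the intercept.

```python
import math

# Coefficients from the fitted model above (features assumed to be normalized).
INTERCEPT = -0.2675
COEFFICIENTS = {
    "tenure": -0.1526,
    "age": -0.0791,
    "address": -0.0721,
    "income": -0.0196,
    "ed": 0.0519,
    "employ": -0.0950,
    "equip": 0.1601,
}


def churn_probability(features: dict[str, float]) -> float:
    """Compute p = 1 / (1 + exp(-logit)) for one customer's normalized features."""
    logit = INTERCEPT + sum(COEFFICIENTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical customer whose normalized features are all 0.
average_customer = {name: 0.0 for name in COEFFICIENTS}
print(round(churn_probability(average_customer), 4))  # → 0.4335
```

With all features at 0, only the intercept contributes, so this customer has roughly a 43% chance of churning. Raising `equip` (positive coefficient) pushes the probability up, while raising `tenure` (negative coefficient) pushes it down.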
This chart visualizes how each coefficient affects churn.
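In a logistic model, exp(coefficient) is the factor by which the odds of churn change for a one-unit increase in that feature, with the other features held fixed. If the normalization step was standardization (my assumption, e.g. scikit-learn's `StandardScaler`), one unit means one standard deviation, so the coefficients are directly comparable. Here is a small sketch that ranks the coefficients above by magnitude and converts them to odds ratios:

```python
import math

# Coefficients from the model above; features assumed standardized (mean 0, std 1).
COEFFICIENTS = {
    "tenure": -0.1526, "age": -0.0791, "address": -0.0721, "income": -0.0196,
    "ed": 0.0519, "employ": -0.0950, "equip": 0.1601,
}


def rank_by_effect(coefficients: dict[str, float]) -> list[tuple[str, float, float]]:
    """Return (feature, coefficient, odds ratio) sorted by |coefficient|, largest first."""
    ranked = sorted(coefficients.items(), key=lambda item: abs(item[1]), reverse=True)
    return [(name, coef, math.exp(coef)) for name, coef in ranked]


for name, coef, odds_ratio in rank_by_effect(COEFFICIENTS):
    direction = "raises" if coef > 0 else "lowers"
    print(f"{name:<8} coef={coef:+.4f}  odds x{odds_ratio:.3f}  ({direction} churn odds)")
```

For example, `equip` has a coefficient of 0.1601, so exp(0.1601) ≈ 1.17, or roughly 17% higher odds of churn per unit. For `tenure`, exp(-0.1526) ≈ 0.86, or roughly 14% lower odds.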
The basic steps to produce the model were as follows:
- Load the dataset from a CSV file.
- Select the columns to use: tenure, age, address, income, education (ed), employment (employ), and equipment (equip) as the features, plus churn status as the target we want to predict.
- Preprocess the data. We did just two preprocessing steps: (a) convert the churn column to integers and (b) normalize the features.
- Split the dataset into training and testing sets.
- Train a logistic regression model using the training data.
- Make predictions on the test data.
- Evaluate the performance of the model using a confusion matrix, classification report, and log loss.
- I also added a bar graph that charts the coefficients so we can see which features have the greatest impact on churn.
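Below is a minimal sketch of the steps above using pandas and scikit-learn. These are my choice of libraries, and the original lab code may differ. The course CSV isn't reproduced here, so `make_fake_churn_data` is a hypothetical stand-in that generates random rows with the same column names. With real data you would use `pd.read_csv(...)` on your own file instead. One deliberate change: the scaler is fit on the training split only, after splitting, so no test-set statistics leak into training.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, log_loss
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

FEATURES = ["tenure", "age", "address", "income", "ed", "employ", "equip"]


def make_fake_churn_data(n_rows: int = 200, seed: int = 0) -> pd.DataFrame:
    """Hypothetical stand-in for the course CSV (normally loaded with pd.read_csv)."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "tenure": rng.integers(1, 73, n_rows),
        "age": rng.integers(18, 77, n_rows),
        "address": rng.integers(0, 50, n_rows),
        "income": rng.integers(10, 500, n_rows),
        "ed": rng.integers(1, 6, n_rows),
        "employ": rng.integers(0, 40, n_rows),
        "equip": rng.integers(0, 2, n_rows),
    })
    # Made-up rule: short tenure and having equipment raise churn odds.
    logit = (0.5 - 0.05 * df["tenure"] + 1.0 * df["equip"]).to_numpy()
    # Stored as floats on purpose, so the integer conversion below has work to do.
    df["churn"] = (rng.random(n_rows) < 1 / (1 + np.exp(-logit))).astype(float)
    return df


def run_pipeline(df: pd.DataFrame, seed: int = 4):
    # Select the feature columns plus the target; preprocessing (a): churn as integers.
    df = df[FEATURES + ["churn"]].copy()
    df["churn"] = df["churn"].astype(int)
    X = df[FEATURES].to_numpy(dtype=float)
    y = df["churn"].to_numpy()

    # Split into training and testing sets (80/20, stratified on churn).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )

    # Preprocessing (b): standardize, fitting the scaler on training data only.
    scaler = StandardScaler().fit(X_train)
    X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

    # Train, then predict labels and class probabilities on the test set.
    model = LogisticRegression().fit(X_train, y_train)
    y_pred = model.predict(X_test)
    y_prob = model.predict_proba(X_test)

    # Evaluate with a confusion matrix, classification report, and log loss.
    print(confusion_matrix(y_test, y_pred))
    print(classification_report(y_test, y_pred, zero_division=0))
    print("log loss:", round(log_loss(y_test, y_prob), 4))

    # One coefficient per (standardized) feature, sorted for charting.
    coefs = pd.Series(model.coef_[0], index=FEATURES).sort_values()
    return model, coefs


if __name__ == "__main__":
    model, coefs = run_pipeline(make_fake_churn_data())
    print(coefs)
```

For the bar graph step, calling `coefs.plot(kind="barh")` draws one bar per feature, which requires matplotlib to be installed. Because the synthetic data is random, the printed metrics and coefficients won't match the model above.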
I still find it incredible that anyone who can write a little code can build a simple machine learning model in just a few lines, as the example below shows.