Day 52 of 100 Days of AI

Today, I returned to decision trees. Even though I was building decision tree models by day 7 of this challenge, I felt I could do with going back to the theory and understanding it better. That's what I started doing today.

Before starting, we should know that there are two types of decision trees in machine learning.

  1. Classification trees, which assign an item to a category based on a series of yes/no answers to the questions at the tree's nodes. For example:
    • Will a customer churn based on certain events?
  2. Regression trees, which predict a continuous value based on the same kind of yes/no answers at each node (there's a code sketch of both types just after this list). For example:
    • What is the price of a house given its location, number of rooms, and size?
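Here's a minimal sketch of both types using scikit-learn. The data, feature names, and numbers are all invented purely for illustration:

```python
# A toy sketch of both tree types with scikit-learn.
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Classification tree: predict churn (1) or stay (0) from
# [support_tickets, months_since_signup] -- invented features.
X_churn = [[5, 2], [0, 24], [3, 6], [1, 36]]
y_churn = [1, 0, 1, 0]
clf = DecisionTreeClassifier(max_depth=2).fit(X_churn, y_churn)
print(clf.predict([[4, 3]]))  # e.g. [1] -> likely to churn

# Regression tree: predict house price from [rooms, size_sqm],
# again with toy numbers.
X_house = [[2, 50], [3, 80], [4, 120], [5, 200]]
y_house = [150_000, 220_000, 310_000, 480_000]
reg = DecisionTreeRegressor(max_depth=2).fit(X_house, y_house)
print(reg.predict([[3, 90]]))  # a continuous price estimate
```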

Here’s an example decision tree from Wikipedia.

Decision trees have the benefit of simplicity and interpretability. It's easier to follow a path through a decision tree than to scrutinise millions of neurons in an artificial neural network!
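To make that concrete, scikit-learn's export_text prints a fitted tree's splits as plain if/else-style rules you can read top to bottom. A minimal sketch, reusing the same invented churn features as above:

```python
# Interpretability in practice: print the learned splits as text rules.
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy churn data: [support_tickets, months_since_signup] -- invented.
X = [[5, 2], [0, 24], [3, 6], [1, 36]]
y = [1, 0, 1, 0]  # 1 = churned, 0 = stayed

clf = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(clf, feature_names=["support_tickets", "months_since_signup"]))
# Prints an indented view of the splits, one branch per line.
```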

That said, decision trees have their limits. They are prone to overfitting: left unconstrained, a tree can keep splitting until it effectively memorises its training data, and it then won't generalise well to new data.
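You can see this directly by comparing an unconstrained tree with a depth-limited one on held-out data. A quick sketch on synthetic data (exact scores will vary):

```python
# Overfitting demo: an unconstrained tree memorises the training set,
# while capping max_depth usually generalises better to held-out data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

deep = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)

print(f"deep:    train {deep.score(X_tr, y_tr):.2f}, test {deep.score(X_te, y_te):.2f}")
print(f"shallow: train {shallow.score(X_tr, y_tr):.2f}, test {shallow.score(X_te, y_te):.2f}")
# Typically the deep tree hits 1.00 on train but scores lower on test.
```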

There’s more about the pros and cons of decision trees on the Scikit Learn website here.