Decision Trees

Definition 

  • A decision tree is a flowchart-like tree structure
  • An internal node denotes a test on an attribute (feature)
  • A branch represents an outcome of that test
  • All records following a branch satisfy that branch's outcome of the test
  • A leaf node represents a class label or a class label distribution (see the sketch below)
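
To make the structure concrete, here is a minimal sketch in Python; the node fields and the toy tree are illustrative assumptions, not taken from the material above.

```python
from dataclasses import dataclass, field

@dataclass
class Leaf:
    label: str                      # class label predicted at this leaf

@dataclass
class Node:
    attribute: str                  # attribute (feature) tested at this internal node
    branches: dict = field(default_factory=dict)  # test outcome -> child subtree

def classify(tree, record):
    """Walk from the root to a leaf, at each internal node following
    the branch that matches the record's value for the tested attribute."""
    while isinstance(tree, Node):
        tree = tree.branches[record[tree.attribute]]
    return tree.label

# Hypothetical toy tree: test 'outlook' at the root, 'windy' on one branch.
tree = Node("outlook", {
    "sunny": Leaf("no"),
    "overcast": Leaf("yes"),
    "rainy": Node("windy", {True: Leaf("no"), False: Leaf("yes")}),
})
print(classify(tree, {"outlook": "rainy", "windy": False}))  # -> yes
```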

Advantages - why decision trees are a common type of classifier

  • Predictive accuracy: Captures complex patterns.
  • Speed: Faster to build than many other classifiers, and very quick to apply once built.
  • Robustness: Can handle noise and missing values.
  • Scalability: Some implementations scale to large datasets.
  • Interpretability: Very readable; to classify an instance, walk down the tree following the rules (see the rule printout sketched below).
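
To see the interpretability claim in practice, the sketch below trains a small tree with scikit-learn and prints its learned rules via export_text; the toy data and feature names are made up for illustration.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data (hypothetical): [age, income] -> buys_computer
X = [[25, 30], [45, 80], [35, 60], [22, 20], [50, 90], [28, 75]]
y = ["no", "yes", "yes", "no", "yes", "yes"]

clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# export_text renders the learned tests as indented if/else rules,
# which is what makes the model easy to read and audit.
print(export_text(clf, feature_names=["age", "income"]))
```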

Decision Tree Classification Task

[Diagram: training data → classification algorithm → model (classifier) → prediction for a new instance]


The diagram illustrates the machine learning workflow for classification. Training data is fed into a classification algorithm, such as Hunt's algorithm, which learns the patterns and relationships within the data. The output of this training phase is a model (also called a classifier), which in this case is a tree structure. Once built, the model can be applied to new, unseen data, referred to as an instance; processing an instance produces a prediction, the model's best guess about the class or category of the input. A minimal end-to-end version of this pipeline is sketched below.
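
As a sketch of this train-then-apply pipeline: the text names Hunt's algorithm, while scikit-learn implements CART, a descendant of the same recursive-partitioning idea, so treat this as an analogous stand-in rather than the exact algorithm; the tiny dataset is hypothetical.

```python
from sklearn.tree import DecisionTreeClassifier

# Training phase: training data + labels go into the learning algorithm.
train_X = [[30, 1], [40, 0], [25, 1], [55, 0], [35, 1]]  # hypothetical [age, is_student]
train_y = ["yes", "no", "yes", "no", "yes"]
model = DecisionTreeClassifier(random_state=0).fit(train_X, train_y)  # output: a tree model

# Application phase: the trained model classifies a new, unseen instance.
instance = [[28, 1]]
print(model.predict(instance))  # the prediction: the model's best guess at the class label
```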

Example

The image shows a classic example of a decision tree (DT) model, a fundamental concept in machine learning. On the left is the training data table, which contains several features (age, income, student, credit_rating) and a target variable (buys_computer); the goal is to predict whether a person will buy a computer based on these characteristics.

On the right, the decision tree visually represents the classification model learned from the data. The tree starts with a root node (age?) and uses a series of internal nodes (questions) and branches (answers) to classify an instance. By following a path down the tree based on the values of the features for a new data point, a prediction is made at a leaf node (the colored squares at the end of the branches).

For example, to predict whether a person buys a computer, the tree first checks their age. If age is <=30, it then checks whether they are a student; if they are, the tree predicts "yes" (they will buy a computer). This recursive partitioning of the data on the most informative features is the core mechanism of decision tree algorithms, such as Quinlan's ID3 method mentioned in the image.
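
A hand-coded sketch of the tree this walkthrough describes: the age <=30 / student branch follows the text directly, while the middle (31..40) and right (>40, credit_rating) branches follow the standard textbook form of this example and are assumptions if the figure differs.

```python
def buys_computer(age, student, credit_rating):
    """Mirror of the example tree: root tests age, then student or credit_rating."""
    if age <= 30:
        return "yes" if student == "yes" else "no"         # branch described in the text
    elif age <= 40:
        return "yes"                                       # 31..40: assumed always "yes"
    else:
        return "yes" if credit_rating == "fair" else "no"  # >40: assumed credit-rating test

print(buys_computer(25, "yes", "fair"))  # -> "yes": the exact path walked above
```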

Geometric interpretation


The diagram provides a geometric interpretation of how a decision tree classifies data by recursively partitioning the feature space. The scatter plot on the left shows data points in a 2D space defined by two features, income and age. Each point is a customer, with green circles and purple plus signs representing the two classes.

The decision tree on the right corresponds to the splits in this geometric space. The first split, on income, creates a vertical decision boundary at 50K, dividing the data into those with income below 50K and those with income of 50K or more. The next split, on age, creates a horizontal decision boundary at age 45, but only within the group whose income is 50K or more. This process continues, producing a series of axis-parallel decision boundaries that divide the feature space into rectangular regions. Each region corresponds to a leaf node of the tree and is assigned a class label, which is used to predict the class of any new point (like the red dot marked '?') that falls within it.
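
The rectangular regions can be expressed directly as nested threshold tests. Here is a sketch mirroring the two boundaries described above; the class assigned to each region is assumed for illustration, since the figure's labels are not reproduced here.

```python
def region_label(income, age):
    """Classify a point by the rectangular region it falls in.
    Boundaries mirror the description: a vertical split at income = 50K,
    then a horizontal split at age = 45 on the income >= 50K side.
    The label assigned to each region is an assumption."""
    if income < 50_000:
        return "green circle"    # left region: income < 50K
    if age < 45:
        return "green circle"    # income >= 50K and age < 45
    return "purple plus"         # income >= 50K and age >= 45

# The unknown point (the red '?') is classified by the region containing it:
print(region_label(income=60_000, age=30))  # assumed labeling
```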
