Decision Tree: Induction

Decision Tree - Classification



Induction Algorithm

  • Tree is constructed in a greedy, top-down, recursive, divide-and-conquer manner
  • Build one node at a time, starting with the root
  • At the start: use all data samples to determine the best attribute for the root
  • Based on that attribute, select data subset that goes to the children
  • For each child, repeat the process
  • Test attributes are selected based on a statistical measure (e.g., information gain)
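The statistical measure mentioned in the last bullet can be made concrete. Below is a minimal sketch of Shannon entropy and information gain; the function names and signatures are my own, not from the source:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(parent_labels, child_label_lists):
    """Entropy reduction from splitting a node into the given children."""
    n = len(parent_labels)
    weighted_child_entropy = sum(len(ch) / n * entropy(ch)
                                 for ch in child_label_lists)
    return entropy(parent_labels) - weighted_child_entropy
```

For example, splitting a 50/50 mixed node into two pure children yields `information_gain(["y", "y", "n", "n"], [["y", "y"], ["n", "n"]]) == 1.0`, the maximum possible gain for a binary class.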

Stopping Criteria - How do we know when to stop?

  • All samples for a given node belong to the same class
  • There are no remaining attributes for further partitioning – majority voting is employed for classifying the leaf
  • There are no samples left

Hunt's Algorithm

Let Dnode be the set of training records that reach a node

General Procedure:

  • If Dnode contains records that all belong to the same class ynode, then node is a leaf node labeled as ynode
  • If Dnode is an empty set, then node is a leaf node labeled by the default class, ydefault
  • If Dnode contains records that belong to more than one class, use an attribute test to split the data into smaller subsets.
  • Recursively apply the procedure to each subset.
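The general procedure above can be sketched as a recursive Python function. All names are illustrative, and attribute selection is simplified to taking the first available attribute; a real implementation would pick the attribute by a statistical measure such as information gain:

```python
from collections import Counter

def hunt(records, attributes, default):
    """Hunt's algorithm sketch.
    records: list of (feature_dict, label) pairs reaching this node.
    Returns either a class label (leaf) or a nested dict {attr: {value: subtree}}.
    """
    if not records:                          # empty set -> leaf with default class
        return default
    labels = [y for _, y in records]
    if len(set(labels)) == 1:                # pure node -> leaf labeled with that class
        return labels[0]
    majority = Counter(labels).most_common(1)[0][0]
    if not attributes:                       # no attributes left -> majority vote
        return majority
    attr = attributes[0]                     # simplification: real trees pick by info gain
    remaining = [a for a in attributes if a != attr]
    tree = {attr: {}}
    for value in {x[attr] for x, _ in records}:
        subset = [(x, y) for x, y in records if x[attr] == value]
        # parent's majority class becomes the child's default (ydefault)
        tree[attr][value] = hunt(subset, remaining, majority)
    return tree
```

Calling `hunt` on a toy set like `[({"weather": "sunny"}, "buy"), ({"weather": "rain"}, "skip"), ({"weather": "sunny"}, "buy")]` with `attributes=["weather"]` produces the one-split tree `{"weather": {"sunny": "buy", "rain": "skip"}}`.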

Step-by-Step Procedure

Imagine you’re at a node in the tree with a group of training examples, which we’ll call Dnode.

  1. All records belong to one class → Stop

    • If all the examples in Dnode are from the same class, then this node becomes a leaf node.

    • The leaf is labeled with that class.

    • Example: If all customers in this group “buy” a product, the leaf is labeled “Buy.”

  2. No records left → Stop

    • If Dnode is empty (no examples reach this node), the algorithm creates a leaf node with a default class.

    • The default class is usually the majority class of the parent node.

  3. Mixed records → Split further

    • If Dnode contains a mix of different classes, the algorithm chooses an attribute test (a question like “Is Age > 30?” or “Is Weather = Sunny?”).

    • This test splits the data into smaller subsets.

    • Each subset becomes a new child node.

  4. Repeat the process

    • The algorithm is applied recursively to each child node, checking again: are all records the same class? Empty? Or mixed?

    • This continues until the stopping condition is reached (all leaves are pure or no data is left).
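The three stopping checks in steps 1-3 can also be shown in isolation. Here is a small sketch (the function name and return convention are my own) that decides what happens at a single node, given the class labels of the records that reach it and the parent's majority class:

```python
from collections import Counter

def node_action(labels, parent_majority):
    """Apply the step-by-step checks at one node.
    Returns ('leaf', class) when the node stops, or ('split', majority) when
    it must be partitioned further."""
    if not labels:                        # Step 2: no records left -> default class
        return ("leaf", parent_majority)
    if len(set(labels)) == 1:             # Step 1: all one class -> pure leaf
        return ("leaf", labels[0])
    # Step 3: mixed classes -> split further (majority kept as children's default)
    return ("split", Counter(labels).most_common(1)[0][0])

print(node_action(["Buy", "Buy"], "Buy"))           # ('leaf', 'Buy')
print(node_action([], "Buy"))                       # ('leaf', 'Buy')
print(node_action(["Buy", "NoBuy", "Buy"], "Buy"))  # ('split', 'Buy')
```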

Geometric Interpretation


Next up - splitting criteria

