Chapter 05: Classification and Regression Trees (CART)

This chapter introduces Classification And Regression Trees (CART), a well-established machine learning procedure. We explain the main idea and give details on splitting criteria, discuss computational aspects of growing a tree, and illustrate the idea of stopping criteria and pruning.

Chapter 5.1: Introduction

Decision trees are an important type of machine learning models and are of two main types: classification trees and regression trees. In this section, we explain the general idea of CART - Classification And Regression Trees.

Chapter 5.2: Splitting Criteria

CART algorithms require splitting criteria for trees, which are usually defined in terms of "impurity reduction". In this section we formalize the idea of splitting criteria and explain the details of splitting for both regression and classification.

Chapter 5.3: Growing a Tree

In this section, we explain how to grow a tree starting with an empty tree, a root node containing all the data. It will be shown that trees are grown by recursively applying greedy optimization to each node.

Chapter 5.4: Computational Aspects of Finding Splits

In this section, we explain the computational aspects of the node-splitting procedure, especially for nominal features. Additionally we illustrate how to deal with missing values.

Chapter 5.5: Stopping criteria & pruning

The recursive partitioning procedure used to grow a CART usually leads to problems such as exponential growth of computations, overfitting, and the horizon effect. To deal with these problems, we can use stopping criteria and pruning. In this section, we explain the basis of these two solutions.

Chapter 5.6: Discussion

In this section we discuss the advantages and disadvantages of CART and mention other tree methodologies.