Chapter 04: Performance Evaluation

This chapter covers how to evaluate the performance of a model. We introduce performance measures for regression and classification tasks, explain the problem of overfitting and the difference between training and test error, and finally present a variety of resampling techniques.

Chapter 4.1: Introduction

Evaluating the performance of a model is a crucial part of machine learning. We explain the concept of generalization error and the difference between inner and outer loss.

Chapter 4.2: Measures Regression

This section introduces essential performance measures for regression. In particular, we discuss the mean squared error (MSE), the mean absolute error (MAE), and a straightforward generalization of R².
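As a rough illustration of these three measures, the following minimal sketch computes them in plain Python. The function names and the toy data are made up for this example, and R² is assumed here to take the common form 1 - SSE/SST, which compares the model against a constant mean prediction; the section itself may define the generalization slightly differently.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of the squared residuals."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error: average of the absolute residuals."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Assumed form 1 - SSE/SST: model error relative to a constant mean baseline."""
    mean_y = sum(y_true) / len(y_true)
    sse = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    sst = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - sse / sst


# Hypothetical toy data, chosen only for illustration.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]

print("MSE:", mse(y_true, y_pred))
print("MAE:", mae(y_true, y_pred))
print("R2: ", r2(y_true, y_pred))
```

Because MSE squares the residuals, it penalizes large errors more heavily than MAE, which is one reason the two measures can rank models differently.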

Chapter 4.3: Measures Classification

This section introduces essential performance measures for classification. A classifier predicts either class labels or class probabilities, so its performance can be evaluated on either kind of prediction. We present several measures, including the misclassification error rate (MCE), accuracy (ACC), and the Brier score (BS). Additionally, you will see the confusion matrix and learn about misclassification costs.
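The distinction between label-based and probability-based measures can be sketched as follows for the binary case. This is a minimal example under stated assumptions: labels are coded as 0/1, predicted labels come from thresholding the probabilities at 0.5, and the Brier score is taken to be the mean squared difference between the predicted probability of class 1 and the true label. All names and data are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Share of correctly predicted labels (ACC)."""
    return sum(y == p for y, p in zip(y_true, y_pred)) / len(y_true)


def mce(y_true, y_pred):
    """Misclassification error rate: 1 - ACC."""
    return 1.0 - accuracy(y_true, y_pred)


def brier_score(y_true, p_pos):
    """Assumed binary form: mean squared gap between P(y = 1) and the 0/1 label."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pos)) / len(y_true)


def confusion_matrix(y_true, y_pred):
    """Counts of true/false positives and negatives for 0/1 labels."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for y, p in zip(y_true, y_pred):
        if y == 1 and p == 1:
            counts["TP"] += 1
        elif y == 0 and p == 1:
            counts["FP"] += 1
        elif y == 1 and p == 0:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


# Hypothetical true labels and predicted probabilities for class 1.
y_true = [1, 0, 1, 1, 0]
p_pos = [0.9, 0.2, 0.4, 0.8, 0.6]

# Turn probabilities into labels with an assumed threshold of 0.5.
y_pred = [int(p >= 0.5) for p in p_pos]

print("ACC:", accuracy(y_true, y_pred))
print("MCE:", mce(y_true, y_pred))
print("BS: ", brier_score(y_true, p_pos))
print("Confusion matrix:", confusion_matrix(y_true, y_pred))
```

ACC and MCE only see the thresholded labels, whereas the Brier score also rewards confident correct probabilities and penalizes confident wrong ones.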

Chapter 4.4: Measures Classification ROC

From the confusion matrix, we can calculate a variety of "ROC" metrics. Among others, we explain the true positive rate, the negative predictive value, and the F1 measure.
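To make the link to the confusion matrix concrete, here is a small sketch that derives a few of these metrics from the four cell counts. The function name, the returned keys, and the counts are invented for illustration; the formulas are the standard textbook definitions.

```python
def roc_metrics(tp, fp, fn, tn):
    """Derive common metrics from the four cells of a binary confusion matrix."""
    tpr = tp / (tp + fn)  # true positive rate (recall, sensitivity)
    tnr = tn / (tn + fp)  # true negative rate (specificity)
    ppv = tp / (tp + fp)  # positive predictive value (precision)
    npv = tn / (tn + fn)  # negative predictive value
    f1 = 2 * ppv * tpr / (ppv + tpr)  # harmonic mean of PPV and TPR
    return {"TPR": tpr, "TNR": tnr, "PPV": ppv, "NPV": npv, "F1": f1}


# Hypothetical counts: 60 actual positives, 40 actual negatives.
metrics = roc_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that TPR and TNR condition on the true class (the rows or columns holding the actual labels), while PPV and NPV condition on the predicted class.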

Chapter 4.5: Measures Classification ROC Visualisation

In this section, we explain the ROC curve and how to calculate it. Additionally, we will present AUC and partial AUC as global performance measures.
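One way to picture the calculation is to sort the observations by decreasing score and lower the threshold one observation at a time, recording a (false positive rate, true positive rate) point after each step. The sketch below does this in plain Python and estimates the AUC with the trapezoidal rule. It assumes 0/1 labels and distinct scores (tied scores would need to be grouped), and the data are hypothetical.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points obtained by sweeping the threshold downward."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    # Indices ordered from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for i in order:
        if y_true[i] == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical labels and classifier scores (higher means "more positive").
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(y_true, scores)
print("ROC points:", points)
print("AUC:", auc(points))
```

In this toy example, 8 of the 9 positive-negative pairs are ranked correctly, which matches the AUC of 8/9 and illustrates its interpretation as the probability that a random positive receives a higher score than a random negative.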

Chapter 4.6: Overfitting

When a machine learning model performs well on the training data but does not generalize to test data, we speak of overfitting. We show examples of overfitting and how to diagnose it.
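A simple way to see this effect is a 1-nearest-neighbor classifier on noisy labels: it memorizes the training set, so its training error is zero, while its error on fresh data is typically much higher. The following sketch is only an illustration under assumptions: the data-generating process (a threshold at 0.5 with 20% label noise), the seed, and all function names are invented for this example.

```python
import random


def one_nn_predict(x_train, y_train, x):
    """Predict the label of the closest training point (1-nearest neighbor)."""
    nearest = min(range(len(x_train)), key=lambda i: abs(x_train[i] - x))
    return y_train[nearest]


def error_rate(x_train, y_train, x_eval, y_eval):
    """Misclassification rate of the 1-NN model on the evaluation data."""
    wrong = sum(
        one_nn_predict(x_train, y_train, x) != y for x, y in zip(x_eval, y_eval)
    )
    return wrong / len(y_eval)


def make_data(rng, n, noise=0.2):
    """Assumed toy process: label is 1 if x > 0.5, flipped with probability `noise`."""
    xs = [rng.random() for _ in range(n)]
    ys = []
    for x in xs:
        label = int(x > 0.5)
        if rng.random() < noise:
            label = 1 - label
        ys.append(label)
    return xs, ys


rng = random.Random(0)
x_train, y_train = make_data(rng, 50)
x_test, y_test = make_data(rng, 200)

train_err = error_rate(x_train, y_train, x_train, y_train)
test_err = error_rate(x_train, y_train, x_test, y_test)

print("Training error:", train_err)  # each point is its own nearest neighbor
print("Test error:    ", test_err)
```

A large gap between the two numbers is the usual symptom used to diagnose overfitting; the training error alone would suggest a perfect model.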

Chapter 4.7: Training Error

There are two types of errors: training error and test error. This section focuses on the training error and the difficulties associated with it.

Chapter 4.8: Test Error

There are two types of errors: training error and test error. This section focuses on the test error and the difficulties associated with it.

Chapter 4.9: Resampling

Resampling techniques help to assess the performance of a model. We introduce cross-validation (with and without stratification), the bootstrap, and subsampling.
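The three strategies differ mainly in how they split the observation indices into training and test sets. The sketch below generates such index splits in plain Python. It is a simplified illustration with hypothetical function names: stratification is not implemented, and the bootstrap variant shown uses the out-of-bag observations as the test set, which is one common choice.

```python
import random


def k_fold_indices(n, k, rng):
    """Yield (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal-sized folds
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


def bootstrap_indices(n, rng):
    """Draw n training indices with replacement; test on the out-of-bag rest."""
    train = [rng.randrange(n) for _ in range(n)]
    drawn = set(train)
    test = [i for i in range(n) if i not in drawn]
    return train, test


def subsample_indices(n, train_fraction, rng):
    """Random split without replacement (one repetition of subsampling)."""
    idx = list(range(n))
    rng.shuffle(idx)
    cut = int(round(train_fraction * n))
    return idx[:cut], idx[cut:]


rng = random.Random(1)

cv_splits = list(k_fold_indices(10, 5, rng))
for train, test in cv_splits:
    print("CV test fold:", sorted(test))

boot_train, boot_test = bootstrap_indices(10, rng)
print("Bootstrap train:", sorted(boot_train), "out-of-bag:", boot_test)

sub_train, sub_test = subsample_indices(10, 0.8, rng)
print("Subsample train:", sorted(sub_train), "test:", sorted(sub_test))
```

In each case, a model would be fitted on the training indices and evaluated on the test indices, and the resulting test errors would be averaged over the splits or repetitions.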