Chapter 10: Advanced Risk Minimization

This chapter treats the theory of risk minimization in more depth.

Chapter 10.1: Risk Minimizers

We introduce important theoretical concepts in risk minimization: the risk minimizer, Bayes risk, Bayes regret, consistent learners, and the optimal constant model.

Chapter 10.2: Pseudo-Residuals

We introduce the concept of pseudo-residuals and discuss their relation to gradient descent.
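
As a small illustration (a sketch, not part of the lecture material itself): pseudo-residuals are the negative gradients of the loss with respect to the model's predictions, and for the (halved) L2 loss they coincide with the ordinary residuals.

```python
import numpy as np

# Pseudo-residuals: negative gradients of the loss w.r.t. the predictions.
# For L(y, f) = 0.5 * (y - f)^2 we get -dL/df = y - f, i.e. the ordinary
# residual. (Illustrative helper; the function name is our own choice.)
def pseudo_residuals_l2(y, f):
    return y - f

y = np.array([1.0, 2.0, 3.0])
f = np.array([0.5, 2.5, 2.0])
print(pseudo_residuals_l2(y, f))  # [ 0.5 -0.5  1. ]
```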

Chapter 10.3: L2-loss

In this section, we revisit the L2 loss and derive its risk minimizer and optimal constant model.
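
As a quick numerical illustration of the result derived in this section: the optimal constant model under the L2 loss is the sample mean. A grid search over constants confirms this on a toy data set.

```python
import numpy as np

# Optimal constant model under L2 loss:
# argmin_c mean((y_i - c)^2) is the sample mean of y.
y = np.array([1.0, 2.0, 6.0])

def emp_risk_l2(c, y):
    return np.mean((y - c) ** 2)

# Brute-force check on a fine grid: no constant beats the mean.
grid = np.linspace(-10, 10, 2001)
best = grid[int(np.argmin([emp_risk_l2(c, y) for c in grid]))]
print(best, y.mean())  # both 3.0
```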

Chapter 10.4: L1-loss

In this section, we revisit the L1 loss and derive its risk minimizer and optimal constant model.
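
Analogously to the L2 case, a small sketch of the result: the optimal constant model under the L1 loss is the sample median.

```python
import numpy as np

# Optimal constant model under L1 loss:
# argmin_c mean(|y_i - c|) is attained at the sample median of y.
y = np.array([1.0, 2.0, 6.0])

def emp_risk_l1(c, y):
    return np.mean(np.abs(y - c))

# Brute-force check on a fine grid: the minimizer is the median.
grid = np.linspace(-10, 10, 2001)
best = grid[int(np.argmin([emp_risk_l1(c, y) for c in grid]))]
print(best, np.median(y))  # both 2.0
```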

Chapter 10.5: Advanced Regression Losses

In this section, we introduce and discuss the following advanced regression losses: Huber, log-cosh, Cauchy, log-barrier, epsilon-insensitive, and quantile loss.
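
As an example of one of these losses (an illustrative sketch, not the lecture's own code): the Huber loss is quadratic for residuals up to a threshold delta and linear beyond it, which makes it less sensitive to outliers than the L2 loss.

```python
import numpy as np

# Huber loss with threshold delta:
# 0.5 * r^2             if |r| <= delta
# delta * (|r| - delta/2)  otherwise
def huber(residual, delta=1.0):
    r = np.abs(residual)
    return np.where(r <= delta,
                    0.5 * r ** 2,
                    delta * (r - 0.5 * delta))

print(huber(np.array([0.5, 2.0])))  # [0.125 1.5  ]
```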

Chapter 10.6: 0-1-loss

In this section, we revisit the 0-1 loss and derive its risk minimizer and optimal constant model.
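
A brief numerical sketch of the constant-model result: under the 0-1 loss, the optimal constant classifier predicts the most frequent class, and its empirical risk is the relative frequency of the other class.

```python
import numpy as np

# Optimal constant model under 0-1 loss: predict the majority class.
y = np.array([0, 1, 1, 1, 0])
classes, counts = np.unique(y, return_counts=True)
majority = classes[np.argmax(counts)]

def emp_risk_01(c, y):
    return np.mean(y != c)

print(majority, emp_risk_01(majority, y))  # 1 0.4
```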

Chapter 10.7: Bernoulli Loss

In this section, we introduce the Bernoulli loss and derive its risk minimizer and optimal constant model. We further discuss the connection between Bernoulli loss minimization and tree splitting according to the entropy criterion.
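
A small numerical check of both claims (an illustration under our own toy data): the optimal constant probability under the Bernoulli loss is the relative class frequency, and the minimal empirical risk is then exactly the entropy of that frequency, which is the link to the entropy splitting criterion.

```python
import numpy as np

# Empirical Bernoulli (log) loss risk of a constant probability p.
y = np.array([1, 1, 1, 0])
p_hat = y.mean()  # relative class frequency, 0.75

def bernoulli_risk(p, y):
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Grid check: the minimizer is the class frequency p_hat ...
grid = np.linspace(0.01, 0.99, 99)
best = grid[int(np.argmin([bernoulli_risk(p, y) for p in grid]))]
# ... and the minimal risk equals the entropy of p_hat.
entropy = -(p_hat * np.log(p_hat) + (1 - p_hat) * np.log(1 - p_hat))
print(best, bernoulli_risk(p_hat, y), entropy)
```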

Chapter 10.8: Brier Score

In this section, we introduce the Brier score and derive its risk minimizer and optimal constant model. We further discuss the connection between Brier score minimization and tree splitting according to the Gini index.
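
The analogous sketch for the Brier score (again on our own toy data): the optimal constant probability is again the class frequency, and the minimal risk p(1-p) is half the binary Gini impurity 2p(1-p), which is the link to Gini-based splitting.

```python
import numpy as np

# Empirical Brier score risk of a constant probability p.
y = np.array([1, 1, 1, 0])
p_hat = y.mean()  # relative class frequency, 0.75

def brier_risk(p, y):
    return np.mean((y - p) ** 2)

# Minimal risk at p_hat equals p_hat * (1 - p_hat),
# i.e. half the binary Gini impurity 2 * p * (1 - p).
half_gini = p_hat * (1 - p_hat)
print(brier_risk(p_hat, y), half_gini)  # both 0.1875
```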

Chapter 10.9: Advanced Classification Losses

In this section, we introduce and discuss the following advanced classification losses: (Squared) hinge loss, L2 loss on scores, exponential loss, and AUC loss.
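
As an example of two of these losses (an illustrative sketch): the hinge loss and squared hinge loss are functions of the margin y * f(x), with labels encoded as y in {-1, +1}; both are zero for margins of at least 1.

```python
import numpy as np

# Hinge and squared hinge loss on margins nu = y * f(x), y in {-1, +1}.
def hinge(nu):
    return np.maximum(0.0, 1.0 - nu)

def squared_hinge(nu):
    return np.maximum(0.0, 1.0 - nu) ** 2

nu = np.array([-0.5, 0.5, 2.0])
print(hinge(nu))          # [1.5 0.5 0. ]
print(squared_hinge(nu))  # [2.25 0.25 0.  ]
```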

Chapter 10.10: Maximum Likelihood Estimation vs. Empirical Risk Minimization I

In this section, we discuss the connection between maximum likelihood estimation and empirical risk minimization, in particular the correspondence between a Gaussian error distribution and the L2 loss.
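
A numerical illustration of this correspondence (a sketch on toy data, not the lecture's derivation): for a fixed sigma, the Gaussian negative log-likelihood is an affine transformation of the L2 empirical risk, so both are minimized by the same constant.

```python
import numpy as np

# Gaussian negative log-likelihood of a constant model theta vs. L2 risk.
# For fixed sigma: NLL(theta) = const + L2(theta) / (2 * sigma^2),
# so the minimizers coincide (here: the sample mean).
y = np.array([1.0, 2.0, 6.0])
sigma = 1.0

def gaussian_nll(theta, y, sigma):
    return np.sum(0.5 * np.log(2 * np.pi * sigma ** 2)
                  + (y - theta) ** 2 / (2 * sigma ** 2))

def l2_risk(theta, y):
    return np.sum((y - theta) ** 2)

grid = np.linspace(-10, 10, 2001)
best_nll = grid[int(np.argmin([gaussian_nll(t, y, sigma) for t in grid]))]
best_l2 = grid[int(np.argmin([l2_risk(t, y) for t in grid]))]
print(best_nll, best_l2)  # both the sample mean, 3.0
```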

Chapter 10.11: Maximum Likelihood Estimation vs. Empirical Risk Minimization II

In this section, we discuss the connection between maximum likelihood estimation and risk minimization for further losses (the L1 loss and the Bernoulli loss).
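
The same kind of numerical sketch for the L1 case (an illustration on toy data): the negative log-likelihood under a Laplace error distribution is an affine transformation of the L1 empirical risk, so both are minimized by the sample median.

```python
import numpy as np

# Laplace negative log-likelihood of a constant model theta vs. L1 risk.
# For fixed scale b: NLL(theta) = const + L1(theta) / b,
# so the minimizers coincide (here: the sample median).
y = np.array([1.0, 2.0, 6.0])

def laplace_nll(theta, y, b=1.0):
    return np.sum(np.log(2 * b) + np.abs(y - theta) / b)

def l1_risk(theta, y):
    return np.sum(np.abs(y - theta))

grid = np.linspace(-10, 10, 2001)
best_nll = grid[int(np.argmin([laplace_nll(t, y) for t in grid]))]
best_l1 = grid[int(np.argmin([l1_risk(t, y) for t in grid]))]
print(best_nll, best_l1)  # both the sample median, 2.0
```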