Introduction to Machine Learning (I2ML)

This website offers an open and free introductory course on (supervised) machine learning. The course is designed holistically and is as self-contained as possible, in order to cover the most relevant areas of supervised ML. While the introductory parts aim at a practical and operational understanding of the covered algorithms and models, the more advanced sections also include sound theoretical foundations and proofs, in order to teach ML theory in as self-contained and precise a way as possible.

It can be taken either as an introductory undergraduate course early on (if you skip the more advanced sections) or as an introductory graduate-level course for Master's students.

One important general goal of the course - on top of clearly explaining the most popular ML algorithms - is to clearly demonstrate the fundamental building blocks behind ML, instead of introducing "yet another algorithm, with yet another differently named concept". We discuss, compare and contrast risk minimization, statistical parameter estimation, the Bayesian viewpoint and information theory, and demonstrate that all of these are equally valid entry points to ML, which often (confusingly) talk about the same thing with different terminology. Understanding these similarities and being able to mentally switch perspectives when needed is a major goal of this course.
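
One standard instance of two viewpoints describing the same thing, sketched under stated assumptions: suppose a regression model with additive i.i.d. Gaussian noise, y = f(x | θ) + ε with ε ~ N(0, σ²) and σ fixed. Then maximum likelihood estimation and empirical risk minimization with the L2 (squared) loss pick the same parameters. The notation below is illustrative and may differ from the course's cheat sheets.

```latex
\begin{aligned}
-\log \mathcal{L}(\theta)
  &= -\sum_{i=1}^{n} \log\left[\frac{1}{\sqrt{2\pi\sigma^2}}
     \exp\left(-\frac{\left(y^{(i)} - f(\mathbf{x}^{(i)} \mid \theta)\right)^2}{2\sigma^2}\right)\right] \\
  &= \frac{n}{2}\log(2\pi\sigma^2)
     + \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y^{(i)} - f(\mathbf{x}^{(i)} \mid \theta)\right)^2 \\[4pt]
\hat{\theta}_{\text{ML}}
  &= \arg\min_{\theta} \; -\log \mathcal{L}(\theta)
   = \arg\min_{\theta} \sum_{i=1}^{n}\left(y^{(i)} - f(\mathbf{x}^{(i)} \mid \theta)\right)^2
   = \arg\min_{\theta} \mathcal{R}_{\text{emp}}(\theta) \quad \text{(L2 loss)}
\end{aligned}
```

The first term and the positive factor 1/(2σ²) do not depend on θ, so dropping them does not change the minimizer.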

If you want to learn more about this course, please (1) read the outline further below and (2) read the section on prerequisites.

Please also note: (1) The course uses a unified mathematical notation, and we provide cheat sheets that summarize the most important symbols and concepts. (2) Most sections already contain quizzes, coding demos, and exercises with worked-out solutions to support self-study as much as possible.

What this course does not cover, in order to keep its scope manageable: (1) Neural networks and deep learning. We are currently working on a similar repo / page for that, which builds upon this course. (2) In-depth coverage of optimization. We might publish a course on that at some point, but this is currently a lower priority.

While most of the course is on a conceptual, programming-language-independent level (which is by design), we offer a large variety of applied exercises in R, often using the mlr3 package and its corresponding universe. We are working on offering the exercises in Python as well.

Note: In summer semester 2021 we are still extending the material somewhat, so the complete first version including all advanced material will probably be available around 07/2021.

The course material is developed in a public GitHub repository. You can find the changelog at:

If you love teaching ML and have free resources available, please consider joining the team and emailing us now!

Chapter 01: ML Basics

This chapter introduces the basic concepts of Machine Learning. We focus on supervised learning, explain the difference between regression and classification, show how to evaluate and compare Machine Learning models and formalize the concept of learning.

Chapter 02: Supervised Regression

This chapter treats the supervised regression task in more detail. We will see different loss functions for regression, how a linear regression model can be used from a Machine Learning perspective, how to extend it with polynomials for greater flexibility and finally a fundamentally different approach - k-NN regression.
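
As a rough illustration of k-NN regression, the Python sketch below (not the course's own R/mlr3 code) predicts the target of a new point as the unweighted mean of the targets of its k nearest training points under Euclidean distance. The function name `knn_regress` and the toy data are hypothetical.

```python
import math

def knn_regress(x_train, y_train, x_new, k):
    """Predict y at x_new as the mean target of the k nearest training points.

    x_train: list of feature vectors (lists of floats); y_train: list of floats.
    Ties in distance are broken by training order (sorted() is stable).
    """
    if not 1 <= k <= len(x_train):
        raise ValueError("k must be between 1 and the number of training points")
    # Euclidean distance from x_new to every training point
    dists = [math.dist(x, x_new) for x in x_train]
    # Indices of the k smallest distances
    nearest = sorted(range(len(x_train)), key=lambda i: dists[i])[:k]
    return sum(y_train[i] for i in nearest) / k

x_train = [[0.0], [1.0], [2.0], [3.0], [10.0]]
y_train = [0.0, 1.0, 2.0, 3.0, 100.0]
print(knn_regress(x_train, y_train, [1.2], k=3))  # → 1.0
```

The three nearest neighbours of 1.2 are 1.0, 2.0 and 0.0, so the prediction is the mean of their targets; the distant point at 10.0 does not influence it. Unlike linear or polynomial regression, nothing is fitted in advance: the training data itself is the model.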

Chapter 03: Supervised Classification

This chapter treats the supervised classification task in more detail. We will see examples of binary and multiclass classification and the difference between the discriminative and the generative approach. In particular, we will treat logistic regression, linear and quadratic discriminant analysis, naive Bayes and k-NN classification.

Chapter 04: Performance Evaluation

This chapter treats the challenge of evaluating the performance of a model. We will introduce different performance measures for regression and classification tasks, explain the problem of overfitting, the difference between training and test error and finally present a variety of resampling techniques.
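
A minimal Python sketch of k-fold cross-validation, assuming a one-feature linear model fit by ordinary least squares and the mean squared error as performance measure (both illustrative choices; the helper names are hypothetical). Each observation lands in exactly one test fold, and the model is always evaluated on data it was not trained on.

```python
import random

def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split into k folds whose sizes differ by at most one."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[j::k] for j in range(k)]

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x (one feature)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b

def cv_mse(xs, ys, k=5, seed=0):
    """Cross-validated MSE of fit_line: each fold serves once as the test set."""
    fold_errors = []
    for test in kfold_indices(len(xs), k, seed):
        test_set = set(test)
        train = [i for i in range(len(xs)) if i not in test_set]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errs = [(ys[i] - (a + b * xs[i])) ** 2 for i in test]
        fold_errors.append(sum(errs) / len(errs))
    return sum(fold_errors) / k  # average of the per-fold test errors

xs = [float(i) for i in range(20)]
rng = random.Random(1)
ys = [2.0 * x + 1.0 + rng.gauss(0.0, 1.0) for x in xs]
a, b = fit_line(xs, ys)
train_mse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)
print("training MSE:", train_mse)
print("5-fold CV MSE:", cv_mse(xs, ys, k=5))
```

The training MSE is typically the more optimistic of the two numbers, which is exactly why the test error has to be estimated on held-out data.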

Chapter 05: Classification and Regression Trees (CART)

This chapter introduces Classification And Regression Trees (CART), a well-established machine learning procedure. We explain the main idea and give details on splitting criteria, discuss computational aspects of growing a tree, and illustrate the idea of stopping criteria and pruning.

Chapter 06: Random Forests

This chapter introduces bagging as a method to increase the performance of trees. A modification of bagging leads to random forests. We explain the main idea of random forests, benchmark their performance against the methods seen so far, and show how to quantify the impact of a single feature on the performance of a random forest as well as how to compute proximities between observations based on random forests.
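
A minimal Python sketch of bagging (bootstrap aggregating) for regression, assuming a one-feature regression stump as the base learner: each model is fit on a bootstrap sample drawn with replacement, and the ensemble prediction is the average over models. The observations not drawn for a model form its out-of-bag set. A random forest would additionally restrict each split to a random subset of features, which this single-feature sketch cannot show. All names are hypothetical.

```python
import random

def bootstrap(n, rng):
    """Draw n indices with replacement; return (in-bag indices, out-of-bag indices)."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    oob = sorted(set(range(n)) - set(in_bag))
    return in_bag, oob

def fit_stump(xs, ys):
    """Base learner: best single split x <= t under squared error; returns a predict function."""
    mean_all = sum(ys) / len(ys)
    best_sse = sum((y - mean_all) ** 2 for y in ys)
    best = (float("inf"), mean_all, mean_all)  # (threshold, left mean, right mean): no split
    ux = sorted(set(xs))
    for lo, hi in zip(ux, ux[1:]):
        t = (lo + hi) / 2  # candidate threshold between adjacent distinct values
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if sse < best_sse:
            best_sse, best = sse, (t, ml, mr)
    t, ml, mr = best
    return lambda x: ml if x <= t else mr

def bagging(xs, ys, n_models=50, seed=0):
    """Fit n_models stumps on bootstrap samples and average their predictions."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        in_bag, _oob = bootstrap(len(xs), rng)
        models.append(fit_stump([xs[i] for i in in_bag], [ys[i] for i in in_bag]))
    return lambda x: sum(m(x) for m in models) / len(models)

xs = [float(i) for i in range(20)]
ys = [0.0 if x < 10 else 10.0 for x in xs]
predict = bagging(xs, ys)
print(predict(0.0), predict(19.0))
```

Averaging many trees fit on perturbed data mainly reduces variance; the out-of-bag observations of each model can be reused as a built-in test set.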

Chapter 07: Tuning

This chapter introduces and formalizes the problem of hyperparameter tuning.

Chapter 08: Nested Resampling

This chapter defines the untouched test principle. Additionally, the concepts of train-val-test split and nested resampling are explained.
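
A minimal Python sketch of nested resampling, assuming 1-D k-NN regression whose number of neighbours k is tuned by an inner cross-validation. The inner loop only ever sees the outer training indices, so each outer test fold stays untouched until the tuned model is evaluated on it. The fold counts, candidate values and all function names are illustrative assumptions.

```python
import random

def folds(idx, n_folds, seed):
    """Shuffle the given indices and split them into n_folds folds."""
    idx = list(idx)
    random.Random(seed).shuffle(idx)
    return [idx[j::n_folds] for j in range(n_folds)]

def knn_predict(train_x, train_y, x, k):
    """1-D k-NN regression: mean target of the k training points closest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k

def mse(train, test, xs, ys, k):
    """Fit on the train indices, return the MSE on the test indices."""
    tx, ty = [xs[i] for i in train], [ys[i] for i in train]
    return sum((ys[i] - knn_predict(tx, ty, xs[i], k)) ** 2 for i in test) / len(test)

def cv_error(idx, xs, ys, k, n_folds, seed):
    """Plain cross-validation restricted to the indices in idx."""
    errs = []
    for test in folds(idx, n_folds, seed):
        t = set(test)
        train = [i for i in idx if i not in t]
        errs.append(mse(train, test, xs, ys, k))
    return sum(errs) / len(errs)

def nested_cv(xs, ys, candidates, outer=5, inner=3, seed=0):
    outer_errs = []
    for test in folds(range(len(xs)), outer, seed):
        t = set(test)
        train = [i for i in range(len(xs)) if i not in t]
        # Inner loop: tune k using only the outer training indices
        best_k = min(candidates, key=lambda k: cv_error(train, xs, ys, k, inner, seed + 1))
        # Outer loop: evaluate the tuned model on the untouched outer test fold
        outer_errs.append(mse(train, test, xs, ys, best_k))
    return sum(outer_errs) / len(outer_errs)

rng = random.Random(0)
xs = [rng.uniform(0.0, 10.0) for _ in range(60)]
ys = [x ** 0.5 + rng.gauss(0.0, 0.3) for x in xs]
print(nested_cv(xs, ys, candidates=[1, 3, 5, 9]))
```

Reporting the inner CV error of the best k instead would be optimistically biased, because that same error was used to select k.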

Chapter 09: mlr3

This chapter introduces the R package mlr3. After introducing the basic concepts, we focus on resampling, tuning and pipelines.

Chapter 10: Advanced Risk Minimization

This chapter treats the theory of risk minimization in more depth.

Chapter 11: Multiclass Classification

This chapter treats multiclass classification.

Chapter 12: Information Theory

This chapter covers basic information-theoretic concepts and discusses their relation to machine learning.
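
A small Python sketch of basic information-theoretic quantities for discrete distributions, measured in nats (natural logarithm). It illustrates the identity H(p, q) = H(p) + D_KL(p ‖ q). Because H(p) does not depend on q, minimizing the cross-entropy over a model distribution q is the same as minimizing the KL divergence to p. For a one-hot label p, the cross-entropy reduces to the log loss -log q_y. The function names are hypothetical.

```python
import math

def entropy(p):
    """Shannon entropy H(p) in nats; terms with p_i = 0 contribute 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Cross-entropy H(p, q) = -sum_i p_i log q_i in nats (assumes q_i > 0 where p_i > 0)."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """Kullback-Leibler divergence D_KL(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
print(cross_entropy(p, q), entropy(p) + kl_divergence(p, q))  # equal up to rounding

# Log loss of a predicted class distribution for a sample whose true class is 0
print(cross_entropy([1.0, 0.0, 0.0], [0.7, 0.2, 0.1]))  # equals -log(0.7)
```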

Chapter 13: Curse of Dimensionality

This chapter introduces the phenomenon of the curse of dimensionality and discusses its effects on the behavior of machine learning models.
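
One common way to make the curse of dimensionality tangible, sketched in Python under the assumption of points drawn uniformly from the unit hypercube: as the dimension grows, the nearest and the farthest neighbour of a query point become almost equally far away, so distance-based methods such as k-NN lose contrast. The function name and the sample sizes are illustrative.

```python
import math
import random

def distance_contrast(dim, n_points=200, seed=0):
    """Ratio of nearest to farthest distance from a random query to n uniform points in [0, 1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [math.dist(query, [rng.random() for _ in range(dim)]) for _ in range(n_points)]
    return min(dists) / max(dists)

for d in (2, 10, 100, 1000):
    print(d, round(distance_contrast(d), 3))
```

The printed ratio should grow towards 1 as d increases; the exact values depend on the random seed.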

Chapter 14: Hypothesis Spaces

This chapter discusses hypothesis spaces in more depth.

Chapter 15: Regularization

This chapter introduces the concept of regularization and discusses common regularization techniques in more depth.
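
A minimal Python sketch of L2 regularization (ridge regression), assuming a single feature and no intercept so that the penalized least-squares problem has a one-line closed-form solution. Setting the derivative of sum_i (y_i - theta * x_i)^2 + lambda * theta^2 to zero gives theta = sum_i x_i y_i / (sum_i x_i^2 + lambda), so a larger lambda shrinks the coefficient towards zero. In matrix form, the same estimator is (XᵀX + λI)⁻¹Xᵀy. The function name is hypothetical.

```python
def ridge_slope(xs, ys, lam):
    """Minimizer of sum_i (y_i - theta * x_i)**2 + lam * theta**2 (one feature, no intercept)."""
    if lam < 0:
        raise ValueError("lam must be non-negative")
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
for lam in (0.0, 1.0, 14.0, 100.0):
    print(lam, ridge_slope(xs, ys, lam))  # lam = 0 gives the OLS slope 2.0; lam = 14 halves it to 1.0
```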

Chapter 16: Linear Support Vector Machine

This chapter introduces linear support vector machines as a model class.

Chapter 17: Nonlinear Support Vector Machine

This chapter introduces nonlinear support vector machines.

Chapter 18: Gaussian Processes

This chapter introduces Gaussian processes as a model class.

Chapter 19: Boosting

This chapter introduces boosting as a sequential ensemble method.
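
A minimal Python sketch of one boosting variant, gradient boosting with the squared loss (sometimes called L2 boosting), assuming one feature and regression stumps as base learners. Each stump is fit to the current residuals, which for the squared loss are (up to a constant factor) its negative gradients. The stump is then added with a small learning rate, so the ensemble is built sequentially. Other variants such as AdaBoost use a different loss and reweighting scheme. The names and defaults here are hypothetical.

```python
def fit_stump(xs, rs):
    """Least-squares regression stump x <= t on one feature; returns (t, left value, right value)."""
    mean_all = sum(rs) / len(rs)
    best_sse = sum((r - mean_all) ** 2 for r in rs)
    best = (float("inf"), mean_all, mean_all)  # fallback: constant prediction
    ux = sorted(set(xs))
    for lo, hi in zip(ux, ux[1:]):
        t = (lo + hi) / 2
        left = [r for x, r in zip(xs, rs) if x <= t]
        right = [r for x, r in zip(xs, rs) if x > t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - ml) ** 2 for r in left) + sum((r - mr) ** 2 for r in right)
        if sse < best_sse:
            best_sse, best = sse, (t, ml, mr)
    return best

def l2_boost(xs, ys, n_iter=100, learning_rate=0.1):
    """Gradient boosting with squared loss: each stump is fit to the current residuals."""
    f0 = sum(ys) / len(ys)  # initial constant model
    preds = [f0] * len(xs)
    stumps = []
    for _ in range(n_iter):
        residuals = [y - p for y, p in zip(ys, preds)]
        t, ml, mr = fit_stump(xs, residuals)
        stumps.append((t, ml, mr))
        preds = [p + learning_rate * (ml if x <= t else mr) for x, p in zip(xs, preds)]

    def predict(x):
        return f0 + learning_rate * sum(ml if x <= t else mr for t, ml, mr in stumps)

    return predict

def train_mse(predict, xs, ys):
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [float(i) for i in range(20)]
ys = [(x - 10.0) ** 2 for x in xs]
for n_iter in (1, 10, 100):
    print(n_iter, train_mse(l2_boost(xs, ys, n_iter=n_iter), xs, ys))
```

The training error falls as more stumps are added. The number of iterations and the learning rate are hyperparameters that would normally be tuned on held-out data rather than chosen by training error.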

Cheat Sheets


Exercise Sheets



Team and License