Introduction to Machine Learning (I2ML)
This website offers an open and free introductory course on (supervised) machine learning. The course is designed holistically and to be as self-contained as possible, in order to cover the most relevant areas of supervised ML. While the introductory parts aim at a practical and operational understanding of the covered algorithms and models, more advanced sections also include sound theoretical foundations and proofs, in order to teach ML theory in a manner as self-contained and precise as possible.
It can be taken either as an introductory undergraduate course early on - if you skip the more advanced sections - or as an introductory graduate course for Master's students.
One general, important goal of the course - on top of clearly explaining the most popular ML algorithms - is to demonstrate the fundamental building blocks behind ML, instead of introducing "yet another algorithm, with yet another differently named concept". We discuss, compare and contrast risk minimization, statistical parameter estimation, the Bayesian viewpoint and information theory, and demonstrate that all of these are equally valid entry points to ML - perspectives which often (confusingly) talk about the same thing in different terminology. Understanding these similarities and being able to mentally switch perspectives when needed is a major goal of this course.
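As a small, hedged illustration of how these viewpoints coincide (not part of the course outline itself): if the loss in empirical risk minimization is chosen as the negative conditional log-likelihood of a parametric model, minimizing the empirical risk is exactly maximum likelihood estimation.

```latex
% ERM with loss L(y, f(x)) = -\log p(y \mid x, \theta)
% coincides with maximum likelihood estimation:
\hat{\theta}
  = \arg\min_{\theta} \sum_{i=1}^{n} -\log p\!\left(y^{(i)} \mid x^{(i)}, \theta\right)
  = \arg\max_{\theta} \prod_{i=1}^{n} p\!\left(y^{(i)} \mid x^{(i)}, \theta\right)
```

The risk-minimization and the statistical-estimation perspective thus describe the same optimization problem in different terms.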
If you want to learn more about this course, please (1) read the outline further below and (2) read the section on prerequisites.
Please also note: (1) The course uses a unified mathematical notation; we provide cheat sheets summarizing the most important symbols and concepts. (2) Most sections already contain quizzes, coding demos, and exercises with worked-out solutions to enable self-study as much as possible.
What this course does not cover - in order to keep its scope from growing completely out of hand: (1) Neural networks and deep learning. We are currently working on a similar repo / page for that, which builds upon this course. (2) An in-depth coverage of optimization. We might publish a course on that at some point, but this is currently a lower priority.
While most of the course is kept on a conceptual, programming-language-independent level - which is by design - we offer a large variety of applied exercises in R, often using the mlr3 package and its corresponding universe. We are working on offering the exercises in Python as well.
Note: In summer semester 2021 we are still extending the material somewhat, so the complete first version including all advanced material will probably be available around 07/2021.
The course material is developed in a public github repository: https://github.com/compstat-lmu/lecture_i2ml. You can find the changelog at: https://github.com/compstat-lmu/lecture_i2ml/blob/master/CHANGELOG.md.
If you love teaching ML and have free resources available, please consider joining the team and email us now! (email@example.com or firstname.lastname@example.org)
This chapter introduces the basic concepts of machine learning. We focus on supervised learning, explain the difference between regression and classification, show how to evaluate and compare machine learning models, and formalize the concept of learning.
This chapter treats the supervised regression task in more detail. We will see different loss functions for regression, how a linear regression model can be used from a machine learning perspective, how to extend it with polynomials for greater flexibility, and finally a fundamentally different approach - k-NN regression.
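To give a flavor of the last idea, here is a minimal sketch of 1-D k-NN regression in plain Python (the data and the choice of k are made up for illustration): the prediction for a query point is simply the mean target value of its k nearest training points.

```python
def knn_regress(x_query, X, y, k=3):
    """Predict the mean target of the k training points nearest to x_query."""
    # Pair each training input with its target, sort by distance to the query
    neighbors = sorted(zip(X, y), key=lambda p: abs(p[0] - x_query))[:k]
    # Average the targets of the k nearest neighbors
    return sum(target for _, target in neighbors) / k

# Toy 1-D data set (purely illustrative)
X = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.1, 4.0, 5.2]

print(knn_regress(2.5, X, y, k=2))  # average of the two nearest targets
```

Note that, unlike linear regression, this model has no parametric form at all; the training data itself is the "model".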
This chapter treats the supervised classification task in more detail. We will see examples of binary and multiclass classification and the difference between the discriminative and the generative approach. In particular, we will treat logistic regression, linear and quadratic discriminant analysis, naive Bayes and k-NN classification.
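As a small sketch of the discriminative approach: logistic regression scores a linear combination of the features and squashes it through the sigmoid to obtain a probability for the positive class. The weights and inputs below are made up for illustration; in practice they would be learned from data.

```python
import math

def sigmoid(z):
    """Map a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, weights, bias):
    """Estimated P(y = 1 | x) for a logistic regression model."""
    # Linear score: w^T x + b
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)

# Illustrative, hand-picked coefficients (not fitted to any data)
print(predict_proba([1.0, 2.0], weights=[0.5, -0.25], bias=0.0))  # 0.5
```

A score of exactly zero lands on the decision boundary, hence the probability of 0.5 above.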
This chapter treats the challenge of evaluating the performance of a model. We will introduce different performance measures for regression and classification tasks, explain the problem of overfitting and the difference between training and test error, and finally present a variety of resampling techniques.
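To illustrate the resampling idea, here is a minimal sketch of k-fold cross-validation index splitting in plain Python (no shuffling, toy sizes): the data is partitioned into k folds, and each fold serves once as the test set while the rest form the training set.

```python
def kfold_indices(n, k):
    """Split indices 0..n-1 into k (train, test) pairs for cross-validation."""
    folds = []
    # Distribute n observations as evenly as possible over k folds
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        # All remaining indices form the training set for this fold
        train = [i for i in range(n) if i not in test]
        folds.append((train, test))
        start += size
    return folds

for train, test in kfold_indices(6, 3):
    print(test)  # [0, 1] then [2, 3] then [4, 5]
```

Averaging the test-set performance over all folds gives a less noisy estimate of generalization error than a single holdout split.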
This chapter introduces Classification And Regression Trees (CART), a well-established machine learning procedure. We explain the main idea and give details on splitting criteria, discuss computational aspects of growing a tree, and illustrate the idea of stopping criteria and pruning.
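As a taste of the splitting criteria, here is a sketch of the Gini impurity, a common criterion in CART for classification: a candidate split is scored by the size-weighted impurity of the two resulting child nodes, and the split with the lowest score is chosen.

```python
def gini(labels):
    """Gini impurity of a node: 1 - sum of squared class proportions."""
    n = len(labels)
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def split_impurity(left, right):
    """Size-weighted impurity of a candidate binary split."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

print(gini([0, 0, 1, 1]))              # 0.5, maximally impure for two classes
print(split_impurity([0, 0], [1, 1]))  # 0.0, a perfectly pure split
```

Growing a tree then amounts to greedily picking, at each node, the feature and threshold whose split minimizes this weighted impurity.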
This chapter introduces bagging as a method to increase the performance of trees; a modification of bagging leads to random forests. We explain the main idea of random forests, benchmark their performance against the methods seen so far, and show how to quantify the impact of a single feature on the performance of a random forest, as well as how to compute proximities between observations based on random forests.
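The core mechanism of bagging can be sketched in a few lines: each base learner is trained on a bootstrap sample, i.e. n observations drawn with replacement from the original n (the data and seed below are purely illustrative).

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) observations from data with replacement."""
    n = len(data)
    return [data[rng.randrange(n)] for _ in range(n)]

rng = random.Random(0)  # fixed seed for reproducibility of the illustration
data = list(range(10))
sample = bootstrap_sample(data, rng)
print(len(sample))  # 10; typically contains duplicates, with roughly a
                    # third of the original points left out ("out-of-bag")
```

The predictions of the base learners are then aggregated by averaging (regression) or majority vote (classification); random forests additionally decorrelate the trees by sampling a random feature subset at each split.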