Bias–variance tradeoff

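The bias–variance tradeoff is an important idea in statistics and data science. It describes how the complexity of a model affects two kinds of prediction error, bias and variance. Making one of them smaller usually makes the other one larger.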

To understand the tradeoff, think of a model as a set of instructions for how to get from one place to another. The "bias" of a model is how far its instructions are, on average, from the correct path. The "variance" of a model is how much its instructions change when it is trained on different data.
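
For squared prediction error, the two ideas above can be written as a formula. The block below is a sketch of the standard decomposition under stated assumptions, none of which come from the text above: the data follow y = f(x) + ε, the noise ε has mean 0 and variance σ², and the expectation is taken over different training sets and over the noise. The symbols f, f̂ (the fitted model), ε and σ are introduced here only for illustration.

```latex
\operatorname{E}\!\left[\bigl(y - \hat{f}(x)\bigr)^2\right]
  = \underbrace{\bigl(\operatorname{E}[\hat{f}(x)] - f(x)\bigr)^2}_{\text{bias}^2}
  + \underbrace{\operatorname{E}\!\left[\bigl(\hat{f}(x) - \operatorname{E}[\hat{f}(x)]\bigr)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{noise}}
```

The first term measures how far the average prediction is from the truth, the second measures how much predictions move from one training set to another, and the third is noise that no model can remove.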

When a model is too simple, it will have high bias because it does not include enough information to get from one place to another. In other words, its instructions will be wrong in the same way no matter what data it is given. But it will have low variance because the instructions barely change from one set of data to another. This is called underfitting the data.

On the other hand, when a model is too complex, it will have lower bias because it can include more information to get from one place to another. But it will have high variance because it also follows the random noise in the data it was given, so its instructions change a lot from one set of data to another. This is called overfitting the data.
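
The difference between underfitting and overfitting can be seen in a small simulation. The sketch below is only an illustration, not a standard recipe: the sine curve used as the "correct path", the noise level, the polynomial degrees, and the function name `estimate_bias_variance` are all assumptions made for this example. It fits polynomials of different degrees to many noisy copies of the same data and measures how far the average fit is from the truth (squared bias) and how much the fits differ from each other (variance). To keep things simple, the training inputs stay fixed and only the noise is redrawn.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def true_function(x):
    # Assumed "correct path" for this illustration.
    return np.sin(2 * np.pi * x)


def estimate_bias_variance(degree, n_datasets=200, n_points=30,
                           noise_sd=0.3, seed=0):
    """Estimate squared bias and variance of a polynomial fit by simulation."""
    rng = np.random.default_rng(seed)
    x_train = np.linspace(0.0, 1.0, n_points)   # fixed training inputs
    x_test = np.linspace(0.05, 0.95, 50)        # points where we compare fits
    predictions = np.empty((n_datasets, x_test.size))

    for i in range(n_datasets):
        # Each data set shares the same inputs but gets fresh random noise.
        y_train = true_function(x_train) + rng.normal(0.0, noise_sd, n_points)
        coeffs = P.polyfit(x_train, y_train, degree)
        predictions[i] = P.polyval(x_test, coeffs)

    mean_prediction = predictions.mean(axis=0)
    # Squared bias: gap between the average fit and the truth.
    bias_sq = float(np.mean((mean_prediction - true_function(x_test)) ** 2))
    # Variance: how much individual fits scatter around the average fit.
    variance = float(np.mean(predictions.var(axis=0)))
    return bias_sq, variance


results = {d: estimate_bias_variance(d) for d in (1, 3, 9)}
for d, (bias_sq, variance) in results.items():
    print(f"degree {d}: bias^2 = {bias_sq:.4f}, variance = {variance:.4f}")
```

In this setup the straight line (degree 1) should show the largest squared bias and the smallest variance, while the degree 9 polynomial should show the opposite pattern.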

The goal is to choose a model whose bias and variance are both low enough that its total error on new data is as small as possible. If a model is too simple, it won't be able to capture the patterns in the data. If a model is too complex, it won't be able to generalize well to new data. The bias–variance tradeoff is about finding the happy medium between the two.
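
One common way to look for this happy medium is to try models of several complexities and keep the one that does best on data it was not trained on. The sketch below assumes the same kind of made-up setup (a noisy sine curve and polynomial models); the names `noisy_sine` and `validation_error`, the sample sizes, and the range of degrees are hypothetical choices for illustration only.

```python
import numpy as np
from numpy.polynomial import polynomial as P

rng = np.random.default_rng(1)


def noisy_sine(n, noise_sd=0.3):
    # Assumed data source: a sine curve plus random noise.
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, noise_sd, n)
    return x, y


x_train, y_train = noisy_sine(40)    # data used to fit each model
x_val, y_val = noisy_sine(200)       # held-out data used only for scoring


def validation_error(degree):
    """Mean squared error on held-out data for a polynomial of this degree."""
    coeffs = P.polyfit(x_train, y_train, degree)
    predictions = P.polyval(x_val, coeffs)
    return float(np.mean((predictions - y_val) ** 2))


# Score a range of complexities and keep the degree with the lowest error.
errors = {degree: validation_error(degree) for degree in range(0, 11)}
best_degree = min(errors, key=errors.get)

for degree, err in errors.items():
    marker = "  <- lowest" if degree == best_degree else ""
    print(f"degree {degree:2d}: validation MSE = {err:.4f}{marker}")
```

Very low degrees tend to score badly because they underfit, and very high degrees tend to score badly because they overfit, so the lowest held-out error usually falls somewhere in between.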

Related topics others have asked about:

Accuracy and precision, Bias of an estimator, Cramér–Rao bound, Double descent, Gauss–Markov theorem, Hyperparameter optimization, Law of total variance, Minimum-variance unbiased estimator, Model selection, Prediction interval, Regression model validation, Supervised learning