Cross-validation (statistics)

Cross-validation is a way of checking whether something we think we have learned really holds up. Imagine a game where you guess which hand a toy is hidden in. You could play once and see if you were right, but you might just get lucky or unlucky. So instead, you could play the game many times, guessing and checking each time. That way, you could be more sure that you were really good at the game and not just lucky.

Cross-validation is a bit like that game. Only instead of guessing which hand a toy is in, it's about checking whether a pattern we found in some information still holds for information we haven't looked at yet. For example, scientists might want to know if a new medicine is effective. They could give the medicine to one group of people and something else (like a sugar pill) to another group, and see if the people who got the medicine got better faster. But just like with the toy game, looking once might not be enough. Maybe the pattern only showed up by luck in the particular people the scientists happened to study. That's where cross-validation comes in.

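With cross-validation, scientists give the medicine to some people and the placebo to others, just like before. But this time, they also split everyone in the study into several smaller groups. They set one group aside, use all the other groups to work out what they think is going on (for example, how much faster people on the medicine get better), and then check whether that holds for the group they set aside. Then they do it again, setting aside a different group each time, until every group has had a turn at being the check. If the medicine really is effective, the results should be similar every time. If they're not, then the scientists know they need to study things more.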
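
For readers who like to see it written out, here is a small Python sketch of the "set one group aside, learn from the rest, then check" idea. It is only an illustration: the recovery times are made up, the function names (k_fold_indices, cross_validate) are invented for this example, and the "rule" being learned is simply the average of the numbers it was shown.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle the positions 0..n-1 and deal them into k groups of nearly equal size."""
    positions = list(range(n))
    random.Random(seed).shuffle(positions)
    return [positions[i::k] for i in range(k)]


def cross_validate(values, k=5, seed=0):
    """Return one error score per group.

    For each group: learn a rule (here, just the average) from all the
    other groups, then measure how far off that rule is on the group
    that was set aside, using the mean squared error.
    """
    scores = []
    for held_out in k_fold_indices(len(values), k, seed):
        held = set(held_out)
        learn_from = [values[i] for i in range(len(values)) if i not in held]
        check_on = [values[i] for i in held_out]
        guess = mean(learn_from)  # the "rule" learned without seeing check_on
        scores.append(mean((v - guess) ** 2 for v in check_on))
    return scores


# Made-up recovery times in days, for illustration only.
recovery_days = [7, 9, 6, 8, 10, 7, 8, 9, 6, 8]
scores = cross_validate(recovery_days, k=5)
print("error on each set-aside group:", scores)
print("average error:", mean(scores))
```

If the scores are close to each other, the rule works about equally well on every set-aside group. If one score is much worse than the others, the rule may have depended on the particular people it was learned from.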

So cross-validation is like playing the toy game many times, or dividing people into groups to see if the medicine works each time, except it's used to check whether what we learned from one set of information still holds for another. It's a careful way to make sure that we're right, and not just lucky.

Related topics others have asked about:

Boosting (machine learning), Bootstrap aggregating, Bootstrapping (statistics), Leakage (machine learning), Model selection, Out-of-bag error, Stability (learning theory), Validity (statistics)