A multi-armed bandit problem is like going to a candy store full of candy machines, each holding a different type of candy. Every machine has a lever you can pull to get some candy. The catch is that you don't know which machine gives the best candy, and you only have a limited amount of time and money to try them out.
So you try a few machines and get some good candy, but other machines might give even better candy. A multi-armed bandit algorithm helps you figure out which machines are worth pulling so that you end up with the best overall reward.
To do this, the bandit algorithm follows a rule for picking which machine to try next, observes how much reward each pull gives, and uses that information to decide which machines to try more often in the future and which to avoid.
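The try-and-learn loop above can be sketched with one common bandit strategy, epsilon-greedy: most of the time pull the machine whose average candy payout looks best so far, but occasionally pull a random machine so you keep learning about the others. This is a minimal illustrative sketch, not the only bandit algorithm; the function name, parameters, and the made-up candy payout probabilities are all assumptions for the example.

```python
import random

def epsilon_greedy(true_means, steps=10000, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit: with probability epsilon pull a random
    machine (explore), otherwise pull the machine with the best
    average reward seen so far (exploit)."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms      # how many times each machine was pulled
    values = [0.0] * n_arms    # running average reward per machine
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                        # explore
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit
        # Hypothetical payout: candy (reward 1) with probability
        # true_means[arm], nothing (reward 0) otherwise.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running average for this machine.
        values[arm] += (reward - values[arm]) / counts[arm]
        total_reward += reward
    return counts, values, total_reward

# Three machines whose (hidden) chances of paying out are 20%, 50%, 80%.
counts, values, total = epsilon_greedy([0.2, 0.5, 0.8])
```

After enough pulls, the machine with the highest true payout ends up pulled far more often than the others, even though the algorithm never sees the hidden probabilities directly.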
So if you think of each candy machine as an option and the candy you get as a reward, a multi-armed bandit algorithm is a clever way to find the best choice among many options by trying them and learning from experience.