Problems concerned with the determination of an optimal strategy. The bandit referred to is the ‘one-armed bandit’ otherwise known as a ‘fruit machine’. For an actual machine in an amusement arcade the general advice would be not to play it, since it is the machine owner who will benefit in the long run. However, the term ‘one-armed bandit’ in statistics refers to the problem of deciding whether to ‘play’ when the expected pay-off may not be negative. Statisticians also consider k-armed bandits for which the question is ‘Which of the k arms should be played?’ One application of the resulting theory is to the medical problem of deciding which of a number of possible treatments should be given to a patient—here the pay-off is measured in terms of the patient’s future health.
An optimal strategy is based on the Gittins index, which is defined as the maximum value, over all N, of the quantity
$$\frac{\sum_{t=1}^{N} \beta^{t-1} E\{X(t)\}}{\sum_{t=1}^{N} \beta^{t-1}},$$
where E{X(t)} is the expected value of the payout at the tth play of the bandit and β is the discount rate (0 < β < 1).
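As a rough illustration, the index can be sketched for a single arm whose expected payouts are already known. The sketch assumes the quantity being maximized is the discounted sum of expected payouts over the first N plays divided by the discounted number of plays, with the first play undiscounted. The function name `gittins_index_known_payouts` and the finite list of payouts are hypothetical simplifications: in a real bandit problem the expectations change as plays are observed, and N ranges over stopping rules rather than a fixed list.

```python
from typing import Sequence


def gittins_index_known_payouts(expected_payouts: Sequence[float], beta: float) -> float:
    """Maximize, over N, the discounted average of the first N expected payouts.

    expected_payouts[t - 1] plays the role of E{X(t)}; beta is the discount
    rate, assumed here to lie strictly between 0 and 1.
    """
    if not expected_payouts:
        raise ValueError("at least one expected payout is required")
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")

    best = float("-inf")
    numerator = 0.0    # running sum of beta^(t-1) * E{X(t)}
    denominator = 0.0  # running sum of beta^(t-1)
    weight = 1.0       # beta^(t-1), equal to 1 for the first play

    for payout in expected_payouts:
        numerator += weight * payout
        denominator += weight
        best = max(best, numerator / denominator)
        weight *= beta

    return best


if __name__ == "__main__":
    # Hypothetical arm: a poor first play followed by better ones.
    print(gittins_index_known_payouts([0.0, 3.0, 3.0], beta=0.5))
```

Under these assumptions, with k arms the strategy is to compute this index for each arm and play the arm whose index is largest.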