An approach concerned with the consequences of modifying our previous beliefs as a result of receiving new data. In contrast to the ‘classical’ approach, which begins with a hypothesis test proposing a specific value for an unknown parameter, θ, Bayesian inference proposes a prior distribution (often simply called a prior), p(θ), for this parameter. Data x1, x2,…, xn are collected and the likelihood f(x1, x2,…, xn|θ) is calculated. Bayes’s theorem is then used to calculate the posterior distribution, g(θ|x1, x2,…, xn). The change from the prior to the posterior distribution reflects the information about the parameter value provided by the data. For any particular event, the initial probability is described as the prior probability and the subsequent probability as the posterior probability.
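The prior-to-posterior update can be sketched numerically on a grid of candidate parameter values. The example below is illustrative only (the coin-flip data and grid are assumptions, not from the entry): it combines a uniform prior with a Bernoulli likelihood via Bayes’s theorem, posterior ∝ prior × likelihood.

```python
import numpy as np

# Candidate values for theta (e.g. a coin's probability of heads).
theta = np.linspace(0.01, 0.99, 99)

# Non-informative (uniform) prior over the grid.
prior = np.full_like(theta, 1.0 / len(theta))

# Hypothetical observed data: 1 = heads, 0 = tails.
data = [1, 0, 1, 1, 0, 1, 1, 1]
heads = sum(data)
tails = len(data) - heads

# Bernoulli likelihood of the data at each candidate theta.
likelihood = theta**heads * (1 - theta)**tails

# Bayes's theorem: posterior is proportional to prior times likelihood.
unnorm = prior * likelihood
posterior = unnorm / unnorm.sum()

# The posterior mode has shifted from the flat prior toward the
# observed proportion of heads (6/8 = 0.75).
print(theta[np.argmax(posterior)])
```

With a uniform prior the posterior simply tracks the likelihood, so the data alone determine where the probability mass concentrates.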
If nothing is known about the value of a parameter, then a non-informative prior is used—typically, this is a uniform distribution over the feasible set of values of the parameter. Another approach, the empirical Bayes method, utilizes the data to inform the prior distribution.
In a similar way, if nothing is known about the underlying distribution, then the principle of indifference effectively states that all possible values should be assigned the same probability of occurrence. This is also called the principle of insufficient reason.
Subjective probability measures the degree of belief an individual has in an uncertain proposition. This could form the basis for a prior distribution. Another term is personal probability, though this may be used to suggest that the person’s selected probability is misguided.
Often, however, a more useful choice for a prior distribution is a member of a family of distributions with the property that the resulting posterior distribution belongs to the same family, so that the effect of the data can be interpreted as a change in the parameter values. Such a prior is called a conjugate prior.
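A standard instance of conjugacy (the particular numbers below are illustrative assumptions) is a Beta prior for a Bernoulli success probability: the posterior is again a Beta distribution, and the data's effect is just an update to the two shape parameters.

```python
# Beta(a, b) prior for a Bernoulli success probability.
a, b = 2.0, 2.0

# Hypothetical observed data: 6 successes, 2 failures.
heads, tails = 6, 2

# Conjugate update: Beta(a, b) -> Beta(a + heads, b + tails).
a_post, b_post = a + heads, b + tails

# The data's effect is visible directly in the parameter values:
# the mean of a Beta(a, b) distribution is a / (a + b).
prior_mean = a / (a + b)
post_mean = a_post / (a_post + b_post)
print(prior_mean, post_mean)  # 0.5 shifts toward the observed proportion
```

No integration is needed: the entire inference reduces to adding the success and failure counts to the prior's shape parameters.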
Sometimes useful information is available. For example, an appropriate prior for the amount taken by a supermarket on a Saturday might be a normal distribution centred on the amount taken the previous Saturday. This would be an informative prior.
Jeffreys argued that an appropriate prior should be unaffected by the way a model is expressed: this leads to the Jeffreys prior, which is proportional to √I(θ), where I(θ) is the Fisher information. Since it is only the relative sizes of the prior values that matter, those values need not sum or integrate to 1. Such a prior is called an improper prior.
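As a sketch (the Bernoulli model and the grid are assumptions, not from the entry): for a Bernoulli parameter θ the Fisher information is I(θ) = 1/(θ(1 − θ)), so the Jeffreys prior is proportional to θ^(−1/2)(1 − θ)^(−1/2), which is a Beta(1/2, 1/2) distribution once normalized.

```python
import numpy as np

# Grid over the interior of (0, 1); the prior is unbounded at 0 and 1.
theta = np.linspace(0.001, 0.999, 999)

# Fisher information for a single Bernoulli observation.
fisher_info = 1.0 / (theta * (1.0 - theta))

# Jeffreys prior: proportional to the square root of the Fisher
# information (unnormalized -- only relative sizes matter).
jeffreys_unnorm = np.sqrt(fisher_info)

# Crude Riemann-sum integral over the truncated grid: roughly 3.
# Over the full interval (0, 1) the exact integral is pi, so in this
# case the prior happens to be proper once divided by pi; a prior
# whose integral is infinite remains improper.
dx = theta[1] - theta[0]
integral = jeffreys_unnorm.sum() * dx
print(integral)
```

The prior is symmetric about θ = 1/2 and piles weight near 0 and 1, unlike the uniform non-informative prior.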
In the Dempster–Shafer theory of evidence (suggested by Dempster in 1967 and later developed by his research student, Glenn Shafer) the Bayesian approach is developed to handle events with imprecisely known probabilities. The theory uses concepts termed ‘belief’ and ‘plausibility’ as lower and upper bounds on event probabilities.