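A measure of the variability of a set of data. For data $x_1, x_2, \ldots, x_n$, with sample mean given by
$$\bar{x} = \frac{1}{n}\sum_{j=1}^{n} x_j,$$
the sample variance is defined to be
$$\frac{1}{n}\sum_{j=1}^{n} (x_j - \bar{x})^2.$$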
The variance is never negative and can be zero only if all the data values are the same.
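As an illustration, the divisor-n definition can be sketched in a few lines of Python. The function name `variance_n` and the data values are hypothetical, chosen only so that the arithmetic is easy to check by hand.

```python
def variance_n(data):
    """Variance with divisor n: the mean squared deviation from the mean."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / n

# Illustrative data with mean 5 and squared deviations summing to 32.
values = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance_n(values))        # 4.0

# Identical values give zero variance, as stated above.
print(variance_n([3, 3, 3]))     # 0.0
```

Since every term in the sum is a square, the result cannot be negative, and it is zero only when every deviation from the mean is zero.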
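In the case where the frequency of the observation $x_j$ is $f_j$, for $j = 1, 2, \ldots, m$, the variance can be calculated using
$$\frac{1}{n}\sum_{j=1}^{m} f_j (x_j - \bar{x})^2,$$
where $n$, the total sample size, is given by
$$n = \sum_{j=1}^{m} f_j,$$
and the mean is $\bar{x} = \frac{1}{n}\sum_{j=1}^{m} f_j x_j$.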
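The frequency form can be sketched in the same way. The function name `variance_from_frequencies` and the small frequency table are assumptions made for illustration; the table describes the data 2, 4, 4, 4, 5, 5, 7, 9 in grouped form.

```python
def variance_from_frequencies(values, freqs):
    """Divisor-n variance when the distinct value x_j occurs f_j times."""
    n = sum(freqs)  # total sample size
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    return sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / n

distinct = [2, 4, 5, 7, 9]   # the distinct observations x_j
counts = [1, 3, 2, 1, 1]     # their frequencies f_j

print(sum(counts))                                  # 8
print(variance_from_frequencies(distinct, counts))  # 4.0
```

Weighting each squared deviation by its frequency gives the same result as listing every observation individually.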
In these expressions for variance the divisor $n$ is used. This is correct if the data set effectively constitutes the entire population; for example, if the values $x_1, x_2, \ldots$ are the diameters of the planets of the solar system, or the lifetimes of all known patients with a rare disease. However, if the data constitute a random sample from a population, and we are interested in the variance of the values in the population, as opposed to the variance of the values in the sample, then it is appropriate to use the divisor $(n-1)$, since this leads to an unbiased estimate of the population variance. This sample variance, with $\bar{x}$ denoting the sample mean, is given by
$$s^2 = \frac{1}{n-1}\sum_{j=1}^{n} (x_j - \bar{x})^2 \quad\text{or}\quad s^2 = \frac{1}{n-1}\sum_{j=1}^{m} f_j (x_j - \bar{x})^2,$$
as appropriate. The factor $n/(n-1)$ linking the population variance formulae and the sample variance formulae is known as the Bessel correction.
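The relationship between the two divisors can be checked numerically. The sketch below uses `pvariance` (divisor n) and `variance` (divisor n−1) from Python's standard `statistics` module; the function `sample_variance` and the data are hypothetical, included only to make the calculation explicit.

```python
import math
from statistics import pvariance, variance

def sample_variance(data):
    """Variance with divisor (n - 1), for a random sample from a population."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

values = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(values)

divisor_n = pvariance(values)           # divisor n
divisor_n_minus_1 = variance(values)    # divisor n - 1

# The hand-written version agrees with the library routine.
print(math.isclose(sample_variance(values), divisor_n_minus_1))      # True

# Multiplying by n / (n - 1) converts one form into the other.
print(math.isclose(divisor_n * n / (n - 1), divisor_n_minus_1))      # True
```

For large n the factor n/(n−1) is close to 1, so the choice of divisor matters most for small samples.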