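If the n pairs of values of random variables X and Y in a random sample are denoted by (x1, y1), (x2, y2),…, (xn, yn), the sample correlation coefficient r is given by

$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}},$$

where

$$S_{xy} = \sum_{j=1}^{n} x_j y_j - \frac{1}{n}\Big(\sum_{j=1}^{n} x_j\Big)\Big(\sum_{j=1}^{n} y_j\Big), \qquad S_{xx} = \sum_{j=1}^{n} x_j^2 - \frac{1}{n}\Big(\sum_{j=1}^{n} x_j\Big)^2,$$

and Syy is defined analogously to Sxx. If the sample means are denoted by $\bar{x}$ and $\bar{y}$, alternative definitions are

$$S_{xy} = \sum_{j=1}^{n} (x_j - \bar{x})(y_j - \bar{y}), \qquad S_{xx} = \sum_{j=1}^{n} (x_j - \bar{x})^2.$$

The coefficient r can take any value from −1 to 1, inclusive. When increasing values of one variable are accompanied by generally increasing values of the other variable, then r > 0 and the variables are said to display positive correlation. If r < 0, the variables display negative correlation.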
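As an illustration, the following minimal Python sketch computes r in two ways: from raw sums, taking Sxy = Σxy − (Σx)(Σy)/n and Sxx = Σx² − (Σx)²/n, and from deviations about the sample means. The function names and the small data set are hypothetical and chosen only for the example.

```python
import math


def sample_correlation(xs, ys):
    """Sample correlation coefficient r from raw sums of squares and products."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length n >= 2")
    sum_x, sum_y = sum(xs), sum(ys)
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    return s_xy / math.sqrt(s_xx * s_yy)


def sample_correlation_centered(xs, ys):
    """Same coefficient, using deviations from the sample means."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length n >= 2")
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data: y tends to increase with x, so r should be positive.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r_raw = sample_correlation(xs, ys)
r_centered = sample_correlation_centered(xs, ys)
print(round(r_raw, 4), round(r_centered, 4))
```

For these data Sxy = 6, Sxx = 10 and Syy = 6, so both functions give r = 6/√60 ≈ 0.7746. The two forms are algebraically equivalent; the centered form is usually less prone to rounding error when the values are large relative to their spread.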
The idea of correlation was put forward by Galton in 1869, and it was Galton who was the first to denote it by the symbol r in 1888. The formulae given here were introduced by Karl Pearson in 1896.
The sample correlation coefficient r is an estimate of the population correlation coefficient ρ.
Correlation is closely linked to linear regression. If the least squares regression lines of y on x and of x on y for the sample (x1, y1), (x2, y2),…, (xn, yn) are, respectively, y = a + bx and x = c + dy, then r² = bd.
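The identity follows because the least squares slopes are b = Sxy/Sxx and d = Sxy/Syy, so bd = Sxy²/(Sxx Syy) = r². A short numerical check in Python, with a hypothetical helper name and illustrative data, might look like this:

```python
def sums_of_squares(xs, ys):
    """Return (S_xx, S_yy, S_xy) computed about the sample means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return s_xx, s_yy, s_xy


# Hypothetical data, used only to illustrate the identity.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
s_xx, s_yy, s_xy = sums_of_squares(xs, ys)

b = s_xy / s_xx  # slope of the regression line of y on x
d = s_xy / s_yy  # slope of the regression line of x on y
r_squared = s_xy ** 2 / (s_xx * s_yy)

print(b, d, b * d, r_squared)
```

Here b = 0.6 and d = 1, so bd = 0.6, which agrees with r² up to floating-point rounding.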
In a hypothesis test for significant evidence of a linear relationship between X and Y, we compare the null hypothesis that ρ = 0 with the alternative hypothesis that ρ ≠ 0, rejecting the null hypothesis if |r| is too large. See also coefficient of determination; rank correlation coefficient.
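The entry does not say how large |r| must be. One common approach, assuming the sample comes from a bivariate normal distribution, uses the fact that under ρ = 0 the statistic t = r√(n − 2)/√(1 − r²) follows a t-distribution with n − 2 degrees of freedom. The sketch below is illustrative only: the function names are hypothetical, and the critical value is supplied by the caller from t-tables rather than computed.

```python
import math


def correlation_t_statistic(r, n):
    """t statistic for testing rho = 0, given sample correlation r and size n."""
    if n < 3:
        raise ValueError("need n >= 3 so that n - 2 degrees of freedom is positive")
    if abs(r) >= 1:
        raise ValueError("statistic is undefined when |r| = 1")
    return r * math.sqrt((n - 2) / (1 - r * r))


def reject_rho_zero(r, n, critical_value):
    """Two-sided test: reject rho = 0 when |t| exceeds the tabulated critical value."""
    return abs(correlation_t_statistic(r, n)) > critical_value


# Hypothetical example: r = sqrt(0.6) from n = 5 pairs.
# 3.182 is the two-sided 5% critical value of t with 3 degrees of freedom.
r, n = math.sqrt(0.6), 5
t = correlation_t_statistic(r, n)
print(round(t, 4), reject_rho_zero(r, n, critical_value=3.182))
```

With these numbers t ≈ 2.121, which is below 3.182, so the null hypothesis is not rejected at the 5% level even though r ≈ 0.77. With so few pairs, a fairly large sample correlation can still arise by chance.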
http://www.stat.tamu.edu/~west/ph/coreye.html Applet.