A procedure for identifying an appropriate linear model in the context of multiple regression. The expected value of the response variable, E(Y), is modelled as a linear combination of many (p, say) explanatory X-variables. A natural question is whether all p of the X-variables are required.
Forward selection begins by determining which one of the X-variables is most highly correlated with Y. This variable is retained in all subsequent models. At the second stage the procedure considers the remaining (p−1) variables and determines which, in conjunction with the first variable, provides the most additional information about Y. The process continues in this way, adding one variable at a time, until no remaining variable gives a worthwhile improvement.
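As an informal sketch of the greedy idea (not the classical F-test implementation; the function name forward_select and the use of NumPy are illustrative assumptions), the following Python code adds at each stage the variable that most reduces the residual sum of squares of a least-squares fit. At the first stage this coincides with choosing the variable most highly correlated with Y.

import numpy as np

def forward_select(X, y, k):
    """Greedy forward selection: at each stage add the column of X that
    most reduces the residual sum of squares of a least-squares fit."""
    n, p = X.shape
    selected, remaining = [], list(range(p))
    for _ in range(k):
        best_rss, best_j = np.inf, None
        for j in remaining:
            cols = selected + [j]
            # Fit y on an intercept plus the candidate set of columns.
            A = np.column_stack([np.ones(n), X[:, cols]])
            beta, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = np.sum((y - A @ beta) ** 2)
            if rss < best_rss:
                best_rss, best_j = rss, j
        selected.append(best_j)
        remaining.remove(best_j)
    return selected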
Backward elimination mirrors forward selection: it starts with the model containing all p X-variables and removes ineffective variables one at a time, again proceeding stepwise.
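Both directions are available in standard software. For instance, scikit-learn's SequentialFeatureSelector can run either forward selection or backward elimination, although it chooses variables by cross-validated score rather than by the F tests described next; the data set and settings below are purely illustrative.

from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

X, y = load_diabetes(return_X_y=True)

# Backward elimination: start from all ten predictors and discard them one
# at a time, keeping the subset with the best cross-validated score.
selector = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=5, direction="backward"
)
selector.fit(X, y)
print(selector.get_support(indices=True))  # indices of the retained variables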
The order in which variables enter or leave the model may be determined using tests based on the F-distribution, with the critical values for entry or removal being termed the F to enter and the F to remove.
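For example (with notation assumed here rather than taken from the entry), if RSS_q denotes the residual sum of squares when the current model with q X-variables is fitted to n observations, then a candidate (q+1)th variable is assessed by the partial F-statistic

F = (RSS_q − RSS_{q+1}) / {RSS_{q+1}/(n − q − 2)},

which, under the null hypothesis that the candidate variable contributes nothing, has an F-distribution with 1 and (n − q − 2) degrees of freedom; the variable is added only if F exceeds the F to enter.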
An alternative stepwise selection criterion, suggested in 2004 by Efron, Hastie, Johnstone, and Tibshirani, uses least angle regression selection (LARS), in which variables are selected on the basis of their correlation with the currently unexplained variation in Y. A variant of LARS is the lasso (see multiple regression model). See also model selection procedure.
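Both LARS and its lasso variant are implemented in scikit-learn; a minimal sketch on synthetic data (the data, stopping rule, and penalty value here are illustrative assumptions, not part of the entry):

import numpy as np
from sklearn.linear_model import Lars, LassoLars

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
y = X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.1, size=100)

# Least angle regression: variables enter according to their correlation
# with the current residual; here the path is stopped after two entries.
lars = Lars(n_nonzero_coefs=2).fit(X, y)
print(lars.active_)   # indices of the selected variables, in order of entry

# The lasso variant follows a similar path but also shrinks the coefficients.
lasso = LassoLars(alpha=0.05).fit(X, y)
print(lasso.coef_)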