A widely used method for predicting the output waveform of a linear system. It finds particular application in modelling the acoustic waveform produced during speech, where it is used as a basis for speech coding, speech analysis, and speech synthesis; when used for speech coding it is known as linear predictive coding (LPC). Linear prediction relies on the fact that speech can be described in terms of an acoustic excitation waveform exciting the formants of the vocal tract. For those sounds in speech that have a pitch associated with them, such as the vowels in ‘feed’ and ‘card’, the formants are excited by a periodic acoustic waveform resulting from glottal airflow. Linear prediction models this excitation as a train of narrow pulses, each resulting from a vocal-fold closure, with an overall flat spectral shape of 0 dB per octave. The response of each formant to an individual pulse is a sine wave at the formant frequency whose amplitude decays exponentially at a rate determined by the bandwidth of the formant; each output sample from a formant can therefore be predicted as a weighted sum of previous output samples.
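As a minimal illustration, assuming a single formant modelled as a two-pole resonator, the following Python sketch generates the formant's decaying-sine response to one pulse and then predicts each sample from the two preceding ones. The function names (`resonator_poles`, `formant_ringing`, `predict_from_past`) are hypothetical, and the mapping from bandwidth B to pole radius, r = exp(−πB/fs), is a standard approximation assumed here rather than taken from the text.

```python
import math

def resonator_poles(freq_hz, bandwidth_hz, fs_hz):
    """Pole radius and angle for one formant (assumed two-pole resonator model)."""
    r = math.exp(-math.pi * bandwidth_hz / fs_hz)  # wider bandwidth -> faster decay
    theta = 2.0 * math.pi * freq_hz / fs_hz        # formant frequency in radians/sample
    return r, theta

def formant_ringing(r, theta, n_samples):
    """Response of the resonator to a single unit pulse: an exponentially decaying sine."""
    return [r ** n * math.sin((n + 1) * theta) / math.sin(theta) for n in range(n_samples)]

def predict_from_past(y, r, theta):
    """Predict y[n] for n >= 2 as a weighted sum of y[n-1] and y[n-2]."""
    a1 = 2.0 * r * math.cos(theta)  # predictor weight on the previous sample
    a2 = -r * r                     # predictor weight on the sample before that
    return [a1 * y[n - 1] + a2 * y[n - 2] for n in range(2, len(y))]

# Example: one formant at 500 Hz, 80 Hz bandwidth, 8 kHz sampling.
r, theta = resonator_poles(500.0, 80.0, 8000.0)
y = formant_ringing(r, theta, 200)
predicted = predict_from_past(y, r, theta)
max_error = max(abs(p - actual) for p, actual in zip(predicted, y[2:]))
print(f"largest prediction error: {max_error:.2e}")
```

The predictions agree with the actual samples to within floating-point rounding, because a decaying sinusoid satisfies a two-term recursion exactly. A vocal tract with several formants can be handled the same way with a higher-order predictor, two weights per formant.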
In LP analysis of speech, a spectral estimate is made based on an all-pole filter (one whose transfer function has only poles and no zeros), chosen to give the minimum squared error when its output in response to a spectrally flat pulse input is compared with the speech being analysed. LP speech analysis relies on five key assumptions: the ringing of the formants during voiced speech is due purely to the most recent acoustic pressure pulse from vocal-fold excitation; the formant frequencies remain constant during each cycle; the formant bandwidths remain constant during each cycle; the vocal-tract response can be modelled completely in terms of formants for all speech sounds; and the acoustic excitation to the vocal tract can be modelled as spectrally flat (0 dB per octave). To lessen the effect of departures from these assumptions in practice, LP analysis is usually carried out on input frames of 10–25 milliseconds in duration, over which the vocal tract can be treated as approximately stationary.
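The analysis step can be sketched in Python with NumPy, assuming the common autocorrelation method with a Hamming-windowed frame and the Levinson-Durbin recursion to solve the resulting equations; the text does not specify a solution method, and the covariance method is an alternative. The helper names `lp_coefficients` and `lp_spectrum` are hypothetical. The returned error power acts as the squared gain applied to the spectrally flat input, and `lp_spectrum` evaluates the resulting all-pole spectral estimate.

```python
import numpy as np

def lp_coefficients(frame, order):
    """Autocorrelation-method LP analysis solved with the Levinson-Durbin recursion.

    Returns (a, err): predictor weights such that s[n] is approximated by
    sum_k a[k-1] * s[n-k] for k = 1..order, and the final prediction-error
    power (the squared gain of the all-pole model).
    """
    w = np.asarray(frame, dtype=float) * np.hamming(len(frame))  # taper the frame edges
    full = np.correlate(w, w, mode="full")
    R = full[len(w) - 1 : len(w) + order]  # autocorrelation at lags 0..order
    a = np.zeros(order + 1)                # a[0] is unused
    err = R[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i of the recursion.
        k = (R[i] - np.dot(a[1:i], R[i - 1 : 0 : -1])) / err
        a_prev = a.copy()
        a[i] = k
        for j in range(1, i):
            a[j] = a_prev[j] - k * a_prev[i - j]
        err *= 1.0 - k * k                 # error power shrinks at each stage
    return a[1:], err

def lp_spectrum(a, err, n_fft=512):
    """All-pole spectral estimate err / |A(e^jw)|^2 with A(z) = 1 - sum_k a_k z^-k."""
    A = np.fft.rfft(np.concatenate(([1.0], -a)), n_fft)
    return err / np.abs(A) ** 2

# Example: a 20 ms frame at 8 kHz (160 samples), order-10 analysis.
rng = np.random.default_rng(0)
frame = rng.standard_normal(160)  # stand-in for real speech samples
a, err = lp_coefficients(frame, 10)
envelope = lp_spectrum(a, err)    # 257 bins from 0 Hz to fs/2
```

An order of around 10 is often used for speech sampled at 8 kHz, roughly one pole pair per formant plus a few spare; in use, the random `frame` would be replaced by successive 10–25 ms slices of the speech signal.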
The error between the predicted speech and the input speech is known as the residual; it shows a large spike at each excitation pulse, because these pulses cannot be predicted from previous samples. When LPC is used to code speech for transmission, a more natural-sounding output can be achieved by transmitting the residual as additional data with which to excite the LPC model, in residual-excited linear prediction (RELP) or residual pulse linear predictive coding (RPLPC). Another way of improving the naturalness of resynthesized LPC-based speech is to code the excitation signal as a series of pulses of varying amplitudes within each cycle; this is known as multipulse linear predictive coding (MPLPC). Alternatively, the excitation signal for resynthesis can be selected from a stored codebook of zero-mean Gaussian sequences, as in code-excited linear prediction (CELP). A variation on this is vector sum excited linear prediction (VSELP), in which the excitation signal is reconstructed from linear combinations of stored basis vectors.
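A minimal Python/NumPy sketch of the residual and of a CELP-style excitation search follows, assuming the predictor convention s[n] ≈ Σ a_k s[n−k]. The function names (`lp_synthesis`, `lp_residual`, `celp_search`) are hypothetical, the second-order model coefficients are invented for illustration, and the codebook search is heavily simplified: practical CELP coders add perceptual weighting, an adaptive (pitch) codebook, and subframe processing, none of which is shown. When a signal really is an all-pole filter driven by pulses, inverse filtering returns exactly the pulse train, which is why the residual concentrates at the excitation instants.

```python
import numpy as np

def lp_synthesis(excitation, a):
    """Drive the all-pole model 1/A(z): y[n] = x[n] + sum_k a[k-1] * y[n-k]."""
    y = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = excitation[n]
        for k in range(1, min(len(a), n) + 1):
            acc += a[k - 1] * y[n - k]
        y[n] = acc
    return y

def lp_residual(signal, a):
    """Inverse-filter with A(z): e[n] = s[n] - sum_k a[k-1] * s[n-k] (zero initial state)."""
    inverse = np.concatenate(([1.0], -np.asarray(a, dtype=float)))
    return np.convolve(signal, inverse)[: len(signal)]

def celp_search(target, codebook, a):
    """Choose the codebook entry and gain whose synthesized output best matches target.

    Simplified: no perceptual weighting, no adaptive (pitch) codebook, one frame only.
    """
    best_index, best_gain, best_err = None, 0.0, np.inf
    for index, entry in enumerate(codebook):
        y = lp_synthesis(entry, a)
        energy = np.dot(y, y)
        if energy == 0.0:
            continue
        gain = np.dot(target, y) / energy  # least-squares gain for this entry
        diff = target - gain * y
        err = np.dot(diff, diff)
        if err < best_err:
            best_index, best_gain, best_err = index, gain, err
    return best_index, best_gain, best_err

# Example with an assumed stable second-order model and a pulse-train excitation.
a = np.array([1.3, -0.6])
pulses = np.zeros(200)
pulses[::50] = 1.0                          # one pulse every 50 samples
speech_like = lp_synthesis(pulses, a)
residual = lp_residual(speech_like, a)      # spikes exactly where the pulses were

rng = np.random.default_rng(1)
codebook = rng.standard_normal((64, 40))    # 64 zero-mean Gaussian sequences
target = lp_synthesis(2.5 * codebook[17], a)
index, gain, err = celp_search(target, codebook, a)
```

With the synthetic target built from entry 17 at gain 2.5, the search returns that entry and gain with near-zero error; with real speech the best entry only approximates the target, and what is transmitted is the index and gain rather than the residual itself.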