One of the principal ways of specifying an infinite formal language by finite means. A grammar consists of a set of rules (called productions or rewrite rules) that may be used to derive one string from another by substring replacement. The strings of the specified language are obtained by repeated application of these rules, starting from some initial string. A grammar however has the additional feature that the alphabet is divided into a set T of terminal symbols and a set N of nonterminal symbols (or variables). While productions may be composed arbitrarily of terminals and nonterminals, the specified language contains strings of terminals only.
A grammar G can therefore be defined as comprising two sets of symbols T and N, a semi-Thue system over the union T∪N, and a distinguished member S of N. The language generated by G is the set of all strings over T that can be derived from S by a sequence of substring replacements (see semi-Thue system); S is known as the start symbol or sentence symbol. As an example, let T be {b,c}, N be {S,A} and let the productions be
Then, for instance, starting from
S we can derive
bcbcbc via the following sequence (among others):
SA by production 1
SAA by production 1
AAA by production 2
bcAA by production 3
bcbcA by production 3
bcbcbc by production 3
The language generated is
These are the only strings of
bs and
cs in {
b,c}
* derivable from the start symbol
S by the three production rules. A string such as
SAbcA, which is derivable from
S but still contains nonterminals, is referred to as a
sentential form.
This is the most general form of grammar. Typically however some restriction is placed on the form that productions may take (see context-free grammar, context-sensitive grammar, regular grammar). The syntax of programming languages is usually specified by context-free grammars; the example given above is context-free, although the language can be specified by a regular grammar.
A slightly different way of generating a language is by means of an L-system (or Lindenmeyer system). A different approach altogether is to define a machine that tests any string for membership of the language, i.e. an automaton.