Given a corpus of texts C, a concordance for a certain word form W (taken as a type) is the set of all occurrences (tokens) of W in C, arranged in a table such that

Derivatively, a concordance of corpus C is the set of the concordances of all word types W_i in C.
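The two definitions can be sketched as a data structure: a mapping from each word type to the places of its tokens. The following Python sketch is only an illustration under stated assumptions. The function names `tokenize` and `corpus_concordance`, the text identifiers `t1` and `t2`, and the crude tokenization (lowercased runs of word characters) are hypothetical choices, not part of the definition above.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Assumption: a token is a lowercased run of word characters.
    return re.findall(r"\w+", text.lower())

def corpus_concordance(texts):
    """Map each word type to the (text_id, token_index) pairs of its tokens."""
    index = defaultdict(list)
    for text_id, text in texts.items():
        for position, token in enumerate(tokenize(text)):
            index[token].append((text_id, position))
    return dict(index)

# A toy corpus of two short texts (hypothetical identifiers t1, t2).
corpus = {
    "t1": "Lemmatization is a decision in favor of one form of an expression",
    "t2": "knowledge of the language in question",
}
concordance = corpus_concordance(corpus)

# The concordance for the single type "of" is one entry of the mapping.
print(concordance["of"])
```

Here `concordance["of"]` corresponds to the concordance for one word form, and the whole mapping to the concordance of the corpus. The positions play the role of the references; the contexts are not stored but can be recovered from them.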

The following is a concordance of ‘of’ in the first paragraph of the section ‘Reference entries’ of the page ‘Lemmatization’.

Concordance of ‘of’ in one paragraph

reference      preceding context                            target  following context
lemm 2.4, 01   Lemmatization is a decision in favor         of      one form of an expression which
lemm 2.4, 01   decision in favor of one form                of      an expression which is considered its
lemm 2.4, 04   destined for users with imperfect knowledge  of      the language in question

The reference to the place of the token identifies

The context of the token in question must be limited in some sensible way. For many applications it would seem desirable to reproduce the entire containing sentence, but this would be too expensive and, in the case of long sentences, unnecessary. In principle, one could reproduce some suitable construction (phrase) containing the token in question. However, that would require a human analyst, which is undesirable not only for economic reasons, but also because the concordance is supposed to serve as a theory-free analytical tool in the first place: it does not presuppose the analysis of syntactic constructions, but instead helps to carry it out on an empirical basis. Therefore the context in a concordance is mostly clipped mechanically, e.g. by limiting it to a certain number of text words on either side of the target. The user can always find the full context by following up the reference.
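Mechanical clipping of this kind is easy to sketch. The Python example below is a minimal illustration, not a description of how the table above was actually produced. The function name `kwic`, the `window` parameter and the whitespace tokenization are assumptions, and the sample string is pieced together from the first two rows of the table.

```python
def kwic(tokens, target, window=6):
    """Return (position, left context, target, right context) for each hit.

    The context is clipped to at most `window` text words on either side,
    without regard to phrase or sentence boundaries.
    """
    rows = []
    for i, token in enumerate(tokens):
        if token.lower() == target.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            rows.append((i, left, token, right))
    return rows

# Assumption: text words are separated by whitespace.
text = ("Lemmatization is a decision in favor of one form of "
        "an expression which is considered its")
rows = kwic(text.split(), "of", window=6)

for position, left, target, right in rows:
    print(f"{position:>3}  {left:>40}  {target}  {right}")
```

With a window of six text words, the two hits in this sample come out with the same contexts as the first two rows of the table. The token position stands in for the reference: it is what allows the user to go back to the full context.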

A concordance is based on a word list of a corpus, for which see the section on lemmatization.

Concordances may be produced for many different purposes. For the lexicographer, they show the range of contextual variation of each word form in his corpus. He needs that for the following analytic steps: