A frequency dictionary (or word frequency book) is a set of word lists that provides each lemma with statistical information of various kinds. It commonly comprises at least two lists of entries:

Methodologically, a word frequency book differs from all other kinds of dictionaries in that it is necessarily and exclusively corpus-based. Apart from lemmatization, most of its compilation is automatic.

Another difference between most frequency dictionaries and other kinds of dictionaries lies in the nature of the lemmas. As explained in the section on lemmatization, the process that relates text tokens to the lemmas of a dictionary passes through several steps of abstraction. One of the lowest of these is the word form (or inflected form). While most other dictionaries go further and relate word forms to lexemes, most frequency dictionaries stop at the level of word forms. An English frequency dictionary, for example, would give the user the frequency of each of the forms am, are, is, was, were, be, been, being, but it would not give the frequency of the lexeme be in the corpus; users would have to calculate that themselves by adding up the frequencies of the component forms. As a consequence, one can easily compare the frequencies of the forms is and was, but it is more cumbersome to compare the frequencies of the lexemes be and have.
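
The difference between form frequencies and a lexeme frequency can be illustrated with a minimal sketch. The toy corpus, the regular-expression tokenizer and the names `form_freq`, `BE_FORMS` and `be_total` are assumptions made for this illustration; they are not taken from any actual frequency dictionary.

```python
import re
from collections import Counter

# Toy corpus, invented for illustration only.
corpus = (
    "I am here. You are here. She is here. "
    "He was there. They were there. It is late."
)

# Crude tokenization into lowercase word forms (an assumption; real
# frequency dictionaries use more careful tokenizers).
tokens = re.findall(r"[a-z']+", corpus.lower())

# A word-form frequency list: one count per inflected form.
form_freq = Counter(tokens)

# The forms of the lexeme "be" listed in the text above.
BE_FORMS = ["am", "are", "is", "was", "were", "be", "been", "being"]

# The lexeme frequency is not stored anywhere; the user has to add up
# the frequencies of the component forms (missing forms count as 0).
be_total = sum(form_freq[form] for form in BE_FORMS)

print(form_freq["is"], form_freq["was"])  # form-level comparison is direct
print(be_total)                           # lexeme-level figure needs summing
```

Comparing is with was only needs two lookups, whereas the figure for the lexeme be requires knowing and summing all of its forms.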

In the past, the separate listing of inflected forms in frequency dictionaries was doubtless due to the desire to automate the compilation of the dictionary. Nowadays, fairly powerful lemmatization programs are available which can assemble the different inflected forms of a lexeme under one entry and pair it with both the individual frequencies of the forms and their aggregate frequency.
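
What such a lemmatized entry might contain can be sketched as follows. The frequency figures and the hand-written `FORM_TO_LEMMA` table are hypothetical; a real lemmatization program would replace the table lookup with proper morphological analysis, and `build_lemma_entries` is a name chosen only for this example.

```python
from collections import Counter, defaultdict

# Hypothetical word-form frequencies (invented numbers).
form_freq = Counter(
    {"is": 120, "was": 95, "are": 60, "has": 40, "had": 55, "have": 70}
)

# Hypothetical lookup table standing in for a lemmatization program.
FORM_TO_LEMMA = {
    "is": "be", "was": "be", "are": "be",
    "has": "have", "had": "have", "have": "have",
}

def build_lemma_entries(form_freq, form_to_lemma):
    """Group word forms under their lemma, keeping the frequency of
    each form and the aggregate frequency of the lemma."""
    entries = defaultdict(lambda: {"total": 0, "forms": {}})
    for form, count in form_freq.items():
        # A form missing from the table is treated as its own lemma.
        lemma = form_to_lemma.get(form, form)
        entries[lemma]["forms"][form] = count
        entries[lemma]["total"] += count
    return dict(entries)

entries = build_lemma_entries(form_freq, FORM_TO_LEMMA)

for lemma, entry in sorted(entries.items()):
    print(lemma, entry["total"], entry["forms"])
```

With entries of this shape, the lexemes be and have can be compared directly via their aggregate frequencies, while the individual form frequencies remain available.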

Frequency dictionaries play a role in the compilation of other kinds of dictionaries: lemma selection partly follows frequency, and the description of the uses of words (e.g. senses, collocations, constructions) is also mostly restricted to the more frequent uses. Last but not least, the probability of a linguistic element is the basis for calculating its information value, which has an interesting relationship with its meaning (see Lehmann 1978[measuring]).
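
How an information value can be derived from a frequency may be sketched under one assumption: that information value is understood in the standard information-theoretic sense, as the negative binary logarithm of the element's probability, with the probability estimated by relative frequency in the corpus. Whether the cited work uses exactly this measure is not stated here, and the function name `information_value` is chosen only for this example.

```python
import math

def information_value(freq, corpus_size):
    """Information value in bits, assuming the standard measure
    -log2(p), with p estimated as relative corpus frequency."""
    if freq <= 0 or corpus_size <= 0 or freq > corpus_size:
        raise ValueError("need 0 < freq <= corpus_size")
    p = freq / corpus_size
    return -math.log2(p)

# An element that makes up half of all tokens carries 1 bit;
# one that occurs once in eight tokens carries 3 bits.
print(information_value(1, 2))
print(information_value(1, 8))
```

On this measure, the rarer an element is, the higher its information value.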

Bibliographical references

