After a lexical database has been compiled from a corpus, it contains a number of entries that will not go into the final dictionary. In the production of a specialized dictionary, the first criterion of selection is, of course, whether the lemma belongs to the object domain. In the case of a general dictionary, criteria for including an entry comprise the following:

The entry is

  1. basically in good order,
  2. a frequent word,
  3. native rather than a loan,
  4. standard rather than a minor variant,
  5. a word of all registers rather than a taboo word,
  6. based on a concept word rather than on a proper name,
  7. not inferrable by general rules of the language system,
  8. etc.

Ad 1: A dictionary requires a homogeneous, balanced entry structure. For instance, every verb must be specified for conjugation class. An entry can be published only if all the fields to be output have been properly filled in. However, at the end of every research project, some questions inevitably remain open, and the entries affected by them are held back.
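The completeness requirement can be sketched as a simple check over database records. The following is only an illustration: the field names, the per-category field inventory in `REQUIRED_FIELDS` and the sample records are assumptions made for the example, not the structure of any particular lexical database.

```python
# Hypothetical field inventory: fields that must be filled in before an
# entry of a given part of speech may be published.
REQUIRED_FIELDS = {
    "verb": ("lemma", "gloss", "conjugation_class"),
    "noun": ("lemma", "gloss"),
}
DEFAULT_REQUIRED = ("lemma", "gloss")


def missing_fields(entry):
    """Return the names of required fields that are absent or empty."""
    required = REQUIRED_FIELDS.get(entry.get("pos"), DEFAULT_REQUIRED)
    return [name for name in required if not str(entry.get(name) or "").strip()]


def is_publishable(entry):
    """An entry is publishable only if no required field is missing."""
    return not missing_fields(entry)


# Invented sample records; the second verb lacks its conjugation class.
database = [
    {"lemma": "sing", "pos": "verb", "gloss": "produce musical sounds",
     "conjugation_class": "strong"},
    {"lemma": "walk", "pos": "verb", "gloss": "move on foot",
     "conjugation_class": ""},
]

publishable = [e["lemma"] for e in database if is_publishable(e)]
print(publishable)  # ['sing']
```

Reporting the missing fields, rather than only a yes/no verdict, makes it easier to see which open questions keep an entry out of the dictionary.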

Ad 2: A frequency threshold may be defined for a dictionary. Typically, hapax legomena are thrown out. In the definition of a basic vocabulary, frequency is the most important criterion.
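A frequency threshold of this kind might be applied as follows. This is a minimal sketch which assumes that the corpus is already available as a list of lemmatized tokens; the function name `select_by_frequency`, the default threshold and the toy token list are illustrative assumptions.

```python
from collections import Counter


def select_by_frequency(lemma_tokens, min_count=2):
    """Keep lemmas occurring at least min_count times in the corpus.

    With min_count=2, hapax legomena (lemmas occurring exactly once)
    are dropped.
    """
    counts = Counter(lemma_tokens)
    return {lemma: n for lemma, n in counts.items() if n >= min_count}


# Toy corpus of lemmatized tokens (invented for illustration).
corpus_lemmas = ["house", "tree", "house", "zyzzyva", "tree", "house"]

kept = select_by_frequency(corpus_lemmas)
print(kept)  # {'house': 3, 'tree': 2}
```

Raising `min_count` yields progressively smaller selections, which is one way of approximating a basic vocabulary.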

Ad 3: Minority languages are often flooded by loans from the dominant language. In a bilingual minority community, every word of the dominant language is a potential word of the minority language. It may thus become impossible to separate out a subset of loans that should appear in the dictionary.

A radical solution is not to register any loans in such a dictionary. This solution is justifiable to a certain extent, since many users – speakers of the language in question as well as linguists – will not consult the dictionary for non-native words.

However, some well-established loans will be inevitable in a bilingual dictionary whose direction is ‘dominant language – minority language’. For instance, in a Spanish-Yucatec dictionary, the commonest Yucatec equivalent for Spanish y ‘and’ is y. It would be inappropriate to suppress this information. The necessity of mentioning a loan from language X in language Y in the dictionary ‘X-Y’ may then serve as a criterion for incorporating that loan as a lemma in the dictionary ‘Y-X’, or even in the monolingual dictionary of Y.
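The criterion just stated can be sketched as a set operation: a loan becomes a lemma of the ‘Y-X’ dictionary if it is needed as an equivalent in the ‘X-Y’ dictionary. The sketch assumes that the ‘X-Y’ dictionary is available as a mapping from X lemmas to lists of Y equivalents and that loans in Y have already been identified; the function name and all data other than the y example are placeholders.

```python
def loans_to_lemmatize(x_to_y, loans_in_y):
    """Return the loans in Y that occur as equivalents in the X-Y dictionary."""
    used_equivalents = {
        equivalent
        for equivalents in x_to_y.values()
        for equivalent in equivalents
    }
    return sorted(used_equivalents & set(loans_in_y))


# X-Y dictionary: only the first pair is taken from the text above;
# the second is a placeholder standing for any native equivalent.
x_to_y = {
    "y": ["y"],
    "x_lemma_2": ["native_equivalent_2"],
}

# Loans attested in Y; "loan_3" is a placeholder for a loan that is
# never needed as an equivalent in the X-Y dictionary.
loans_in_y = {"y", "loan_3"}

print(loans_to_lemmatize(x_to_y, loans_in_y))  # ['y']
```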

Ad 4: Formal (phonetic, phonological, orthographic or morphological) variants of a lemma may be mentioned in a subsection of the entry of that lemma. The question then arises whether such variants should themselves be elevated to lemma status. If a variant differs from the standard lemma exclusively in a feature of its expression, the lexicographer grants that variant a reference entry. For instance:

homogenous see: homogeneous.

Similarly, if not only the form to be lemmatized, but other inflected forms, too, differ from the standard lemma, then that morphological information may be specified in the corresponding fields of the variant entry; but otherwise it remains a reference entry, i.e. the other fields are filled in only in the main entry.
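A reference entry of this kind might be represented as follows. This is a sketch under assumptions: the dictionary keys `lemma`, `see` and `morphology` and the two function names are invented for the example; only the rendered format follows the sample entry given above.

```python
def make_reference_entry(variant, main_lemma, morphology=None):
    """Build a reference entry that points from a variant to its main lemma.

    Only deviating morphological information is stored in the variant
    entry; all other fields live in the main entry alone.
    """
    entry = {"lemma": variant, "see": main_lemma}
    if morphology:
        entry["morphology"] = dict(morphology)
    return entry


def render(entry):
    """Render a reference entry in the format 'variant see: main lemma.'"""
    return f"{entry['lemma']} see: {entry['see']}."


reference = make_reference_entry("homogenous", "homogeneous")
print(render(reference))  # homogenous see: homogeneous.
```

Because the variant entry stores nothing but the pointer and any deviating forms, corrections to the gloss or other fields need to be made in the main entry only.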

Other kinds of variation, like dialectal and diachronic variation, may be excluded from the start by restricting the corpus accordingly or, if they do occur in the corpus, they may be excluded by defining the scope of the dictionary. If they are included, their affiliation must be marked.

Abbreviations (SPD), acronyms (Unesco) and clippings (prof) are given low priority in many dictionaries.

Ad 5: A dictionary should be acceptable to its speech community. If those whose language is being described object to seeing an item in the dictionary, that item should probably be dropped.

Ad 6: Most dictionaries focus on concept words. Personal names are, for the most part, systematically excluded and left to encyclopedias. The problem then arises that words based on proper names may themselves be concept words. For instance, a Duden dictionary of German, while lacking the lemma Schiller, does feature a lemma Schillerlocke.

Ad 7: A dictionary does contain complex (derived and compound) stems. However, some word-formation processes are completely productive and regular. For instance, an English dictionary cannot possibly list all phrasal compounds. Similarly, if it listed gerunds as lemmas, there would be a gerund entry for every entry of a verb. Such entries would be superfluous, since all the lexical and grammatical information associated with the gerund can be derived by rule. On the other hand, English dictionaries normally do contain deadjectival nouns in -ness, although their formation is highly productive and mostly regular, too. The relevant principle here is: If the formation of members of a certain class is completely productive and regular, the products are not registered as lemmas of a dictionary.
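The principle can be sketched as a filter over database entries that are tagged for the word-formation process which produced them. The tag names and the set `FULLY_PRODUCTIVE_AND_REGULAR` are assumptions made for the example; which processes belong in that set is a decision of the lexicographer for the language at hand.

```python
# Hypothetical tags for processes judged completely productive and regular.
FULLY_PRODUCTIVE_AND_REGULAR = {"gerund", "phrasal_compound"}


def is_lemma_candidate(entry):
    """Exclude products of completely productive and regular processes."""
    return entry.get("formation") not in FULLY_PRODUCTIVE_AND_REGULAR


# Illustrative entries; "formation" is None for simple stems.
entries = [
    {"lemma": "sing", "formation": None},
    {"lemma": "singing", "formation": "gerund"},
    {"lemma": "kindness", "formation": "ness_noun"},
]

lemmas = [e["lemma"] for e in entries if is_lemma_candidate(e)]
print(lemmas)  # ['sing', 'kindness']
```

The -ness noun passes the filter because its formation is only mostly regular, in line with the practice of English dictionaries described above.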