A very common problem with databases is missing data. Missing data makes searches difficult (or impossible) and statistics unreliable.

Two common reasons for this are:

  1. Records contain optional fields which have been left empty.
  2. Records are completely missing because some of the mandatory information was not available. Since the missing data was in mandatory columns, the entire row was discarded.

Modelling mistakes?

Sometimes data can be missing because of mistakes in the modelling such as:

  1. Tables had optional fields which should have been mandatory. The fields have been left empty in some records – this can happen for a variety of reasons from accidental omission to laziness.
  2. Tables with mandatory fields that should have been optional, but now contain useless or misleading information in some records because something had to be entered to satisfy the requirements of mandatory columns.

These conflicting reasons highlight the dilemma between making fields optional or mandatory.

The following ideas need balancing:

  • Don’t store data that is not needed.
  • If information is optional, is it worth storing at all?
  • If information is necessary, always make the field mandatory.

An optional field is an unreliable field. It cannot be relied on for searching or statistics. It may well be used to store completely different information which should be stored in a new (mandatory) column.

Special cases: names and addresses

Cultural differences mean that fields for names and addresses will often have to be optional. Within any given culture, the components of a person’s name are fairly consistent, but naming methods vary enormously across the world, so fields which are essential in one culture are left unused in another. Addresses are just the same. If a database must support multiple cultures, optional fields and inconsistent usage will inevitably result.

As a result, effective searching in name and address fields is a complex process and searching tools will often need to work on a combination of these columns.