Good and bad modelling (#12)

We have now looked at the background of data modelling and where it all began. Clay tablets, lists, books and now databases. But behind all of these is the need to model the data in a way that makes it easy to use. In this way, data modelling is both a simple and complex area at the same time. Good data modelling makes it easier to collect good data and use it effectively.

10 Benefits of good modelling

Benefits of good modelling (and good data entry):

Data is easy to search for objects based on available information
It is easy to fetch the information wanted for individual objects
There is no need to follow long chains of connections to get the information required
Essential information is never missing
Fields with lists of values can include all possible valid values and cannot include any invalid values. As a result, such fields are all correct and consistent
There are no duplicated records
No data is duplicated in different columns
No records are missing because it was just too hard to bother with the data entry
No data is missing in records – missing because it was optional and people were in a hurry.
I hesitate to say it, but the database “feels” right. Table names and field names make sense. The data expected in each field is obvious

15 Problems of bad modelling

Bad modelling can make a database so difficult to work with that people will avoid using it.

All the benefits of good modelling shown above with have matching problems in a list of the results of bad modelling. In this list, some problems with bad modelling:

Excessive database size reduces performance and costs more for storage and backups. Storing unnecessary information is expensive in many ways. Examples of unnecessary storage include:
- extra columns or entire tables that store information which is never needed,
  storing individual items of information when the information is only ever used in an aggregated form,
- use of data types which take more space than necessary (such as storing numbers as strings instead of numbers) or
- duplicating information.
Too much work is required to fetch simple items of information (excessive normalisation can contribute to this). For example, you may need to fetch 20 rows from several different tables just to get one simple piece of information that you want.
Too much is fetched with each row – for example, you may only want one simple item of information, but the table has 150 columns which must all be fetched.
Data that ends up inconsistent or contradictory: for example, redundant information that is not kept up to date.
Orphaned data – information that should be associated with other data, but the connection has not been modelled.
Missing information – fields that should be mandatory but are not, and thus are not provided in all rows.
Missing information – information that is not modelled at all.
Useless or misleading information – entered because fields were mandatory although they should not have been
Multiple pieces of information stored in one field.
Meaningless, misleading or inconsistent table or column names can cause major errors in data entry or output.
Fields with fixed lists of values cannot be relied on at all if the list of values does not include all the possible values.
Fields which should have fixed lists of values but do not can contain unintended differences of spelling, punctuation or whitespace.
Spelling in name/address fields, or any field with text in it can cause unexpected differences.
Decimal points or leading zeroes? Don’t store numbers in text fields (particularly floating points numbers including scientific or engineering notation).
Floating point numbers can be problems when stored and read on different operating systems or architectures.

These last couple of items are not necessarily problems with bad modelling, but more with data consistency which are often reflected in difficulties with searching:

Is data entry, sorting or searching case sensitive? Unexpected results can occur.
Unicode (or other character encodings), whitespace, line break characters and similar things can reduce the ease with which data can be searched or sorted. Once again, unexpected results can occur.

Good or Bad?

Data modelling and databases fit the old proverb: “You will recognise them by their fruits.”[1]:

Bad data modelling and bad data together leave a really bad taste in your mouth – and will often lead to a cancelled job.
Bad data modelling makes good data hard to use.
Good data modelling makes it harder to collect bad data.
A good data model with good data tastes delicious. Both designers and users will be happy.

[1] From the Bible: Matthew 7:20