Using a Database

Data modelling - chickens and eggs: "Abstract chicken and egg" by GDJ ( last post looked at what we should model, and the importance of scale and magnitude. Now we need to acknowledge that the data we model will normally be stored in a database. Modelling is the mapping between the real world and the filtered picture of the real world that we wish to present efficiently from a database.

What can go wrong?

Data modelling is not always easy. There are so many different ways to go wrong:

  • Too many attributes may be modelled, or information may be left out that is badly needed.
  • Incomplete lists of possible values for a field, or duplicated (but subtly different) values.
  • Incorrect storage types – should this one be an integer or a real number? Should there be 2 digits after the decimal point or 3?
  • Control of the content of fields can be too flexible or too restrictive.
  • Information can be broken down into too many pieces, or not enough.
  • The same information can be stored in more than one place – and will normally end up different!
  • Information which is vital is missing simply because it wasn’t a mandatory field, while all sorts of unimportant nonsense can creep in and fill up your database.
  • Even bad spelling can make good modelling useless.

The truth can hurt

So let’s start with some simple truths.

  • How we model our data must depend on how we expect to use the information.
  • How we are able to use the information in a data set will depend on how we have modelled the data.

If our expectations are wrong or unclear, we are unlikely to be successful in our modelling. As a result, our data set will probably be incomplete or inefficient. Once data has been entered into a data model, the data can only be used in ways which the model allows. We cannot search for non-existent data, nor can we search quickly if simple and frequently used searches require us to fetch every row in a multi-terabyte data store.

Give up or get moving

These conflicts can sometimes lead to paralysis. “What can I do?” you might ask, “I can’t model the data until I know how I will use it, and I can’t know how I will want to use it until the data is actually stored in the data model.” History reminds us that only God truly knows the end from the beginning. In this newsletter, however, we will try to provide some ways forward; paths through the jungle that, with reasonable care, will reliably take you where you want to go. Strategies that work time and again.

So let’s step right back to some very simple modelling, and the next post will set some ground rules.