In any data modelling work, the first and most significant question is whether we are trying to improve an existing process or to model a new one, that is, a process which does not yet exist.  The approach to each is fundamentally different.

Modelling a process that the customer (or the modeller) has never carried out effectively is much riskier.  Costs are much higher and cost overruns much more common.  The work will take much longer.  Failure rates in such projects are much higher.

Rule: Be very careful when modelling new processes.  The probability of failure is high.

Let’s take the easy one first: improving an existing process.  This may mean converting a paper-based process to a digital system or it may mean combining a collection of existing digital processes (spreadsheets, documents), or some mix of the two.

Whatever it is, the customer at least knows what they have done in the past and why they no longer want to continue with that system.  They know what is causing them pain and will usually have some ideas of how this pain can be alleviated.  However, they often won’t have a clear, practical idea of how to get from the current situation to their idea of Utopia.

To make sure you understand a customer’s requirements, always ask questions.  A specification is a wonderful but impersonal thing.  Always ask questions of people, since it is always people who will approve work and authorise payment for a job.

Rule: Always ask questions.  Confirm requirements by talking to people.

Essential data

An existing, working system is a treasure trove for data modellers, as it provides a framework to start from.  It contains the elements that have survived the pressures of business; it represents the data that the business has successfully recorded and kept up to date.  None of this data should ever be discarded or made harder to access without very careful thought – in fact, it should probably never be discarded without explicit instruction.

Rule: If an existing business method maintains data, keep it.

Other features may be nice ideas, but they have not proved essential in the past.

There is a group of features that forms an exception to this: the features which have caused the pain to be great enough to force change.  You will often have been called to do some work because the pain caused by the current system is too great, or the expected cost of continuing with it (the monetary or other cost) is too high.  A new system must be designed to reduce or eliminate this pain.

Rule: Only model new data if the lack of it is one of the triggers for the modelling work.  Do not model any other new data or methods: each addition greatly increases the risk.

Of course, customers will often insist that you model lots of new data.  Try to make them aware of the dangers and cover the risk in your pricing and time estimates.  Because new data and methods have never been used or tested before, there is no clear “right” and “wrong” about how they should be collected and used.  As a result, this is one place where unexpected complications often arise.

So far we have come up with two categories of data which must be modelled:

  1. Existing data which is currently maintained
  2. New data or methods required to ease the pain of the existing systems

But what about…

There are also categories of data which probably should not be modelled, and some which positively must not be modelled.  Data which has been kept in the past and found not worth maintaining should never be modelled or saved.

Warning: Make sure that everyone agrees the data has no value.  In large organisations, and even in small ones, it is common for different parts of the organisation to use different data.  One part of the organisation may think that data they do not use is of no use to anyone, but they may be completely wrong.  Never condemn data as useless unless everyone agrees it is not required.  As a proverb says: “The one who states his case first seems right, until the other comes and examines him.” (Proverbs 18:17)

Data for which no clear business use case or usage pattern can be established should not be modelled.  It is very easy to model too much.
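
The categories so far can be gathered into one small decision procedure.  The following is a minimal sketch in Python, not a prescribed method: the `CandidateField` class, its flag names and the `decide` function are all hypothetical names introduced here for illustration, and the order of the checks is only one reasonable reading of the rules above.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    MODEL = "model it"
    DO_NOT_MODEL = "do not model it"
    ASK_EVERYONE = "confirm with every part of the organisation first"


@dataclass(frozen=True)
class CandidateField:
    """A piece of data someone has proposed for the new model (hypothetical)."""
    name: str
    currently_maintained: bool = False     # the existing system keeps it up to date
    eases_trigger_pain: bool = False       # its absence helped trigger the project
    abandoned_in_past: bool = False        # once kept, then found not worth maintaining
    everyone_agrees_useless: bool = False  # every part of the organisation has confirmed


def decide(field: CandidateField) -> Decision:
    # Category 1: data the business already maintains is kept.
    if field.currently_maintained:
        return Decision.MODEL
    # Category 2: new data needed to ease the pain that triggered the work.
    if field.eases_trigger_pain:
        return Decision.MODEL
    # Abandoned data is dropped only once everyone agrees it has no value.
    if field.abandoned_in_past and not field.everyone_agrees_useless:
        return Decision.ASK_EVERYONE
    # Everything else, including data with no clear use case, stays out.
    return Decision.DO_NOT_MODEL


if __name__ == "__main__":
    # Invented example fields, for illustration only.
    for candidate in (
        CandidateField("delivery_address", currently_maintained=True),
        CandidateField("fax_number", abandoned_in_past=True),
        CandidateField("favourite_colour"),
    ):
        print(f"{candidate.name}: {decide(candidate).value}")
```

The value of writing the rules down this way is mostly that it forces the awkward questions, such as whether everyone really has agreed that a field is useless, to be answered explicitly for each field.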

Extra and Optional Data

Extra Data

You may ask “What does it matter if we model more data fields than we need?  Storage is cheap.”

True, storage is cheap, but storage is only a small part of the equation.  For data to be stored, it must first be entered.  For existing objects, the information must be found and then explicitly entered or imported.  For every new record, the data must be entered, and that takes time.  Time is money, and time spent entering unnecessary data not only costs money; it also makes enemies.  Avoid anything that will make enemies for a new system.  Remember that some things will always go wrong with a new system: time will be lost; people will be confused; extra money, effort and goodwill will be required.  Don’t provide any more opportunities for making enemies than you can help.
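
To put a rough number on the data entry cost, a back-of-the-envelope calculation is enough.  The sketch below is illustrative only: the function name `extra_entry_cost` and every figure passed to it are invented for the example, and real estimates would have to come from the customer.

```python
def extra_entry_cost(records_per_year, extra_fields, seconds_per_field, hourly_rate):
    """Yearly cost of typing in fields nobody needs (all inputs are estimates)."""
    total_seconds = records_per_year * extra_fields * seconds_per_field
    return total_seconds * hourly_rate / 3600


# Invented figures: 20,000 new records a year, 3 unneeded fields,
# 10 seconds to fill in each one, staff time costed at 30 per hour.
cost = extra_entry_cost(20_000, 3, 10, 30)
print(f"Extra data entry cost per year: {cost:.2f}")
```

With these invented figures, three unneeded fields consume roughly 167 hours of staff time a year, which is a cost that cheap storage does nothing to reduce.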

At times, customers or designers want to add extra columns as “optional extras”.  The argument will often be that the data is not needed yet but will be used later.  In general, don’t include it.  Don’t teach users to ignore fields.  Don’t even let them get used to being able to ignore fields.  Aim for the goal that if a field is there, it must be filled in, or else a mutually exclusive alternative field must be filled in.  Where such alternatives exist, the interface design must make it clear that they are mutually exclusive.  Any field which is enabled must be completed.  Extra optional fields are simply a source of confusion and bad habits.  New users will assume the data is there and searchable.  Experienced users will know that so much of it is missing that none of it is useful.
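
The “every field must be filled in, or its mutually exclusive alternative must be” goal can be checked mechanically.  What follows is a minimal sketch, assuming records are held as plain dictionaries; the function names and the field names (`company_number`, `personal_id` and so on) are hypothetical and chosen only for illustration.

```python
def is_filled(value):
    """Treat None and blank strings as 'not filled in'."""
    if value is None:
        return False
    if isinstance(value, str):
        return value.strip() != ""
    return True


def validate(record, mandatory, exclusive_groups=()):
    """Return a list of problems; an empty list means the record is complete."""
    problems = []
    # Every mandatory field must hold a value.
    for name in mandatory:
        if not is_filled(record.get(name)):
            problems.append(f"{name} must be filled in")
    # In each group of alternatives, exactly one field must hold a value.
    for group in exclusive_groups:
        filled = [name for name in group if is_filled(record.get(name))]
        if len(filled) != 1:
            problems.append("exactly one of " + ", ".join(group) + " must be filled in")
    return problems


# Hypothetical customer record: a customer is identified either by a
# company number or by a personal ID, never both and never neither.
MANDATORY = ("name", "postcode")
EXCLUSIVE = (("company_number", "personal_id"),)

record = {"name": "Example Ltd", "postcode": "", "company_number": "123", "personal_id": ""}
print(validate(record, MANDATORY, EXCLUSIVE))
```

A check like this belongs behind the interface as well as in it, so that imported records are held to the same standard as records typed in by hand.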

Rule: Don’t store data that is not needed.

This is a very important rule and can make the difference between success and failure for a new system, so never forget it.

Optional Data

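An optional field is an unreliable field.  It cannot be relied on for searching, because a search silently misses every record in which the field was left empty.  Worse, users may well use it to store completely different information, which should instead be stored in a new (mandatory) column.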
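
A small illustration of why searching on an optional field misleads.  This is a sketch with invented data; the `search` and `fill_rate` helpers and the `manufacturer` field are hypothetical names, not part of any particular system.

```python
def search(records, field, value):
    """Return the ids of records whose field equals value."""
    return [r["id"] for r in records if r.get(field) == value]


def fill_rate(records, field):
    """Fraction of records in which the field holds a non-blank value."""
    if not records:
        return 0.0
    filled = 0
    for r in records:
        value = r.get(field)
        if value is not None and str(value).strip() != "":
            filled += 1
    return filled / len(records)


# Invented data: all three pumps were made by Acme, but "manufacturer"
# was optional, so it was only entered for the first one.
pumps = [
    {"id": 1, "manufacturer": "Acme"},
    {"id": 2, "manufacturer": ""},
    {"id": 3, "manufacturer": None},
]

print(search(pumps, "manufacturer", "Acme"))
print(fill_rate(pumps, "manufacturer"))
```

The search finds only the first pump and gives no hint that two more exist.  Measuring the fill rate of a field in the existing system is a quick way to see whether it has, in practice, been treated as optional.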

Rule: Don’t have optional data fields.

Unfortunately, history and cultural differences cause names and addresses to be exceptions to this beautiful and simple rule.  Within any given culture, people’s name divisions are fairly consistent, but naming methods vary enormously across the world, so fields which are essential in one culture are left unused in another.  If a database must support multiple cultures, optional fields and inconsistent usage will inevitably result.  Addresses are equally difficult.  As a result, effective searching in such fields is a complex process and searching tools will often need to work on a combination of these columns.
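
Searching across a combination of name columns might look something like the sketch below.  It assumes three hypothetical columns (`given_name`, `middle_name`, `family_name`) and simple whole-word matching; real name searching would also have to deal with accents, transliteration and spelling variants, none of which is attempted here.

```python
NAME_FIELDS = ("given_name", "middle_name", "family_name")  # hypothetical columns


def name_tokens(record, fields=NAME_FIELDS):
    """Collect the case-folded words from every name column that is filled in."""
    tokens = set()
    for field in fields:
        value = record.get(field) or ""
        tokens.update(value.casefold().split())
    return tokens


def search_by_name(records, query, fields=NAME_FIELDS):
    """Return records whose combined name columns contain every word of the query."""
    wanted = set(query.casefold().split())
    if not wanted:
        return []
    return [r for r in records if wanted <= name_tokens(r, fields)]


# Invented records showing inconsistent use of the same columns.
people = [
    {"id": 1, "given_name": "Wei", "middle_name": None, "family_name": "Li"},
    {"id": 2, "given_name": "Maria", "middle_name": "", "family_name": "Garcia Lopez"},
    {"id": 3, "given_name": "Oliver", "middle_name": None, "family_name": None},
]

print([p["id"] for p in search_by_name(people, "li wei")])
print([p["id"] for p in search_by_name(people, "Lopez Maria")])
```

Because the words from all the name columns are pooled before matching, the search does not care which column a user chose for which part of a name, or in which order the query gives them.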

A minor caveat

Of course, all of these ‘rules’ are stated as absolutes, but there are exceptions where they should be overridden.  For example, data may have been kept in the past and then discontinued because it was not necessary, but the law has since changed and now requires that such data be collected for new equipment or installations.