EPANET Network Components – Database tables and rows, Part 3 (#20)

In this case study we are going through the process of designing a database for storing the information contained in an input file for EPANET, software which can analyse water supply distribution networks.

Well, it has been hard work going through our simple EPANET data file and converting it to rows in a database where each table includes all of the information contained in a specific type of row from the input file.  It’s good that King Hezekiah didn’t have to wait until databases were available before actually building this tunnel to protect the city’s water supply against invading Assyrian armies!

Nevertheless, some very common questions have been answered: questions about keys, lookup tables, validation, and even a little about data types.

We have walked the tightrope across a chasm of inefficiency.  We have avoided falling off one side – the side where we simplify the data model so much that information is obscured or lost completely; while also avoiding the other side –  the side where we complicate the data model unnecessarily.

At this stage, we can model our network title, reservoirs, pipes and junctions.  In fact, all of the essential items defined in our data file have now been modelled, except for the flow units.  Let’s look at this before reviewing what we have achieved.

In the data file, the sections we have looked at have all had a consistent design.  Each line that was not a comment was translated into one row in the database.  Thus a three line network title was converted into three rows in our NETWORK_TITLE table.  Likewise one line in the [PIPE] section translated into one row in the PIPE table and so on.  The [OPTIONS] section is different: each line contains a key/value pair.  Our flow units is just one of many possible options that can be set, and the main reason that each option is set on a separate line is that these options all have default values.  A reservoir, junction or pipe has no “default” values, so each pipe must have a set of values defined; but the values of network options only need to be set when the default value is not appropriate for the network.

Here, again, we strike the common challenge in data modelling: the exact usage affects our modelling decisions.  The simplest modelling is to store each option we read as a key with its value.  Again, we need to link all of these rows to our network, so we need a foreign key for this.  Fortunately, though, the order of our options does not matter, so we have no need to store an order indicator, and our table ends up being:

Table OPTION

Column name Data type Nullable? Description
NETWORKID Foreign key No
ID of the network in the NETWORK table
NAME Character(17) No
Name of option (up to 17 characters since the longest possible option name is “DEMAND MULTIPLIER”)
VALUE Character(251) No
Value of the option (up to 251 characters, since a line can be only 255 characters long, the shortest option name has three letters, and a space is required to separate the name and its value)

Value of the option (up to 251 characters, since a line can be only 255 characters long, the shortest option name has three letters, and a space is required to separate the name and its value)

The data type of the VALUE column is specified as “character” because this is a data type which can accurately contain all of the possible values for different options – after all, these values are being read as strings of characters from a file.

However, some values lose some of their usefulness when they are stored as characters because their native type is actually a floating point or integer number.  This is not true of our UNITS option, but is the case for such options as VISCOSITY and TRIALS.  If we wanted to use the option values in arithmetic, we might need to convert the values into numbers, although in most databases, the automatic conversion would do all we needed.

For our use, mimicking the original storage by using a name/value pair is the best solution.  This would make populating the database from an input file easier, and would also make the round trip of writing the data to an input file easier.

Since the EPANET documentation defines all of the options which can be set, we could create a table with a column for each possible option.  This might be a practical proposition if we were trying to create an editor or network modeller, particularly if the default option values can be edited.  In such a case, all options would be set to the current default values when the network was created.

With either usage, we could also store our default option values in the same table, if we wished, by creating a network in the NETWORK which would represent a default network.  This NETWORKID would then be used for all of the default values in the OPTION table.

Database Rows

For our single option, the row we would need is shown below (for metric units):

Table OPTION

NETWORKID NAME VALUE
1 UNITS LPS

In Imperial/US units, since US gallons per minute are the default unit, we could do without setting any option.  In our example input file, however, we specified the option anyway so the necessary row would be (Yes, I know that goes against my constant “keep it simple” refrain, but this is just an example!):

Table OPTION

NETWORKID NAME VALUE
1 UNITS GPM


Next week, God willing, we will wrap up this case study with looking at simulation times for our model.  Specifying times is done in a similar way to setting options, but there are differences that highlight some more fascinating options in modelling.