EPANET Modelling in WaterSums - Part 2 (#61) - Data modelling for developers

Different applications have different requirements. This seems to be an obvious statement, but sometimes it is not obvious just how different the requirements can be.

Many Django applications are news sites which tend to accumulate objects. For example, news articles are created and shortly afterwards they may be edited, but they are only rarely deleted. When articles are archived, it is likely to be done as a batch process every so often as part of site maintenance. It may well be done quite independently of Django, behind its back, so to speak.

For a news site, the most important features are to do with selection and retrieval of data.

Lists of articles grouped by age, category or search terms must be presented quickly, often with graphics and text interspersed.

Individual articles must be very quick to load and use minimal bandwidth, particularly if the user is accessing the site through a slow connection.

Snippets of connected articles may also be presented, which again requires fast selection of data.

Users may be allowed to comment on articles, in which case comments must be displayed. On some articles, thousands of comments can accumulate, so it can be best to load them only on demand or in groups. In addition, comments will normally have an order and a hierarchy, so fetching can be complex if the modelling is not done well.
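To make the point about ordered, hierarchical comments concrete, here is a minimal sketch of fetching them in groups. It uses Python's built-in sqlite3 module rather than Django so that it runs on its own. The table layout (a `comment` table with `article_id` and `parent_id` columns) and the helper names are assumptions made up for illustration, not taken from any real news site.

```python
import sqlite3

# Hypothetical schema: each comment belongs to an article and may
# point at a parent comment, which gives the reply hierarchy.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comment ("
    " id INTEGER PRIMARY KEY,"
    " article_id INTEGER NOT NULL,"
    " parent_id INTEGER REFERENCES comment(id),"
    " body TEXT NOT NULL)"
)
# An index matching the lookups below keeps selection fast as comments pile up.
conn.execute("CREATE INDEX comment_lookup ON comment (article_id, parent_id, id)")


def top_level_page(conn, article_id, page, page_size=2):
    """Return one group of top-level comments for an article, oldest first."""
    rows = conn.execute(
        "SELECT id, body FROM comment"
        " WHERE article_id = ? AND parent_id IS NULL"
        " ORDER BY id LIMIT ? OFFSET ?",
        (article_id, page_size, page * page_size),
    )
    return rows.fetchall()


def replies(conn, comment_id):
    """Return the direct replies to one comment, loaded only on demand."""
    rows = conn.execute(
        "SELECT id, body FROM comment WHERE parent_id = ? ORDER BY id",
        (comment_id,),
    )
    return rows.fetchall()


# Sample data: three top-level comments and one reply on article 10.
conn.executemany(
    "INSERT INTO comment (id, article_id, parent_id, body) VALUES (?, ?, ?, ?)",
    [
        (1, 10, None, "first"),
        (2, 10, 1, "reply to first"),
        (3, 10, None, "second"),
        (4, 10, None, "third"),
    ],
)

print(top_level_page(conn, 10, 0))  # [(1, 'first'), (3, 'second')]
print(top_level_page(conn, 10, 1))  # [(4, 'third')]
print(replies(conn, 1))             # [(2, 'reply to first')]
```

Only the first group of top-level comments is read when the article is displayed; replies and later groups cost nothing until a reader asks for them.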

Overall, the main requirements of such a news application will be:

Creation of individual objects (articles and comments)
Editing of individual objects (updating articles and moderating comments)
Selection (searching for articles in categories or in response to ad hoc queries; finding related or highlighted articles; fetching comments for articles)
Deletion (comments are the only things likely to be commonly deleted)
Archiving (of articles; likely to be periodic or rare)

Storing information about water supply network models in a database is different in several ways and these affect our modelling decisions.

Firstly, in simple terms, a news site is a single entity. Although individual users may be able to customise the appearance of the site to suit their preferences, the overall data for the entire site will be stored in one central location.

To explain more fully what I mean, a news site contains many different categories of news reported by many different journalists, but there is not a separate site for each journalist, or for each category of news.

However, within a given engineering firm or utility company, several people may be working on network models that cover different geographic locations and are owned or controlled by different utility companies. Input data may well be available in different forms, and different design rules and defaults may be used. Basically, the data for a model is only loaded when it is needed.

In general, this application is best thought of as a standalone design tool for each user rather than a centralised design tool. You might consider it similar to having one person using a word processor to work on a document.

Having a centralised document repository within a company can be useful if several people need a document or simply for backup purposes, but this comes with greater complexity and a need for permanent, speedy access to a network.

Often there will be a need for users to work offline. This is easy for a single document, but hard to achieve if we need to mirror a site with thousands or hundreds of thousands of news articles.

EPANET is intended as a standalone program that can be used to model a water supply distribution network or a section of one. If we want to store EPANET data in a database while allowing users to work as they would with EPANET, our modelling must facilitate speedy:

loading of the data for a distribution network from a file.
deletion of all data relating to a network modelling project.
adding, editing and deleting of individual objects as part of the design process.
searching for existing objects in the database.
analysis of a distribution network stored in the database.
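The first two requirements in this list can be sketched in a few lines. Again this uses plain sqlite3 rather than Django so that the example is self-contained. The `project` and `junction` tables, their columns and the helper functions are illustrative assumptions only, not the WaterSums schema. In Django terms the rough counterparts would be a `ForeignKey` with `on_delete=models.CASCADE` and `QuerySet.bulk_create`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys (and cascades) when this is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE project (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- Every network object carries its project, so one project's data
    -- can be loaded or removed without touching any other project.
    CREATE TABLE junction (
        id INTEGER PRIMARY KEY,
        project_id INTEGER NOT NULL REFERENCES project(id) ON DELETE CASCADE,
        label TEXT NOT NULL,
        elevation REAL NOT NULL,
        UNIQUE (project_id, label)
    );
    """
)


def load_project(conn, name, junctions):
    """Insert a project and all its junctions in a single transaction."""
    with conn:
        cur = conn.execute("INSERT INTO project (name) VALUES (?)", (name,))
        project_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO junction (project_id, label, elevation) VALUES (?, ?, ?)",
            [(project_id, label, elevation) for label, elevation in junctions],
        )
    return project_id


def delete_project(conn, project_id):
    """Delete a project; the cascade removes its junctions with it."""
    with conn:
        conn.execute("DELETE FROM project WHERE id = ?", (project_id,))


def count_junctions(conn, project_id):
    """Count the junctions stored for one project."""
    row = conn.execute(
        "SELECT COUNT(*) FROM junction WHERE project_id = ?", (project_id,)
    ).fetchone()
    return row[0]


# Two independent projects may reuse the same junction labels.
town_a = load_project(conn, "Town A", [("J1", 12.5), ("J2", 9.0)])
town_b = load_project(conn, "Town B", [("J1", 30.0)])

delete_project(conn, town_a)
print(count_junctions(conn, town_a))  # 0
print(count_junctions(conn, town_b))  # 1
```

Loading and deleting are whole-project operations here, which is quite unlike the one-article-at-a-time pattern of the news site.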

From this list we can see that our requirements are quite different from those of the typical news site we have described above.

In previous posts, we have looked at the options for sub-classing in Django and how each option affects performance. We will now revisit those conclusions to see how we should model EPANET data using Django.