In this post we are trying to come to some conclusions about the different sub-classing options in Django that have been discussed in previous newsletters. Due to the long delay between posts, I have listed the options again below as a reminder.
The three options
Once again, the three options are:
Option 1 – A single roads table
A single table called “ROADSA” contains the freeway and tollroad attributes as well as the common road attributes. Unused attributes for freeways or tollroads are set to NULL or similar values.
Option 2 – A table for each type of road
A ‘FREEWAYB’ table and a ‘TOLLROADB’ table are defined, each of which contains both shared road attributes and some attributes specific to the type of road.
Option 3 – Shared base table
The “road” information is stored in a ‘ROADC’ table, while the information about the freeway aspect of freeways is included in a ‘FREEWAYC’ table and the tollroad-specific aspects of tollroads are stored in a table called ‘TOLLROADC’.
Review and Conclusions
In previous posts, we have looked at the creation, selection, modification and deletion of entities in small and large numbers. Naturally, there are many disclaimers to add about the simplicity of our classes, the methods used for creation, selection, modification and deletion, the database manager used, and many other things. However, we can still draw some conclusions as long as we remember that changing the methods may also change the results.
So let’s look at the operations in order and draw some simple conclusions.
The first and most obvious conclusion to draw is that if there are large numbers of objects to create, the default process provided by Django will be slow. Very slow. For the hardware and software I was using, the speed was roughly 8 new objects per second. The performance only varied by about 2% between Option 1 (fastest) and Option 3 (slowest).
Calling the bulk_create() function will be more than 1,000 times faster, with the example from the previous newsletter creating about 9,000 objects per second.
Of course, this extra speed is achieved by doing less, so it can only be used if the extra work is not needed. If your process requires the save() method to be called or the pre_save or post_save signals to be sent, or a few other particular features of the default object creation process, then you cannot use bulk_create(). See the documentation (https://docs.djangoproject.
One limitation is that bulk_create() will not work with the child tables in a multi-table inheritance scheme such as our third option uses.
However, there are other options for improving performance when creating many objects at once, and one of them is to create all of the objects in one database transaction (see https://docs.djangoproject.
To summarise: if small numbers of objects must be created, any of the three modelling options can be used. If large numbers of objects must be created, the multi-table inheritance option (Option 3) will be considerably slower. The performance when creating objects in our larger test varied only by about 1% between Option 1 (faster) and Option 2 (slower).
Various different conditions were used for filtering the roads and fetching the rows we wanted. Our initial test working with 100 roads of each type showed very short times for all three of the different modelling methods, but suggested that Option 2 was the best performer, followed closely by Option 3, with Option 1 a more distant third.
Our larger test works with 500,000 roads in each modelling option and reveals the same performance order and very similar percentage differences.
Option 1 takes a little more than 50% longer to fetch the same number of rows as the roads modelled using Option 2. Option 3 is slightly slower than Option 2, but still significantly faster than modelling with Option 1.
If you are looking just for performance, these tests suggest that modelling different types of objects in different tables could be quicker to work with – but that will also depend on how many different types of similar objects you have.
Option 3, which uses multi-table inheritance, is slightly slower than Option 2, but still considerably faster than Option 1 as used in this test. Although this method of modelling adds some complexity in some ways, it also makes other aspects considerably simpler. The overall performance of your application will depend on the actual usage, but if your application is similar to this test case, Option 3 may be a good solution – particularly if you have several similar sub-classes.
Our initial tests showed Option 2 to be the fastest for our editing, with Options 1 and 3 being about 11-13% slower.
When editing with larger numbers of objects and editing larger numbers of objects, the order of the options is still the same, but the difference in performance is only 1-2%.
No useful advice can be given about choosing one modelling type over another based on such small differences. The differences between application usage and the actual data being modelled will cause much greater variation in performance than this.
The testing of deletion has not been as thorough as the other operations, and deletion is often a little specialised. For many applications in Django, deletion is a rare event. If deletion is used, it will often have special conditions attached to it which will significantly affect performance.
Based on the simple tests done, Option 1 was the fastest, with Option 3 taking about 15% longer and Option 2 110% longer than Option 1.
Clearly, for these tests, Option 1 is best, with Option 3 a little way behind.
If deletion is a common and important part of a Django application, more planning and testing would be required. Some customised handling may also be required in situations where there are many possible connections between objects.
Overall, the modelling option chosen for sub-classing does have a significant impact on performance.
For each operation, the table below shows a scaled value for each option; the fastest option being assigned a value of 1, while the other two options are given values that indicate how much longer they took proportionally.
The one incongruous value is for the creation of many objects with the multi-table inheritance modelling (Option 3), since it is using a different creation method. The numbers in each line are summed for each option and the total number for the seven tests is given. Lower is better.
The numbers in brackets are the sum of the values for each operation excluding the bulk creation step. If this step is excluded, Option 3 becomes the lowest total overall.
The main weakness for Option 1 is in selection.
The main weakness for Option 2 is with deletion.
The main weakness for Option 3 is that bulk_create() cannot be used.
Subject to the normal disclaimers about uncertainties, variations, errors in methodology and analysis, and any changes due to software updates, we can find an overall winner – purely from a performance point of view.
Option 2 has the largest number of simple “1” values, indicating that it won the most contests. In fact, the only area in which Option 2 performed badly was deletion. If deletion is ignored, this method of modelling clearly out-performs both of the others.