Database Design - Why Does It Matter?
Why Does It Matter?
As hinted in the previous post, many people do not take database design seriously when first implementing their database idea. Most have a "let's just get it done, and we will deal with performance later…" mindset. This is the wrong approach to database design. While the database may be fine for a short period or for a small group of users, it becomes a major problem once you genuinely need the environment to scale.
Another excuse I hear from my colleagues is that most of the time, people are only building databases for a semester or a final project. This is also the wrong mindset: that database may come into play in another course, or you may actually want to build a database for a real purpose, such as organizing your music. Assuming a database is a short-term artifact not intended to last is exactly the trap to avoid.
Three Critical Pillars
There are three things that are critical for building a reliable and robust database:
- Know your data – Helps you properly choose which data types to use. For example, if you need to record a date, do you also need the time?
  - The `date` data type is only 3 bytes
  - The `datetime2` data type ranges from 6 to 8 bytes
- Know your workload – Helps you plan how tables should store the data.
  - OLTP databases vs. reporting databases: reporting databases tend to store more information in one table for faster reads, since they do not need to insert data as often, while OLTP databases store less data per table to keep the write workload balanced.
- Know how the platform works – This is key; it makes designing a database easier because you will not only know the syntax but also understand concepts such as the difference between the clustered index key and nonclustered indexes.
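To make the first pillar concrete, here is a small T-SQL sketch (the variable names are just for illustration) that checks the actual per-value storage of the two types with `DATALENGTH`:

```sql
-- Sketch: compare per-value storage of date vs. datetime2(7).
-- DATALENGTH returns the number of bytes used to store the value.
DECLARE @d   date         = '2024-06-01';
DECLARE @dt2 datetime2(7) = '2024-06-01 10:30:00';

SELECT DATALENGTH(@d)   AS date_bytes,       -- 3
       DATALENGTH(@dt2) AS datetime2_bytes;  -- 8 at precision 7
```

Lower `datetime2` precisions are cheaper: precisions 0 to 2 store in 6 bytes and precisions 3 to 4 in 7 bytes, so even when you do need the time, choosing the precision deliberately still pays off.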
The Byte Savings Add Up
A good example of why this matters is the 3-byte versus 6-to-8-byte argument I mentioned above. Most people would shrug off a 3-to-5-byte difference as too small to matter; however, you are not dealing with just one row, or even just one column.
In this scenario, let's say you have a table with three datetime2 columns left at the default precision of 7 (datetime2(7), 8 bytes each) that are always populated, yet you only need to store the date, which uses 3 bytes. Switching all three columns to date would save 15 bytes per row.
| Bytes Saved per Row | Row Count | Space Saved |
|---|---|---|
| 15 | 1,000,000 | 15 MB |
| 15 | 10,000,000 | 150 MB |
| 15 | 100,000,000 | 1.5 GB |
| 15 | 1,000,000,000 | 15 GB |
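One way to see these numbers on a real system is to build both versions of the table, load the same rows into each, and compare them. A hedged sketch, with hypothetical table names:

```sql
-- Hypothetical tables: same data, different date types.
CREATE TABLE dbo.EventsWide (
    EventId   int IDENTITY(1,1) PRIMARY KEY,
    CreatedAt datetime2(7) NOT NULL,
    UpdatedAt datetime2(7) NOT NULL,
    DeletedAt datetime2(7) NOT NULL
);

CREATE TABLE dbo.EventsNarrow (
    EventId   int IDENTITY(1,1) PRIMARY KEY,
    CreatedAt date NOT NULL,
    UpdatedAt date NOT NULL,
    DeletedAt date NOT NULL
);

-- After loading the same rows into both, compare reserved space:
EXEC sp_spaceused N'dbo.EventsWide';
EXEC sp_spaceused N'dbo.EventsNarrow';
```

`sp_spaceused` reports reserved and data space per table, so the per-row savings in the table above show up directly in its output.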
So now you might respond, "So? It's just disk space." Not quite: SQL Server reads data into memory in 8 KB pages, so wider rows mean fewer rows per page, more pages, and therefore more RAM consumed by the same data. And the bloat does not stop there; it carries over to your transaction logs, database backups, replication, and high-availability environments.
That said, you should not cut it too close in the other direction either. If you think you are going to approach the limit of a data type, such as tinyint (0 to 255), step up to smallint (-2^15 (-32,768) to 2^15-1 (32,767)) instead.
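A quick illustration of the headroom argument (the variables are just for demonstration):

```sql
-- tinyint tops out at 255; smallint costs 1 extra byte but gives headroom.
DECLARE @t tinyint  = 255;   -- already at the ceiling
DECLARE @s smallint = 255;   -- same value, room up to 32,767
-- SET @t = @t + 1;          -- would raise an arithmetic overflow error
SELECT @t AS tiny_value, @s AS small_value;
```

One extra byte per row is cheap insurance compared to an emergency type change on a table that has hit its ceiling in production.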
The point I am trying to make here is: take your time choosing the proper data types. They are very difficult to change in the future, especially once code has been written against them.