Vertical Partitioning
Table of Contents
Introduction
Data density is something that was hinted at in the last three parts of this series, however, it wasn’t totally discussed. Data density put simply is how many rows you get per page and how much space is left over on each page. It was discussed that a page can only contain up to 8060 bytes. This means that if a row is fixed width for 5000 bytes in a row, you could only have 1 row per page and essentially 3060 bytes wasted on every page. This can result in a huge waste of space in the data files.
A good practice to avoid wasting space is using vertical partitioning. This practice has 2 main categories:
- Normalization
- Row splitting
For the focus of this article we will be focusing on Row splitting. Put simply, this is the process of dividing out columns that are not used often OR that are large columns that could potentially be stored into another table.
Since these values are stored in another table it does make the queries a bit more complicated to write as you are slightly de-normalizing your data especially if this is used often and is just a large value. However, if the column is not used that often this can be very useful not only for storage but also for query optimization.
Demo Setup
To start off, we are going to demo a table that has not been vertically partitioned.
| |
From here we will insert 1 million rows, I believe this to be a realistic size for a person or product table.
| |
Now before we analyze this data, lets see how it would look if we move the large VARCHAR (1000) column that is always populated off to another table.
| |
AND lets also insert the 1 million rows same as previously inserted.
| |
Now lets used the same query from the previous post to view the row structure of the table with no vertical partitioning.
| |
Note that the row count is 1000200. This is due to the fact that we only took a “sampled” view of the pages, we will need to view the “detailed” in order to get an exact count. As previously noted, since we have more then 10,000 pages, the sampled view will not to a detailed view automatically.
| |
There we go, exactly 1000000 rows.
Now lets see how our performance looks, but first to make sure we are starting from a clean state lets clear our cache using the following script.
| |
Now lets run a query to return only the even rows that we had inserted based off our primary key value. However, prior to running the following query lets also turn on our “STATISTICS IO ON” this helps you to understand how your query performed shows you what actually happened.
| |
Once we click on our messages tab we will see the following:
(500000 row(s) affected) Table ‘NOVerticalPartitioning’. Scan count 1, logical reads 166674, physical reads 0, read-ahead reads 166473, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Now for the Vertically partitioned tables, lets first look at the row structures.
| |
Finally lets see how our query performs.
| |
Results Comparison
Now lets compare what we have gathered thus far:
| Metric | Vertically Partitioned | NOT Vertically Partitioned |
|---|---|---|
| Total pages | 154018 | 166674 |
| Selecting all columns - Scan Count | 10 | 1 |
| Selecting all columns - Logical Reads | 193764 | 166674 |
| Selecting all columns - Physical Reads | 4659 | 0 |
| Selecting all columns - Read-Ahead Reads | 193764 | 166674 |
Well it looks like we have now saved 12,656 pages, which is excellent! It saved us just over 101MB, however, if we were always trying to return the last column, this would definitely not be worth it for performance reasons. However, if we weren’t always selecting all of the columns, what would that look like? Would that make the difference for us and make it worth our time?
To find out, lets run the following query and analyze the out puts.
| |
| Metric | Vertically Partitioned | NOT Vertically Partitioned |
|---|---|---|
| Selecting Column2 and Column3 - Scan Count | 1 | 1 |
| Selecting Column2 and Column3 - Logical Reads | 11154 | 166674 |
| Selecting Column2 and Column3 - Physical Reads | 3 | 0 |
| Selecting Column2 and Column3 - Read-Ahead Reads | 11150 | 166674 |
It made a huge difference for performance, we now no longer had to look at that third column and increased our performance by almost 15 times.
Conclusion
In summary, vertical partitioning is a good design practice and should be used, however, it does depend on your workload. Do not throw columns to another table just because they are large objects. Instead, take the time to evaluate your data and make the right choice when it comes to what table the column should be under.