[{"title":"PVE Resize Disk","description":"Extending the local-lvm disk size on a Proxmox server","content":"I recently updated my local Proxmox server volumes, moving from a 500GB consumer spinning disk to a 1TB enterprise SSD, and here is the how and why of the process.\nWhat was the problem? The reason for this change was to solve a few issues I was having with the iSCSI target I was using for my VMs. Initially, I had my VMs all running from a single iSCSI target, which was fine for a while, but when I wanted to back up my VMs on Proxmox, I was maxing out my 1Gbps network connection. This caused my VMs to become unresponsive during the backup process, which in turn caused some of my VMs to become corrupted.\nSo to solve this, I moved my VMs to local storage on my Proxmox server hosts. In the future, I will take a look at moving back to the iSCSI target, but I will need to upgrade my network to 10+ Gbps to ensure that I can back up my VMs without causing any issues.\nIn addition, I also moved to Proxmox Backup Server, with a 1TB NVMe drive mounted on the host and passed through to the VM. This allows me to back up my VMs without causing any issues with the VMs themselves. I moved to Proxmox Backup Server because its deduplication feature saves space on my backup server and lets me capture deltas of my VMs. This reduced my backup size by a factor of 30. All of my homelab stuff now only takes up 55.55GB when backed up, and the delta backups reduced the backup time from 130 minutes to 3.\nWhile it is now possible to move back to iSCSI with Proxmox Backup Server in place, if I add more VMs to my homelab the initial backup could still max out the network connection, so I am going to avoid that for right now.\nWhy Enterprise SSD? The reason to move to an enterprise SSD is really simple: it is more reliable than a consumer SSD. 
The enterprise SSD has a higher write endurance, which means that it can handle more writes before it fails. The SSD that I landed on was the SM863a; the 960GB model I chose has a write endurance rating of 6,160 TBW, meaning I can write 6,160 terabytes of data to the drive before it is expected to fail. In practice, this drive will basically never fail in my homelab.\nSecondly, because their endurance is so high, you can find these drives on the used market for a fraction of the cost of a new consumer SSD. I was able to pick up 6 for $360.\nThe process Anyway, back to the topic at hand. Instead of reinstalling Proxmox on these machines, I used a Sabrent device to clone the spinning disk to the SSD. This was a simple process, but it did take some time to clone and then resize the disk.\nFirst, let\u0026rsquo;s clone the disk. I used a Sabrent device, but you can use any disk-cloning software to do this.\nOnce the disk was cloned and installed in the server, SSH into the Proxmox server and run the following commands:\n1 2 sudo -i apt-get install -y cloud-guest-utils When finished, list out the block devices on the server with the following command:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 $ lsblk NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS sda 8:0 0 894.3G 0 disk ├─sda1 8:1 0 1007K 0 part ├─sda2 8:2 0 1G 0 part /boot/efi └─sda3 8:3 0 464.8G 0 part ├─pve-swap 253:0 0 8G 0 lvm [SWAP] ├─pve-root 253:1 0 96G 0 lvm / ├─pve-data_tmeta 253:2 0 3.4G 0 lvm │ └─pve-data-tpool 253:4 0 337.9G 0 lvm │ └─pve-data 253:5 0 337.9G 1 lvm └─pve-data_tdata 253:3 0 337.9G 0 lvm └─pve-data-tpool 253:4 0 337.9G 0 lvm └─pve-data 253:5 0 337.9G 1 lvm We can see that the sda disk is 894.3G and the sda3 partition is 464.8G. We need to resize the partition to the full size of the disk. 
To do this, run the following command:\n1 2 3 growpart /dev/sda 3 CHANGED: partition=3 start=2099200 old: size=974673935 end=976773134 new: size=1873285775 end=1875384974 If we run the lsblk command again, we can see that the partition has been resized:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 lsblk NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS sda 8:0 0 894.3G 0 disk ├─sda1 8:1 0 1007K 0 part ├─sda2 8:2 0 1G 0 part /boot/efi └─sda3 8:3 0 893.3G 0 part ├─pve-swap 253:0 0 8G 0 lvm [SWAP] ├─pve-root 253:1 0 96G 0 lvm / ├─pve-data_tmeta 253:2 0 3.4G 0 lvm │ └─pve-data-tpool 253:4 0 337.9G 0 lvm │ └─pve-data 253:5 0 337.9G 1 lvm └─pve-data_tdata 253:3 0 337.9G 0 lvm └─pve-data-tpool 253:4 0 337.9G 0 lvm └─pve-data 253:5 0 337.9G 1 lvm Next, run pvresize to detect the change in the size of the underlying partition that the physical volume resides on, and then lvextend to grow the logical volume. To do this, run the following commands:\n1 2 3 4 5 6 7 8 9 10 11 12 root@proxmox-01:~# pvresize /dev/sda3 Physical volume \u0026#34;/dev/sda3\u0026#34; changed 1 physical volume(s) resized or updated / 0 physical volume(s) not resized lvs LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert data pve twi-aotz-- 337.86g 0.00 0.50 root pve -wi-ao---- 96.00g swap pve -wi-ao---- 8.00g root@proxmox-01:~# lvextend -L+300G pve/data Size of logical volume pve/data_tdata changed from 337.86 GiB (86493 extents) to 637.86 GiB (163293 extents). Logical volume pve/data successfully resized. Now, if you log into the Proxmox web interface, you can see that the disk has been resized to the 637.86G that we specified. 
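As a quick sanity check on the numbers above, the sector counts that growpart reported line up with the sizes lsblk shows, and the extent counts from lvextend line up with its GiB figures. This sketch assumes 512-byte sectors (what growpart reports in) and the default 4 MiB LVM physical extent size, neither of which is stated explicitly in the output:

```python
# Sanity-check growpart's sector counts against the GiB sizes lsblk reports.
SECTOR_BYTES = 512  # assumption: growpart reports 512-byte sectors

old_part_gib = 974673935 * SECTOR_BYTES / 2**30   # old sda3 size from growpart
new_part_gib = 1873285775 * SECTOR_BYTES / 2**30  # new sda3 size from growpart
print(f"sda3: {old_part_gib:.1f}G -> {new_part_gib:.1f}G")  # 464.8G -> 893.3G, matching lsblk

# Likewise, lvextend's extent counts match its reported sizes.
EXTENT_MIB = 4  # assumption: default LVM physical extent size
print(f"pve/data: {86493 * EXTENT_MIB / 1024:.2f} GiB -> {163293 * EXTENT_MIB / 1024:.2f} GiB")
# 337.86 GiB -> 637.86 GiB, matching the lvextend output
```

If the partition sizes after your own clone do not line up like this, it usually means growpart operated on the wrong partition number.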
I didn\u0026rsquo;t want to extend to the full drive, as it is always good to have some extra room in case it is needed.\nI hope this helps you on your homelab journey!\n","permalink":"/p/resize-local-lvm/","date":"May 19, 2024"},{"title":"Building this site","description":"My guide for how I spun up this website.","content":"When I started on my YouTube journey, I was excited to produce content about my home lab configuration and share some of the things I\u0026rsquo;ve learned along the way. However, I wasn\u0026rsquo;t fully prepared for how many people messaged me about written documentation to go along with the repositories. In hindsight, this is a no-brainer and was also highlighted in the Go Developer Survey 2024 H1 Results: the primary way people learn is through a written document.\nSo, let\u0026rsquo;s talk about how I spun up this website and how you can do the same.\nWhy Hugo? I chose Hugo because it\u0026rsquo;s a static site generator, meaning that there is no database or server-side code to run. This makes it very fast and easy to deploy. I also chose it because I have experience managing Hugo documentation sites at work, usually using the Docsy theme or the Hugo Book theme. However, those seemed a bit too bland for my personal site. I wanted something flashier with more color (not the norm for me), and I ended up choosing the Stack theme.\nHowever, I didn\u0026rsquo;t end up using the theme as-is, or at least not with its starter repo. Instead, I used the quick-start documentation from Hugo, which allows me to customize the theme to my liking.\nInitial Configuration For the initial configuration, I followed the quick-start guide from the Hugo documentation. This allowed me to get a basic site up and running with the Stack theme. Using the theme as a submodule allowed me to override the theme\u0026rsquo;s default layouts and styles with my own. 
For example, removing the \u0026ldquo;Built with Hugo\u0026rdquo; footer and replacing it with my own footer.\nOnce you have this running, you can preview the site locally by running hugo server and navigating to localhost:1313 in your browser.\nCustomizing the Theme So, let\u0026rsquo;s talk about what I did to make everything work as well as it does.\nCopy over the configuration files located in the theme\u0026rsquo;s starter repo directory to the root of my site. This allows me to take advantage of the parameters that the theme has set up. When you copy these over, you will also need to remove the config.toml file that is in the root of the directory. Make sure to modify any of the configurations that you want to change and add theme = 'hugo-theme-stack' to the config.toml file. For me, this included: social media links in the menu.toml file title in the config.toml file sidebar subtitle in the params.toml file Find the layout file for the footer and remove the \u0026ldquo;Built with Hugo\u0026rdquo; text. This is in the theme\u0026rsquo;s directory under layouts/partials/footer/footer.html. I copied this file to my site\u0026rsquo;s directory under layouts/partials/footer/footer.html and removed the text. This allows you to customize the layouts of the theme to your liking without having to modify the theme itself. This can be done with any of the theme\u0026rsquo;s layouts. Finally, for my sanity, I also changed the folder structure of the content directory so that the posts were organized by year, month, and day. This is a personal preference and not necessary for the theme to work. If you have been following along, this is what your directory structure should look like:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 ├── archetypes │ └── default.md ├── assets │ └── ... 
├── config │ └── _default │ ├── _languages.toml │ ├── config.toml │ ├── markup.toml │ ├── menu.toml │ ├── params.toml │ ├── permalinks.toml │ └── related.toml ├── content │ ├── _index.md │ ├── categories │ │ └── hugo │ │ └── _index.md │ ├── page │ │ ├── archives │ │ │ └── index.md │ │ ├── links │ │ │ └── index.md │ │ └── search │ │ └── index.md │ └── post │ ├── 2024 │ │ └── 05 │ │ └── 07 │ │ ├── cover.svg │ │ ├── index.md │ │ └── learning_content_exp.svg ├── layouts │ └── partials │ └── footer │ └── footer.html ├── static │ ├── favicon.png ├── themes │ └── hugo-theme-stack │ ├── ... Comments Now that your site is up and running and you have customized the theme to your liking, you can add comments to your site. By default, the Stack theme uses Disqus for comments. You can use this or switch to another service by modifying the params.toml file in the root of your site.\n1 2 3 [comments] enabled = true provider = \u0026#34;disqus\u0026#34; I switched over to using giscus, which is heavily inspired by utterances but uses GitHub Discussions instead of Issues. I found this to be a bit more user-friendly and easier to manage. To do this, I added the following to my params.toml file:\n1 2 3 [comments] enabled = true provider = \u0026#34;giscus\u0026#34; Then, after completing the setup on the giscus website, I filled out the [comments.giscus] section of my params.toml file.\nThis now allows me to have comments on my site without having to rely on someone showing ads or tracking my users. Also, since Disqus sells your data to advertisers, I didn\u0026rsquo;t want to support that: opt out of disqus data sharing\nSitemap I also adjusted the default sitemap; the main reason for doing this was to adjust the priority and change frequency of the posts. 
This is done by creating a new file called sitemap.toml next to your other configuration toml files.\n1 2 ChangeFreq = \u0026#34;daily\u0026#34; Priority = \u0026#34;1\u0026#34; While you can get way more complex in your setup by generating a custom sitemap.xml file, I found this to be sufficient for my needs, at least for now. A good write-up on how to do this can be found here, from the Hugo forums. This also talks about the robots.txt file, but we will be using HTTP headers on our deployment to handle this.\nDeployment While there are many choices for your hosting provider, GitHub Pages, Netlify, and Vercel are just a few. I ended up choosing Cloudflare Pages. This might change in the future to a self-hosted container running nginx or lighttpd, but for now, I\u0026rsquo;m happy with the performance and the ease of use that Cloudflare Pages provides.\nCloudflare Pages is completely free and provides a global CDN for your site. This means that your site will be served from the datacenter closest to the user. To deploy to Cloudflare Pages, you will need to follow the Hugo framework guide from Cloudflare. This will walk you through setting up your site on Cloudflare Pages and connecting it to your GitHub repository. Once you have set this up, you can push your changes to your GitHub repository and Cloudflare Pages will automatically build and deploy your site.\nHowever, there are a few things that the framework guide doesn\u0026rsquo;t cover that I had to figure out and tweak to get my site to work the way I wanted it to.\nHeaders file By default, Cloudflare will serve your pages with a noindex header. This is to prevent search engines from indexing your site. 
So, to correct this we will need to create a _headers file in a new directory called cloudflare in the root of your git repository:\n1 2 3 4 5 6 7 8 9 / X-Robots-Tag: index, follow /static/* Access-Control-Allow-Origin: * X-Robots-Tag: nosnippet https://myproject.pages.dev/* X-Robots-Tag: noindex Make sure to replace https://myproject.pages.dev/* with your project\u0026rsquo;s dev URL; if not, bots will be able to index your dev site. This URL can be found in the Cloudflare Pages dashboard, underneath the project name and repository name.\nBecause I have no secure content on my site, I also added the Access-Control-Allow-Origin: * header to allow for cross-origin requests.\nThe reason I put this in a cloudflare directory is that I have my public directory in my .gitignore file; I don\u0026rsquo;t want to worry about accidentally committing the locally built files to my repository, and instead rely on Cloudflare to build and deploy my site.\nThis means we will need to modify the build parameters: in the Cloudflare Pages dashboard, select your site and go to the settings tab. Under the build and deploy section, click edit configurations in the build configurations section.\n1 2 Build Command: hugo -b $CF_PAGES_URL -d ./cloudflare Build Output Directory: cloudflare So now, when I build locally to view HTML changes, the output goes to my git-ignored public directory, and when I push to GitHub, the site builds in the cloudflare directory.\nWhile here, I also removed the preview deployments, as I don\u0026rsquo;t need a development build every time I push to my repository. This is done by setting Configure Preview Deployments to None. This cuts the number of builds you run in half, and since you\u0026rsquo;re limited to 500 builds a month, this can be important.\nCustom Domain Finally, the last mandatory configuration in my opinion was to set up a custom domain.\nLog in to the Cloudflare dashboard. Select your account in Account Home \u0026gt; Workers \u0026amp; Pages. 
Select your Pages project \u0026gt; Custom domains. Select Set up a domain. Provide the domain that you would like to serve your Cloudflare Pages site on and select Continue. Since I went the CNAME route, this was super easy to set up. However, more configurations can be found here.\nCloudflare Web Analytics Optionally, I also added a web analytics service to my site. Cloudflare Web Analytics provides free, privacy-first analytics for your website without changing your DNS or using Cloudflare’s proxy. This helps you understand the performance of your web pages as experienced by your site visitors.\nCloudflare Pages offers a one-click setup for Web Analytics:\nLog in to the Cloudflare dashboard. From Account Home, select Workers \u0026amp; Pages. In Overview, select your Pages project. Go to Manage \u0026gt; Web Analytics and select Enable Web Analytics. Cloudflare will automatically add the JavaScript snippet to your Pages site on the next deployment. Conclusion While this is just a high-level overview and will constantly evolve as I learn more about how I want my site to look and feel, the basics are here: continuous deployment, comments, analytics, and of course a custom domain.\nOverall, I am very happy with how the site turned out and how easy it was to get the site up and running on Cloudflare Pages.\nI hope this article has helped you get your site up and running. If you have any questions or need help, feel free to reach out to me in the comments. I am always happy to help!\n","permalink":"/p/hugo-website/","date":"May 07, 2024"},{"title":"Volume Shadow Copy Service (VSS)","description":"How VSS and the SQL Writer Service interact with SQL Server backups, including supported operations, limitations around point-in-time recovery, and common troubleshooting steps.","content":"What Is VSS? Volume Shadow Copy Service (VSS) is a set of Microsoft APIs. It allows users to perform backups or snapshots on files and volumes even when they are in use. 
VSS provides a consistent interface that allows coordination between user applications that update data on disk (writers) and backup applications (requestors).\nHow It Works Gather the writer metadata and prepare for shadow copy creation. Each writer creates an XML description of what is getting backed up. These XMLs are provided to the VSS. The writer also defines a restore method for all components. The VSS provides the writer\u0026rsquo;s description to the requester, which then selects the components to back up. The VSS notifies all the writers to prepare their data for making a shadow copy. Each writer prepares the data as appropriate. When finished, the writer notifies the VSS. The VSS tells the writers to freeze application write I/O requests. Read I/O requests are still possible. The application freeze is not allowed to take longer than 60 seconds. The VSS flushes the file system buffers and then freezes the file system, ensuring that the file system metadata is logged in a consistent order. The VSS tells the provider to create the shadow copy. This period lasts no more than 10 seconds, during which all write I/O requests to the file system remain frozen. The VSS releases file system write I/O requests. VSS tells the writers to thaw application write I/O requests. At this point applications are free to resume writing data to the disk. NOTE: The shadow copy creation can abort if the writers stay frozen for longer than 60 seconds or if the providers take longer than 10 seconds to commit the shadow copy. The requester can retry the process or notify the administrator to retry at a later time.\nUpon creation, the VSS returns location information for the shadow copy to the requester. More detailed information is available in the Microsoft VSS documentation.\nSQL Writer Service When it relates to SQL Server, most of the time any errors on backups are not caused by SQL. But when they are, it is likely that the SQL Writer Service is not enabled. 
The service installs without user intervention when installing SQL Server and provides added functionality for backup and restore of SQL Server through VSS.\nSQL Writer Supports Full database backup and restore including full-text catalogs Differential backup and restore Restore with move Database rename Copy-only backup Auto-recovery of database snapshot SQL Writer Does NOT Support Log backups File and filegroup backup Page restore Limitations and Problems Point-in-time recovery \u0026ndash; It is possible you will not meet your RPO. Usually you can only recover to the last point of your backup, not a specific time. Transaction log clearing \u0026ndash; VSS backups will not clear the SQL log file. This needs to happen through the native SQL commands. Pauses in I/O \u0026ndash; During a backup, it is possible that database I/O will pause for just under 70 seconds. This can lead to users complaining about performance issues or disconnections. Common Troubleshooting A few things to check when getting errors:\nThe SQL Writer service needs to be running at the time of the backup The service needs to be running as the Local System account The NT Service\\SQLWriter login needs to be active inside SQL Server (this account is designated as no login, which limits vulnerability) Next Steps If none of these are the cause of your problem, reach out to your backup solution vendor. Include the following in the email:\nThat you checked the three steps listed above The Event Viewer errors or warnings A detailed description of what you are experiencing How often the problem occurs ","permalink":"/p/vss-backups/","date":"Jan 15, 2021"},{"title":"Vertical Partitioning","description":"How vertical partitioning (row splitting) reduces page waste and improves query performance in SQL Server, with a one-million-row demo.","content":"Introduction Data density is something that was hinted at in the last three parts of this series, however, it wasn\u0026rsquo;t totally discussed. 
Data density, put simply, is how many rows you get per page and how much space is left over on each page. It was discussed that a page can only contain up to 8060 bytes. This means that if a row has a fixed width of 5000 bytes, you could only have 1 row per page, with essentially 3060 bytes wasted on every page. This can result in a huge waste of space in the data files.\nA good practice to avoid wasting space is using vertical partitioning. This practice has 2 main categories:\nNormalization Row splitting For this article we will be focusing on row splitting. Put simply, this is the process of dividing out columns that are not used often OR that are large, and storing them in another table.\nSince these values are stored in another table, queries do become a bit more complicated to write, especially if the column is used often. However, if the column is not used that often, this can be very useful not only for storage but also for query optimization.\nDemo Setup To start off, we are going to demo a table that has not been vertically partitioned.\n1 2 3 4 5 6 7 CREATE TABLE [NOVerticalPartitioning] ( [Column1] INT IDENTITY, [Column2] VARCHAR (100), [Column3] VARCHAR (20), [Column4] VARCHAR (1000) ) From here we will insert 1 million rows; I believe this to be a realistic size for a person or product table.\n1 2 3 4 5 6 7 8 9 10 11 12 13 INSERT INTO [NOVerticalPartitioning] ( [Column2], [Column3], [Column4] ) VALUES ( REPLICATE (\u0026#39;2\u0026#39;, 50), REPLICATE(\u0026#39;3\u0026#39;,20), REPLICATE (\u0026#39;4\u0026#39;, 1000) ) GO 1000000 Now before we analyze this data, let\u0026rsquo;s see how it would look if we move the large, always-populated VARCHAR (1000) column off to another table.\n1 2 3 4 5 6 7 8 9 10 11 12 CREATE TABLE [WITHVerticalPartitioning1] ( [Column1] INT IDENTITY PRIMARY KEY, [Column2] VARCHAR (100), [Column3] VARCHAR (20) ) CREATE 
TABLE [WITHVerticalPartitioning2] ( [Column1] INT IDENTITY FOREIGN KEY REFERENCES [WITHVerticalPartitioning1]([Column1]), [Column4] VARCHAR (1000) ) And let\u0026rsquo;s also insert the same 1 million rows as before.\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 BEGIN TRANSACTION INSERT INTO [WITHVerticalPartitioning1] ( [Column2], [Column3] ) VALUES ( REPLICATE (\u0026#39;2\u0026#39;, 50), REPLICATE(\u0026#39;3\u0026#39;,20) ) INSERT INTO [WITHVerticalPartitioning2] ( [Column4] ) VALUES ( REPLICATE (\u0026#39;4\u0026#39;, 1000) ) GO 1000000 COMMIT TRANSACTION Now let\u0026rsquo;s use the same query from the previous post to view the row structure of the table with no vertical partitioning.\n1 2 3 4 5 6 7 8 9 10 11 12 SELECT [alloc_unit_type_desc] AS [Data Structure] , [page_count] AS [pages] , [record_count] AS [Rows] , [min_record_size_in_bytes] AS [min row] , [max_record_size_in_bytes] AS [Max Row] FROM SYS.dm_db_index_physical_stats (DB_id() ,OBJECT_ID (N\u0026#39;NOVerticalPartitioning\u0026#39;) , NULL , NULL , N\u0026#39;sampled\u0026#39;) Note that the row count is 1000200. This is due to the fact that we only took a \u0026ldquo;sampled\u0026rdquo; view of the pages; we will need the \u0026ldquo;detailed\u0026rdquo; mode in order to get an exact count. 
As previously noted, since we have more than 10,000 pages, the sampled mode will not escalate to a detailed view automatically.\n1 2 3 4 5 6 7 8 9 10 11 12 SELECT [alloc_unit_type_desc] AS [Data Structure] , [page_count] AS [pages] , [record_count] AS [Rows] , [min_record_size_in_bytes] AS [min row] , [max_record_size_in_bytes] AS [Max Row] FROM SYS.dm_db_index_physical_stats (DB_id() ,OBJECT_ID (N\u0026#39;NOVerticalPartitioning\u0026#39;) , NULL , NULL , N\u0026#39;detailed\u0026#39;) There we go, exactly 1000000 rows.\nNow let\u0026rsquo;s see how our performance looks, but first, to make sure we are starting from a clean state, let\u0026rsquo;s clear our cache using the following script.\n1 2 3 4 CHECKPOINT; GO DBCC DROPCLEANBUFFERS; GO Now let\u0026rsquo;s run a query to return only the even rows that we had inserted, based on our primary key value. However, prior to running the following query, let\u0026rsquo;s also turn on \u0026ldquo;STATISTICS IO\u0026rdquo;; this helps you understand how your query performed by showing what actually happened.\n1 2 3 4 5 6 SET STATISTICS IO ON GO SELECT * FROM [NOVerticalPartitioning] WHERE Column1 % 2 = 0 GO SET STATISTICS IO OFF Once we click on our messages tab we will see the following:\n(500000 row(s) affected) Table \u0026lsquo;NOVerticalPartitioning\u0026rsquo;. 
Scan count 1, logical reads 166674, physical reads 0, read-ahead reads 166473, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.\nNow for the vertically partitioned tables, let\u0026rsquo;s first look at the row structures.\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 SELECT [alloc_unit_type_desc] AS [Data Structure] , [page_count] AS [pages] , [record_count] AS [Rows] , [min_record_size_in_bytes] AS [min row] , [max_record_size_in_bytes] AS [Max Row] FROM SYS.dm_db_index_physical_stats (DB_id() ,OBJECT_ID (N\u0026#39;WITHVerticalPartitioning1\u0026#39;) , NULL , NULL , N\u0026#39;detailed\u0026#39;) SELECT [alloc_unit_type_desc] AS [Data Structure] , [page_count] AS [pages] , [record_count] AS [Rows] , [min_record_size_in_bytes] AS [min row] , [max_record_size_in_bytes] AS [Max Row] FROM SYS.dm_db_index_physical_stats (DB_id() ,OBJECT_ID (N\u0026#39;WITHVerticalPartitioning2\u0026#39;) , NULL , NULL , N\u0026#39;detailed\u0026#39;) Finally, let\u0026rsquo;s see how our query performs.\n1 2 3 4 5 6 7 8 SET STATISTICS IO ON GO SELECT * FROM [WITHVerticalPartitioning1] VP1 inner join [WITHVerticalPartitioning2] VP2 ON vp1.Column1=vp2.Column1 WHERE vp1.Column1 % 2 = 0 GO SET STATISTICS IO OFF GO Results Comparison Now let\u0026rsquo;s compare what we have gathered thus far:\nMetric Vertically Partitioned NOT Vertically Partitioned Total pages 154018 166674 Selecting all columns - Scan Count 10 1 Selecting all columns - Logical Reads 193764 166674 Selecting all columns - Physical Reads 4659 0 Selecting all columns - Read-Ahead Reads 193764 166674 Well, it looks like we have saved 12,656 pages, which is excellent! That saved us just over 101MB. However, if we were always trying to return the last column, this would definitely not be worth it for performance reasons. However, if we weren\u0026rsquo;t always selecting all of the columns, what would that look like? 
Would that make the difference for us and make it worth our time?\nTo find out, let\u0026rsquo;s run the following queries and analyze the outputs.\n1 2 3 4 5 6 7 8 9 SET STATISTICS IO ON GO SELECT Column2,Column3 FROM [WITHVerticalPartitioning1] VP1 WHERE vp1.Column1 % 2 = 0 SELECT Column2,Column3 FROM [NOVerticalPartitioning] WHERE Column1 % 2 = 0 GO SET STATISTICS IO OFF GO Metric Vertically Partitioned NOT Vertically Partitioned Selecting Column2 and Column3 - Scan Count 1 1 Selecting Column2 and Column3 - Logical Reads 11154 166674 Selecting Column2 and Column3 - Physical Reads 3 0 Selecting Column2 and Column3 - Read-Ahead Reads 11150 166674 It made a huge difference for performance: we no longer had to read that large third column, and our performance increased by almost 15 times.\nConclusion In summary, vertical partitioning is a good design practice and should be used; however, it does depend on your workload. Do not throw columns to another table just because they are large objects. Instead, take the time to evaluate your data and make the right choice when it comes to what table the column should be under.\n","permalink":"/p/vertical-partitioning/","date":"Jan 14, 2021"},{"title":"Understanding Storage in SQL Server","description":"How SQL Server organizes data at the page and extent level, and why LOB, in-row, and row-overflow storage matter for performance and design.","content":"At this point, I want to take a moment to help you understand database storage a bit more. I am not talking about RAIDs, SANs, or having the transaction log on a different drive either. What I mean is to talk about how the database actually stores the attributes (columns) and data records (rows).\nPages and Extents To start off, I would like to dig a bit into pages and extents. The basic unit of data storage in SQL Server is the page. Disk I/O operations are performed at the page level, meaning that SQL Server must read or write whole pages. 
Pages are 8 KB each, which allows 128 pages for each MB allocated to SQL Server’s data files (MDF and LDF).\nBelow is an outline of how a page is made up.\n[Diagram: SQL Server page structure — 96-byte header, data rows, 2-byte slot array]\nExtents Extents are a collection of eight physically sequential pages and are used to efficiently manage the pages. All pages are stored in extents. However, to make space allocation efficient, SQL Server does not allocate whole extents to tables at one time. This creates two different types of extents: mixed and uniform.\nUntil objects (tables, indexes, etc.) have enough pages to fill a uniform extent, their pages are allocated from mixed extents. When the table or index grows to the point that it has eight pages, it then switches to use uniform extents for subsequent allocations.\nRow Data Structures Zooming in further, a row can have three types of data structures:\nLOB_DATA — Stored off-page when the data in the LOB column exceeds 8,060 bytes. This is only possible through certain data types: VARCHAR(MAX) NVARCHAR(MAX) VARBINARY(MAX) XML CLR user-defined types IN_ROW_DATA — Every row has a portion of its data stored in this structure, if not all of it. Cannot span pages All fixed-width columns must be stored in this structure Limited to 8,060 bytes due to the header (96 bytes), slot array (2 bytes), and reserved space for Microsoft (34 bytes) This means you cannot create 10 CHAR(1000) columns in the table, since these are fixed width However, you can create 10 VARCHAR(1000) since they are variable width. But note that if you exceed 8,060 bytes in the table you will overflow into the structure below. 
ROW_OVERFLOW_DATA — This only happens with certain data types: VARCHAR(x) NVARCHAR(x) VARBINARY(x) SQL_VARIANT When this happens, it can cause two major issues: Poor performance — If the column overflows, you need an extra I/O to retrieve the column Data integrity issues — For example, a VARCHAR(2000) column for an address allows users to input garbage data Wrapping Up Now that we have gotten through this, I hope it gives you some basic insight into the data structures. In the following post, we will go over why this was important for data density reasons and other performance concerns.\n","permalink":"/p/understanding-storage/","date":"Jan 13, 2021"},{"title":"Transaction Log Management","description":"A practical Q\u0026A on SQL Server transaction log internals: how transactions work, crash recovery, why logs grow, and the correct way to shrink and resize the log file.","content":"I want to discuss some of the confusion around the transaction log. The transaction log is one of the most important things to understand in SQL Server, especially when referring to high availability or disaster recovery. In these features, SQL uses the transaction log as a key component. After all, without your transaction log your database is unusable.\nWhat Is a Transaction? A transaction is a single unit of work that complies with A.C.I.D. standards:\nAtomic \u0026ndash; It either all works or none of it does Consistent \u0026ndash; It complies with the \u0026ldquo;Rules\u0026rdquo;, meaning constraints, triggers, datatype adherence, etc. Isolated \u0026ndash; It will not affect other transactions that are running at the same time Durable \u0026ndash; Changes are permanent and will survive a power outage, crash, and/or memory dump Additionally, SQL Server runs in autocommit mode by default. This means that it will apply the begin and commit/rollback for you if not specified. 
The statement below is a single transaction:

```sql
DELETE FROM person WHERE lastname = 'Billy'
```

Whereas this is two transactions:

```sql
DELETE FROM person WHERE lastname = 'Billy'
DELETE FROM person WHERE firstname = 'Timmy'
```

To make these two statements behave as a single unit of work, you need to wrap them with a BEGIN TRANSACTION and end with COMMIT or ROLLBACK - otherwise known as an explicit transaction.

Transaction Log Operation

To outline the transaction log operation:
1. A user executes a transaction
2. The pages that are going to change are read into the SQL Server cache in RAM
3. The log file records the details of the transaction and assigns it a Log Sequence Number (LSN)
4. After storing the intent of the transaction, SQL Server modifies the pages in RAM
5. The pages are, at some point, written back to disk

What Happens During a Crash?

If your server crashes, it is the responsibility of the log file to help you recover. The dirty pages (pages in RAM that did not make it to disk) are lost, but the transaction log contains a full description of the intent of each transaction.

Upon restart, SQL Server checks integrity and consistency using the log, comparing the LSNs in the log to those on the data pages. Committed transactions whose changes never made it to disk are rolled forward - the REDO phase. Transactions that were never committed have their changes rolled back - the UNDO phase. Together these make sure that when the database comes online, it is consistent.

Why Does My Log File Keep Growing?

To understand this, you first need to understand how the log file handles information. The transaction log is a circular file: it has a set amount of space and keeps creating new log records sequentially until it reaches the end.
Once it reaches the end of the file, SQL Server faces two options:
1. Circle back to the beginning and overwrite older log records
2. Grow the bounds of the log file and continue to consume space

Option 2 requires extra space plus the overhead of growing the file. When the log file grows, it does so by creating new Virtual Log Files (VLFs) within the log; you can think of these as groups of transactions within the log. So Option 1 is preferable.

If Option 1 is preferable, why does SQL Server keep growing the log? SQL Server needs to keep log records around until no purpose remains for their storage. These purposes vary, but the main one is that the records are critical to the REDO and UNDO phases. SQL Server tracks this by marking records as "active", and it is not allowed to reuse the space in a VLF until all the transactions in that VLF are "inactive". Then, and ONLY then, can SQL Server reuse the space allocated to the VLF.

So in short, your SQL Server is leaving the transactions active. But don't freak out - this is normal for the FULL recovery model. In this model you need to take transaction log backups to allow the log to clear.

Unfortunately, this conversation usually doesn't happen right away; it comes up once the log file is consuming a massive amount of space on the drive.

How Do I Manage the Log File?

There are many articles about this, but they do not always have the best or correct answer. Most people recommend one of the following:
1. Switch the database to SIMPLE, shrink the file, switch back to FULL
2. Truncate the log with NO_LOG / TRUNCATE_ONLY

Both of these solutions are harmful. Luckily, starting with SQL Server 2008, option 2 is no longer possible. Yet option 1 is still terrible. Given what we talked about earlier, I hope this makes you cringe.

Why These Approaches Are Wrong

When you shrink the file this way you completely discard everything in the log file.
That means all the work since the last backup is gone if your database crashes or fails for whatever reason. This defeats the whole point of the FULL/BULK_LOGGED recovery models, which is to preserve the transaction log so a database can be recovered to a specific point in time.

Besides, if you shrink the file, it will only grow again. That growth is likely to create VLF fragmentation, a performance concern much like disk fragmentation. Though it is less of a problem in SQL Server 2014 and above, it should still be on your mind. Paul Randal has a good article about how VLFs are created.

Additionally, your workload pauses while the log file is growing.

Acceptable Solutions

In my opinion there are only two acceptable solutions:
1. Run in SIMPLE mode - if you don't mind possibly losing data in the event of a disaster, this is a good option.
2. Run regular log backups - especially if you are looking for point-in-time recovery.

How Do I Get It Back to a Reasonable Size?

It is always tricky to give the "reasonable size" answer. The truth is that it varies with your workload and the size of your database. I would suggest the following:
- If it is your company's proprietary database, speak to the developers. They should have a staging environment where you can track log growth.
- If you are using another company's database, reach out to their support. They might have an answer, but they are more likely to give you the "it depends" response.
- Guess - estimates range from 20-25% of the MDF on smaller databases down to 1-2% on multi-terabyte databases.

Resizing the Log File

To resize the log file I always use the following steps:

Step 1: Wait for an inactive time of day.
It is best to put the database into single-user mode first, but it is not required.

```sql
ALTER DATABASE databasename
SET SINGLE_USER
WITH ROLLBACK IMMEDIATE;
```

Step 2: Run a transaction log backup if you are in FULL recovery. If you are using SIMPLE recovery, just clear the transaction log by running a checkpoint.

```sql
BACKUP LOG databasename
TO DISK = 'File path and extension (.TRN)';
```

Step 3: Shrink the log to the smallest possible size.

```sql
DBCC SHRINKFILE(TlogLogicalFileName, TRUNCATEONLY);
```

You will need the logical filename of the log to complete this step. Run the following script if you do not know it:

```sql
EXEC SP_HELP
--or
SELECT DB_NAME() AS DbName,
       name AS FileName,
       size/128.0 AS CurrentSizeMB,
       size/128.0 - CAST(FILEPROPERTY(name, 'SpaceUsed') AS INT)/128.0 AS FreeSpaceMB
FROM sys.database_files;
```

Step 4: Alter the database to change the transaction log file size.

```sql
ALTER DATABASE [databasename]
MODIFY FILE
(
    NAME = [TlogLogicalFileName],
    SIZE = NewSize in MB
);
```

Step 5: If you set the database to single-user mode, change it back.

```sql
ALTER DATABASE databasename
SET MULTI_USER;
```

NOTE: SQLSkills suggests growing logs in 8 GB chunks (8000 MB for the NewSize variable), which creates VLFs of 512 MB each. These smaller VLFs make it easier to keep the log file at a manageable size.
","permalink":"/p/transaction-log-management/","date":"Jan 12, 2021"},{"title":"SSMS Productivity Tips","description":"Eight practical tips for working faster in SQL Server Management Studio, from block selection and split windows to color-coded connections.","content":"Like any tool I use, I try to use SSMS to its full potential. Yet I have found that most people using SSMS don't know most of the IDE's features. Because of this, I will review some SSMS features and tips to boost productivity and help you code faster.

1.
Inserting a Comma-Separated List

Every SELECT statement needs a comma-separated list of the columns you would like to return, and this can be a pain to type, especially for many columns.

The trick is not to write them out at all. From Object Explorer, drag the "Columns" item and drop it onto a query window. This generates the full column list in your query window; then you only need to add the surrounding query text.

2. Selecting a Block of Code

Now that you have all the columns from your tables, you may need to remove some references to them. This can be tedious depending on how many columns you have. The shortcut is to hold ALT+Shift and drag your cursor (or use the arrow keys) over the text you want to delete. This lets you highlight a rectangular block for easy deletion.

[Screenshot: ALT+Shift block selection example]

3. Inserting a Block of Code

There is another benefit to the ALT+Shift shortcut: you can use the same method to insert text on many rows at once. This makes it easy to create comma-delimited lists or prefix values. A great demonstration of this is on Brent Ozar's website.

4. Adding Line Numbers

Line numbers are handy when debugging code, which is why code editors tend to have them on by default. That is not the case with Management Studio. To add them, select Tools -> Options; in the dialog that appears, click Text Editor and then Transact-SQL. In the Display section, select the check box for Line numbers.

5. Color Coding Connections

Many of us work on several servers at once, which can lead to confusion when trying to execute a query on a specific server. SSMS can color code specific connections, and the color displays at the bottom of the query window - for example, green for test or red for production.

When connecting, select the Options button in the "Connect to Database Engine" window, then select the Connection Properties tab.
Select the check box towards the bottom of the window and use the "Select..." button to choose a color.

6. Moving Columns Without Changing Code

It may not be obvious, but it is possible to move columns inside the results pane. Select the header of the column you want to move and drag it to the desired location, much like in Excel. This is a huge productivity win, especially if your query took a long time to run.

7. Splitting the Query Window

When working on long queries it can be difficult to keep track of what is happening. In SSMS you can split the query window so that it shows your code twice, with independent scroll bars.

First, find the split icon in the top right-hand corner of the query window.

[Screenshot: Split window icon location]

Now drag the icon down to create the second pane. Once the second pane is at a reasonable size, it will have its own scroll bar.

8. Finding Error Lines

When writing new queries you will often encounter errors, and the messages are often misleading in long code. The quickest way to find the offending line is to double-click the error message, which highlights the line that threw the error.
","permalink":"/p/ssms-productivity/","date":"Jan 11, 2021"},{"title":"SQL Server Corruption Post Mortem","description":"A post-mortem guide for SQL Server database corruption: likely causes from I/O subsystem errors to firmware, early detection strategies, and recovery steps.","content":""No matter what method you use to recover from corruption, you should always determine why it happened to avoid future problems." - Paul Randal

Most Likely Causes

The most likely causes are outlined below, in order of likelihood:
- I/O subsystem and server memory (run diagnostics on these first; they are almost always the cause)
- Windows OS
- File system filter drivers - e.g.
antivirus, defraggers, encryption
- Network cards, switches, cables
- Memory corruption: bad memory chips, scribblers
- SAN controllers
- RAID controllers
- Disks

Examine the Logs

Check the SQL Server error log and the Windows event logs for clues:
- Error 823: a hard I/O error (Windows could not return the data to SQL Server)
- Error 824: a soft I/O error (SQL Server detected a problem with the data Windows handed it)
- Error 825: a read-retry error (these show up as informational alerts, but they are critical impending-doom warning signs)
- More information: Microsoft KB 2015757

Also:
- Check that the firmware is up to date
- Investigate NTFS filter drivers

What Does NOT Cause Corruption

It is important to remember that corruption is NOT caused by:
- Anything an application can do
- Anything you can do in SQL Server with supported, documented commands
- Interrupting a database shrink, index rebuild, or long-running batch
- Shutting down SQL Server

Early Detection Strategies

If corruption happened, it is likely that certain steps are not being taken inside your organization to protect your data.
Here is a list of what can be done to help detect early signs:
- Implement error alerts for Severity 19 and above errors - see Configuring Alerts for High Severity Problems
- Implement the Page Verify - Checksum option - see Setting Your Page Verify Database Option to Checksum
- Implement backups with the CHECKSUM option, for all databases INCLUDING system databases; this also allows you to use RESTORE VERIFYONLY WITH CHECKSUM to validate your backups
- Implement an integrity check script - either Ola Hallengren's Maintenance Solution (widely used) or, at minimum, a DBCC CHECKDB WITH NO_INFOMSGS job
","permalink":"/p/sql-corruption-post-mortem/","date":"Jan 10, 2021"},{"title":"Row Structure in SQL Server","description":"Understanding SQL Server row structure, including fixed and variable length columns, null bitmaps, and how trailing nulls affect storage efficiency.","content":"To completely understand how efficiently your database is using space and pages, it is important to understand the row structure of the database.

Creating the Test Table

Let's create a table as shown below:

```sql
CREATE TABLE [TestingColumnSize]
(
    [Column1] INT IDENTITY,
    [Column2] VARCHAR (1000) DEFAULT 'HELLO',
    [Column3] VARCHAR (1000) DEFAULT 'SQL',
    [Column4] VARCHAR (1000) DEFAULT 'FOLLOWERS',
    [Column5] VARCHAR (1000) DEFAULT 'HOPE',
    [Column6] VARCHAR (1000) DEFAULT 'THIS',
    [Column7] VARCHAR (1000) DEFAULT 'HELPS'
)
```

Inserting Data

From here we insert default values 4 times using the following script:

```sql
INSERT INTO TestingColumnSize DEFAULT VALUES
GO 4
```

[Screenshot: query results showing the four inserted rows with default values]

Examining Row Size

We now have our table populated with 4 rows, which allows us to query the table and see how it is storing this information on its pages.
Using the query below, you can see how many rows we have and the minimum and maximum row sizes in bytes.

```sql
SELECT [alloc_unit_type_desc]     AS [Data Structure],
       [page_count]               AS [pages],
       [record_count]             AS [Rows],
       [min_record_size_in_bytes] AS [min row],
       [max_record_size_in_bytes] AS [Max Row]
FROM sys.dm_db_index_physical_stats
     (DB_ID(), OBJECT_ID(N'TestingColumnSize'), NULL, NULL, N'Sampled')
```

Additional information: this query can be run in a few different modes, which you can read about here. For the purposes of this demo we are using Sampled mode; since the heap has fewer than 10,000 pages, DETAILED mode is automatically used instead.

Currently, our table looks like this:

[Screenshot: query results showing 4 rows at 55 bytes each]

Understanding the Row Structure

If you are following along, you might ask how we arrive at 55 bytes per row. At first glance it doesn't seem to make sense, since the default values only use 34 bytes (4 for the identity INT and 30 for the characters in the VARCHAR defaults). To explain this, you need to understand the structure of the row. Below is an image showing what each part looks like when the data structure is IN_ROW_DATA.

[Screenshot: diagram of in_row_data column structure]

Every row in this data structure is laid out this way:
- Row header - 4 bytes of status information (the details are outside the scope of this tutorial).
- Fixed-length columns - our integer, making up another 4 bytes.
- Null bitmap - a storage optimization that avoids storing "NULL" in every null column (outside the scope of this tutorial). Uses 2 bytes plus 1 bit per column, rounded up to whole bytes: 3 bytes here.
- Variable-width column offset array - lets SQL Server quickly find the end of each variable-length column by storing its end offset.
It uses 2 bytes plus 2 bytes per variable-length column, EXCEPT when there are trailing nulls: 2 + (2 * 6 columns) = 14 bytes.
- Variable-width columns - the values of the variable-width columns themselves: 30 bytes.

This gives us our 55 bytes.

Trailing Nulls and Storage Savings

I mentioned under the variable-width column offset array that trailing nulls make a difference. Let's demo this by inserting a new row into the table we created above:

```sql
INSERT INTO TestingColumnSize
(Column2,Column3,Column4,Column5,Column6,Column7)
VALUES
('inserting30','bytes of data','in row',null,null,null)
```

Now let's run our sizing query again.

[Screenshot: query results showing the new row at 49 bytes]

The new row is 49 bytes thanks to the trailing nulls: the variable-width column offset array did not have to dedicate 2 bytes to each of the trailing null columns. But what happens if we insert a value at the end of the row?

```sql
INSERT INTO TestingColumnSize
(Column2,Column3,Column4,Column5,Column6,Column7)
VALUES
('inserting30','bytes of data','in row',null,null,'1')
```

Running the sizing query once more, we find that the offset array needed to store those 6 bytes again.

[Screenshot: query results showing the row with a non-null trailing column back at 55 bytes]

Column Ordering Matters

This can lead to a terrible waste of space if columns that are often null sit at the beginning of the table instead of the end.
So if you have nullable columns, order them so that the columns most likely to be null come last, and NEVER put a defaulted (always-populated) value at the end of the row - it will cost you much more space than you think.
","permalink":"/p/row-structure/","date":"Jan 09, 2021"},{"title":"Recovery Model Behavior","description":"A comparison of SQL Server's Full, Simple, and Bulk-Logged recovery models, covering transaction log behavior, backup requirements, and disaster recovery implications.","content":"Full Recovery Model

This is the most common recovery model used for SQL Server, and it is required for database mirroring and availability groups.
- Allows log backups and zero-to-minimal data loss after a disaster
- All changes to the database are fully logged in the transaction log file
- The transaction log will not clear until a log backup has been performed - the log backup is what clears the transaction log
- This means log backups are required; otherwise the transaction log will grow until your drive is full

Simple Recovery Model

Commonly used when log backups and point-in-time recovery are not required.
- Some operations can use minimal logging
- All other operations are fully logged, as in the FULL recovery model
- The transaction log will not clear until a checkpoint has been performed
- Log backups are NOT possible with this recovery model

Important: Disaster recovery is only possible using full and differential backups, so data loss will occur back to the most recent data backup.
Simple recovery is not an appropriate choice for production systems where loss of recent changes is unacceptable.

Bulk-Logged Recovery Model

Commonly used to minimize log growth during bulk operations while preserving the ability to take log backups.
- Some operations can use minimal logging
- All other operations are fully logged, as in the FULL recovery model
- The transaction log will not clear until a log backup has been performed - the log backup is what clears the transaction log
- This means log backups are required; otherwise the transaction log will grow until your drive is full
- Time spent in this mode should be minimized to reduce the potential for data loss

Important: The Bulk-Logged recovery model should not be used if there is any possibility of losing user transactions.

Microsoft has a high-level overview of the recovery model differences.
","permalink":"/p/recovery-model-behavior/","date":"Jan 08, 2021"},{"title":"SQL Server Maintenance Tasks","description":"Essential SQL Server maintenance tasks for accidental DBAs: backup strategies, consistency checks with DBCC CHECKDB, index rebuilds, and understanding RPO/RTO.","content":"Every day in the IT field, people are thrown into the role of the accidental DBA. While this is a mistake for most companies to make, it happens - especially with the changes in the healthcare field and the adoption of Electronic Medical Record (EMR) software.
I wanted to get some knowledge out there on how to maintain a database and avoid a Resume Producing Event (RPE).

Before getting started on the maintenance, it is important to understand and discuss your Recovery Point Objective (RPO) and Recovery Time Objective (RTO).

[Screenshot: RTO & RPO diagram by Paul Randal]

Core Maintenance Tasks - SQL Server Jobs

Backups

One key thing to remember is that backups should ideally be written to a local disk for performance AND then copied to another location for storage. Far too often, in my experience, the backups live on the same disk as the data, which is a terrible idea for disaster recovery.

Recommended (mission critical):
- Full database backup once per week
- Nightly differential
- Transaction log backups every 15 minutes

Remember to use the FULL recovery model for mission-critical databases where data loss is not acceptable, as it allows point-in-time recovery and log backups. For more information, see the previous post on recovery models.

Consistency Checks

It is usually assumed that SQL Server runs consistency checks and identifies database corruption immediately. In reality, SQL Server does not notice corruption until the corrupt page is next read from disk. That could be weeks or even months later if you are unlucky, and at that point you will likely no longer have backups from before the corruption.

Recommended (based on maintenance windows):
- Run at least once a week

NOTE: To remove the dependency on a maintenance window, you can restore the database to another server and run the integrity check there. If corruption is found, you then have to determine whether it is in your backup or in your live/production server, which in some circumstances will still require a maintenance window.

Index Maintenance

Rebuilding your indexes is a key part of maintaining performance in a database.
Rebuilds can generate a large amount of log activity and heavy I/O on the environment, since every table is read in full as its indexes are rebuilt.

Recommended:
- Run once a week

As a note, there is also an option to "Reorganize" indexes. This should NOT be scheduled at the same time as the rebuild. Reorganizing defragments the indexes but does not update statistics; to accomplish the same thing as a rebuild, you would also need to run an update statistics task. Still, reorganizing is a fully online operation and can be a good interim step between regular executions of the rebuild index task.

Recommended:
- Run once a week, halfway between your executions of the rebuild

For additional information, I recommend the Accidental DBA series from SQLSkills.com.
","permalink":"/p/maintenance-tasks/","date":"Jan 07, 2021"},{"title":"Implicit Conversion in SQL Server","description":"How to audit your database for implicit conversions, identify data type mismatches across columns, and find conversion warnings in the plan cache.","content":"Audit Your Data Types

The following query shows a count of each data type in use across all user tables.
This is a useful starting point for spotting mismatches that lead to implicit conversions.

```sql
SELECT t.name AS TypeName,
       COUNT(t.name) AS CountOfType
FROM sys.columns AS c
JOIN sys.types AS t
    ON c.user_type_id = t.user_type_id
JOIN sys.tables AS tab
    ON tab.object_id = c.object_id
WHERE tab.is_ms_shipped = 0
GROUP BY t.name
```

Further Reading
- Exploring the Plan Cache - Warnings - finding conversion warnings inside cached query plans
- Finding Implicit Column Conversions in the Plan Cache - Jonathan Kehayias's approach to mining the plan cache for implicit conversions
- Implicit Conversion Costs - measuring the real performance impact
- Do SQL Server User Defined Datatypes (UDT) Affect Performance? - whether UDTs introduce hidden conversion overhead
","permalink":"/p/implicit-conversion/","date":"Jan 06, 2021"},{"title":"Understanding HEAP Tables","description":"Why SQL Server heap tables cause performance problems through full table scans and forwarding rows, with a demo showing the I/O impact and how to fix them.","content":"Zooming out further, we should take a look at table structures - by which I of course mean HEAPs and clustered tables. Though the main focus of this post is heaps, it is hard to talk about one without the other.

The official definition of a heap is "...a table without a clustered index." That has always been a wonky definition to me - who defines something by saying it's not something else? But the reason they did it is that it is correct: this is the only difference between a heap and a clustered table.

To explain what a heap IS, though, I find it easier to list its properties:
- Made with the CREATE TABLE syntax, without a clustering key
- CAN have a primary key - backed by a nonclustered index
- Can have nonclustered indexes

So in reality, a heap is a table that does not store data in any particular order.
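The properties listed above can be sketched like this (hypothetical table names, for illustration only):

```sql
-- A heap: it has a primary key, but the PK is backed by a
-- NONCLUSTERED index, so there is still no clustered index.
CREATE TABLE dbo.HeapWithPk
(
    Id INT NOT NULL,
    CONSTRAINT PK_HeapWithPk PRIMARY KEY NONCLUSTERED (Id)
);

-- A clustered table: left to its own devices, SQL Server makes
-- the primary key the clustering key.
CREATE TABLE dbo.ClusteredTable
(
    Id INT NOT NULL PRIMARY KEY  -- clustered by default
);
```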
Because of this, heap tables can hurt performance in two primary ways:
1. Full table scans
2. Forwarding rows

Full Table Scans

To examine any row in the table, a heap has to perform a full table scan. That in itself should tell you this will cause problems. These scans are S...L...O...W... due to the heavy amount of I/O reads required. They also cause a large amount of disk-to-memory churn as SQL Server loads the whole table into memory.

Forwarding Rows

These ONLY exist in a heap table structure. When a record outgrows its page, SQL Server moves the data to another page and leaves a forwarding pointer/row at the original location pointing to the new one. This entry can cause I/O increases you wouldn't expect. How the I/O problem arises:
1. Your query uses a record locator, which points to a page, and the database engine goes to that page.
2. Upon arriving, it finds the forwarding pointer.
3. It then goes to the new page to fetch the forwarded row.

So why do heaps exist at all? They do have their place: there are specific use cases where you would want them, and Microsoft does a decent job of explaining them in their article here.

But in most cases they are not appropriate and will cause many problems in production. This is why, by default, SQL Server makes the primary key your clustering key.
(Not always the best choice either - we will discuss that in the next post.)

The Fun Part

Now that we have the definitions out of the way, let's do the demo to see how bad heaps actually are.

First, let's create a heap table and insert 1,000,000 rows.

```sql
CREATE TABLE dbo.HeapingHeap
(
    Id BIGINT IDENTITY NOT NULL,
    column1 VARCHAR(1024) DEFAULT REPLICATE('A',200) NOT NULL
);
GO
INSERT INTO HeapingHeap DEFAULT VALUES
GO 1000000
```

Now that we have our table set up, let's look at performance.

```sql
CHECKPOINT;
GO
DBCC DROPCLEANBUFFERS;
GO
SET STATISTICS IO ON;
SELECT * FROM HeapingHeap
GO
SET STATISTICS IO OFF;
GO
```

Here is what I received:

Table 'HeapingHeap'. Scan count 1, logical reads 28572, physical reads 0, read-ahead reads 28550, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Let's also look at how many pages this table uses, with the physical-stats query from before - with one key change: we add the forwarded record count.

```sql
SELECT [alloc_unit_type_desc]     AS [Data Structure],
       [page_count]               AS [pages],
       [record_count]             AS [Rows],
       [min_record_size_in_bytes] AS [min row],
       [max_record_size_in_bytes] AS [Max Row],
       [forwarded_record_count]   AS [Forwarded Rows]
FROM sys.dm_db_index_physical_stats
     (DB_ID(), OBJECT_ID(N'HeapingHeap'), NULL, NULL, N'detailed')
```

Here is what I received:

This makes sense, as the logical reads equal the number of pages. Performance is what we would expect. But what happens if we change the data on the heap?
Let's update the table so that 50% of the rows now hold a bigger column.

```sql
UPDATE dbo.HeapingHeap
SET column1=REPLICATE('Z',1000)
WHERE Id % 2 = 0;
GO
```

Again, let's run our performance query.

```sql
CHECKPOINT;
GO
DBCC DROPCLEANBUFFERS;
GO
SET STATISTICS IO ON;
SELECT * FROM HeapingHeap
GO
SET STATISTICS IO OFF;
GO
```

We now receive a new number:

Table 'HeapingHeap'. Scan count 1, logical reads 485715, physical reads 7145, read-ahead reads 28543, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

This is a huge performance issue: we are now hitting the disks much harder, yet we are still returning the same number of rows. How did this happen? What caused it to change?

This was the work of the forwarding rows in the table. Let's take a look at our physical table again.

```sql
SELECT [alloc_unit_type_desc]     AS [Data Structure],
       [page_count]               AS [pages],
       [record_count]             AS [Rows],
       [min_record_size_in_bytes] AS [min row],
       [max_record_size_in_bytes] AS [Max Row],
       [forwarded_record_count]   AS [Forwarded Rows]
FROM sys.dm_db_index_physical_stats
     (DB_ID(), OBJECT_ID(N'HeapingHeap'), NULL, NULL, N'detailed')
```

If you notice, we now have 400,000 new forwarding rows that did not exist before. This is where our extra I/Os come from: as mentioned earlier, each of these rows sends SQL Server off to fetch the forwarded data.

A simple way to predict the I/Os of this query:

Logical Reads = allocated pages + forwarded rows

How to Fix This

To fix heaps there are two options:
1. Add a clustered index, so the table is no longer a heap.
2. If you NEED the heap in your database, use ALTER TABLE tablename REBUILD to rebuild the table.

NOTE: People stumble here because there is no easy way to see that a table is a heap.
The only thing you can check for is the lack of a clustered index, and the best way to do that is to run a script against the database.

```sql
SELECT SCH.name AS [Schema Name],
       OBJ.name AS [Table_Name]
FROM sys.indexes INX
JOIN sys.objects OBJ
    ON INX.object_id = OBJ.object_id
JOIN sys.schemas SCH
    ON OBJ.schema_id = SCH.schema_id
WHERE OBJ.is_ms_shipped = 0 --filters out SQL Server's own objects
  AND INX.index_id = 0      --index_id 0 means no clustered index (a heap)
  AND OBJ.type = 'U';       --user tables only
GO
```

This gives us a view of all the tables inside your database that are heaps.
","permalink":"/p/heaps/","date":"Jan 05, 2021"},{"title":"Database Design - Why Does It Matter","description":"Why data type selection matters for SQL Server performance, memory usage, backups, and long-term scalability, with concrete byte-savings examples.","content":"Why Does It Matter?

As hinted in the previous post, many people do not take database design seriously when first implementing their database idea. They mostly have a "let's just get it done, and we will deal with performance later..." mindset. This is definitely the wrong approach to database design. While the database may be fine for a short period or for a small group of people, if you are truly looking for a scalable environment this can be a major problem.

Another excuse I see from my colleagues is that people are often only building databases for the semester or for a final project. This is also the wrong mindset: that database may come into play in another course, or you may actually want to build a database for a real purpose, such as organizing your music.
In this situation, you would fall into the trap of thinking that databases are short-term projects not intended to last very long.\nThree Critical Pillars There are three things that are critical for building a reliable and robust database:\nKnow your data \u0026ndash; Helps you properly choose which data types to use. For example, if you need to record a date, do you need the time? The date data type is only 3 bytes. The datetime2 data type ranges from 6 to 8 bytes. Know your workload \u0026ndash; This helps you plan which tables should store the data. OLTP databases vs reporting databases: reporting databases tend to store more information in one table for faster reads, as they do not need to insert data as often, while OLTP databases store less data in one table to facilitate a more balanced workload. Know how the platform works \u0026ndash; This is key; it makes it easier to design a database, as you will not only know the syntax but also understand the difference between a clustering key and nonclustered keys/indexes. The Byte Savings Add Up A good example of why this matters is the 3 byte versus 6-8 byte argument I mentioned above. Most people would shrug off a 3 or 5 byte difference as it is so small; however, you are never dealing with only one row or, in some situations, only one column.\nIn this scenario, let\u0026rsquo;s say that you have a table with 3 datetime2 columns that are defaulted to a precision of 7 (datetime2(7), 8 bytes each) and are always populated; however, you only need to store the date, which uses 3 bytes. In this case you would be saving 15 bytes per row (5 bytes for each of the three columns).\nBytes Saved Count of Rows Space Saved 15 1,000,000 15 MB 15 10,000,000 150 MB 15 100,000,000 1.5 GB 15 1,000,000,000 15 GB So now you might respond with, \u0026ldquo;So? It\u0026rsquo;s just disk space?\u0026rdquo; That is not true: because of the way SQL Server handles pages, that data is also read into memory. So inefficient data types bloat your RAM as well. 
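The space-savings table above is simple arithmetic; as a quick sanity check, here is a hypothetical helper (not part of the original post), assuming the decimal units the table uses:

```python
# Hypothetical sanity check for the byte-savings table above.
# Assumes decimal units (1 MB = 1,000,000 bytes), matching the table.
def space_saved_mb(bytes_saved_per_row: int, row_count: int) -> float:
    """Total space saved in MB for a fixed per-row saving."""
    return bytes_saved_per_row * row_count / 1_000_000

# 15 bytes saved per row across a million rows is already 15 MB...
print(space_saved_mb(15, 1_000_000))      # 15.0 (MB)
# ...and 15 GB across a billion rows.
print(space_saved_mb(15, 1_000_000_000))  # 15000.0 (MB)
```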
Not to mention, you also have to include logs, database backups, replication, and high-availability environments.\nHowever, you also do not want to cut it too close to the mark. If you think you are going to approach the limit of a data type, such as tinyint (0 to 255), make sure you use smallint (-2^15 (-32,768) to 2^15-1 (32,767)) instead.\nThe point I am trying to make here is: take your time in choosing the proper data types. Data types are very difficult to change later, especially once the code has been written.\n","permalink":"/p/db-design-part-2/","date":"Jan 04, 2021"},{"title":"Database Design Part 1","description":"An introduction to SQL Server database design covering data type selection, table structure, and why thoughtful design matters for long-term performance.","content":"Introduction Many companies first starting out in the software world do not hire database administrators or data analysts. This often leads to poorly designed, inefficient databases. In this series, I will touch on data type choice, table structures, and how these affect your database performance.\nThe main reason I wanted to start this series is a friend who is struggling in his college database class and has a final project designing a database. However, the concepts are scalable and apply to anyone who is building a database. I would also like to shout out Kimberly Tripp at SQLSkills.com; her enthusiasm in her Pluralsight courses and on her blog has definitely helped me stay interested in SQL.\nTo start off, I would like to throw out one of Kimberly\u0026rsquo;s favorite quotes regarding database design: \u0026ldquo;Disk space is cheap, who cares about 4 bytes versus 16 bytes?\u0026rdquo; While disk space has become cheaper, it doesn\u0026rsquo;t mean that we should always use a bigint when a tinyint will do the job. 
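The tinyint/smallint advice above can be illustrated with a small sketch. The byte sizes and ranges below are SQL Server's documented values for its exact integer types; the helper itself is hypothetical, just to make the headroom check concrete:

```python
# SQL Server exact integer types: (storage bytes, min value, max value).
INT_TYPES = {
    "tinyint":  (1, 0, 255),
    "smallint": (2, -2**15, 2**15 - 1),
    "int":      (4, -2**31, 2**31 - 1),
    "bigint":   (8, -2**63, 2**63 - 1),
}

def fits(type_name: str, value: int) -> bool:
    """True if `value` is within the range of the given integer type."""
    _, lo, hi = INT_TYPES[type_name]
    return lo <= value <= hi

# A value expected to grow past 255 should start life as a smallint:
print(fits("tinyint", 300))   # False
print(fits("smallint", 300))  # True
```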
As the series continues, we will go through some of the reasons why this is generally a bad practice.\nAdditionally, as a final note in the introduction, I would like to mention again that this blog is about Microsoft SQL Server and most of the information given will refer to Microsoft specifically; while the theories will transfer in practice, they may not be the best solution for other RDBMSs (Relational Database Management Systems).\n","permalink":"/p/db-design-part-1/","date":"Jan 03, 2021"},{"title":"Database Table Size","description":"A reusable SQL query to report row counts, total space, used space, and unused space per table and schema using sys.allocation_units.","content":"The following query reports row counts and space usage for every user table in a database. It joins sys.tables, sys.indexes, sys.partitions, and sys.allocation_units to produce totals in both KB and MB.\nSELECT t.NAME AS TableName, s.Name AS SchemaName, p.rows AS RowCounts, SUM(a.total_pages) * 8 AS TotalSpaceKB, CAST(ROUND(((SUM(a.total_pages) * 8) / 1024.00), 2) AS NUMERIC(36, 2)) AS TotalSpaceMB, SUM(a.used_pages) * 8 AS UsedSpaceKB, CAST(ROUND(((SUM(a.used_pages) * 8) / 1024.00), 2) AS NUMERIC(36, 2)) AS UsedSpaceMB, (SUM(a.total_pages) - SUM(a.used_pages)) * 8 AS UnusedSpaceKB, CAST(ROUND(((SUM(a.total_pages) - SUM(a.used_pages)) * 8) / 1024.00, 2) AS NUMERIC(36, 2)) AS UnusedSpaceMB FROM sys.tables t INNER JOIN sys.indexes i ON t.OBJECT_ID = i.object_id INNER JOIN sys.partitions p ON i.object_id = p.OBJECT_ID AND i.index_id = p.index_id INNER JOIN sys.allocation_units a ON p.partition_id = a.container_id LEFT OUTER JOIN sys.schemas s ON t.schema_id = s.schema_id WHERE t.NAME NOT LIKE \u0026#39;dt%\u0026#39; AND t.is_ms_shipped = 0 AND i.OBJECT_ID \u0026gt; 255 GROUP BY t.Name, s.Name, p.Rows ORDER BY t.Name ","permalink":"/p/database-table-size/","date":"Jan 02, 2021"},{"title":"Moving to SQL Server 2014/2016","description":"Key considerations when migrating to SQL Server 2014 or 2016, including the new cardinality estimator, edition changes, and version downgrades.","content":"As their hardware reaches end of life, people are migrating to later versions of SQL Server. This means that 2016 and 2014 are coming into view for many companies. But when the upgrade date comes, there are a few things to be aware of.\nNew Query Optimizer \u0026ndash; Cardinality Estimator SQL Server 2014 introduced a new cardinality estimator that changes how the query optimizer estimates row counts. This can significantly affect query plan choices.\nSQL 2014 Cardinality Estimator Eats Bad TSQL for Breakfast \u0026ndash; post about the benefits of the new cardinality estimator in 2014 Optimizing Your Query Plans with the SQL Server 2014 Cardinality Estimator \u0026ndash; covers the changes you would want to make and how to do them, including how to roll back to the old estimator if needed After testing database workloads, you may choose to revert to the legacy CE behavior for sessions connecting to a database. You can do so by changing the database compatibility level to a level lower than 120:\nUSE [master]; GO -- SQL Server 2012 compatibility level ALTER DATABASE [AdventureWorks2012] SET COMPATIBILITY_LEVEL = 110; GO Moving Between Editions (Same Version) Changing SQL Server Editions: Standard, Enterprise, Evaluation and More \u0026ndash; shouldn\u0026rsquo;t be a problem, especially if you are using an evaluation edition. The worst-case scenario is to uninstall and reinstall with the desired edition.\nDowngrading Versions How to Migrate a SQL Server Database to a Lower Version \u0026ndash; does take some time but can be done using the Generate Scripts wizard inside SSMS.\n","permalink":"/p/moving-to-sql-2014-2016/","date":"Jan 01, 2021"}]