TLDR Version:
- For loading and cache-building with SSDs, your drives are likely not the bottleneck. A 20-100% faster M.2-NVME drive made no measurable difference in loading, and going from a SATA to an M.2-NVME data-source drive (a 100-600% faster device) improved cache-build time by only 10%. Note – I’ve never compared a strictly SATA implementation against an NVME/SATA mix or a pure NVME solution, and have no feel for how large that change would be.
Background:
I just upgraded my system SSD from a fast 1TB part (Samsung 970EVO, M.2-NVME, PCIe3) to a faster 2TB Samsung 980Pro (M.2-NVME, PCIe4), then replaced a much slower 1TB SATA SSD with the 970. In this two-drive mix, the fastest device is my Windows “C:” drive, hosting the OS, all software, most of my daily files, and, most importantly, the HW cache directory. The slower #2 device is my temp/working drive, and also contains the HW directory “/HauptwerkSampleSetAndComponents/”, which holds installed instruments’ numerous “detail” files, such as individual samples, ODFs, temperaments, et al. I use several old-school magnetic hard disks (SATA) for bulk storage, archival, and backup – for example, the HW vendor’s distribution packages.
I upgraded in two distinct steps: first the system drive, then my secondary drive. I tested the impact of each step on the load-from-cache and rebuild-cache/load times of Alessandria, my largest sample set. I also ran a series of synthetic disk benchmarks (CrystalDiskMark 8) to get a feel for the best-possible theoretical performance of these various SSDs.
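For anyone who wants a quick sanity check without a dedicated benchmark tool, here is a minimal sketch in Python that times a single-threaded sequential read – the access pattern closest to a cached load. The file path is hypothetical; point it at any multi-GB file on the drive under test (ideally larger than RAM, or the OS file cache will inflate the result):

```python
import time

# Hypothetical path - use any multi-GB file on the drive under test.
# A file larger than RAM avoids the OS file cache inflating the result.
TEST_FILE = r"D:\bench\big_sample.dat"
CHUNK = 8 * 1024 * 1024  # 8 MiB reads; large buffers favor sequential throughput

def sequential_read_mb_per_s(path: str) -> float:
    """Time a single-threaded sequential read and return MB/s."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(CHUNK)
            if not chunk:
                break
            total += len(chunk)
    return total / (time.perf_counter() - start) / 1e6

if __name__ == "__main__":
    print(f"Sequential read: {sequential_read_mb_per_s(TEST_FILE):.0f} MB/s")
```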
Some time back, Martin indicated that while HW makes excellent use of multiple cores/threading for its audio processing, its load process was simpler, and probably didn’t come close to taking full advantage of the faster SSDs now available to us. My memory is hazy as to whether loading was single-threaded, or had some (very) limited multi-threading. My experience with this upgrade is in line with Martin’s note.
For I/O, I believe loading a cached instrument is mostly reading from my fastest (#1) device, while building a cache is mostly reading from the #2 drive, and writing to the #1 device. The HDDs didn’t take part in this test.
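For anyone who wants to watch that I/O pattern on their own system, here is a minimal sketch (Python, using the third-party psutil package) that logs per-drive transfer rates once a second; run it alongside HW while loading or building a cache, and stop it with Ctrl+C:

```python
import time
import psutil  # third-party: pip install psutil

# Log per-disk throughput once per second; Ctrl+C to stop.
prev = psutil.disk_io_counters(perdisk=True)
while True:
    time.sleep(1.0)
    cur = psutil.disk_io_counters(perdisk=True)
    for disk, now in cur.items():
        before = prev.get(disk, now)
        rd = (now.read_bytes - before.read_bytes) / 1e6    # MB/s read
        wr = (now.write_bytes - before.write_bytes) / 1e6  # MB/s written
        if rd > 1 or wr > 1:  # only print disks doing real work
            print(f"{disk}: read {rd:7.1f} MB/s   write {wr:7.1f} MB/s")
    prev = cur
```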
Config:
- I’m deploying Alessandria in its largest memory configuration: loading all perspectives and ranks as stereo 24-bit uncompressed, all samples and loops, no truncation. This config consumes 100+GB of RAM.
- CPU is a Ryzen 3900X, 12 physical/24 virtual cores, on an X570 MOBO with DDR4 RAM at 3200MHz. No overclocking beyond enabling the memory’s stock XMP profile to run at its rated 3200MHz. During testing, active CPU cores were clocking around 4.2GHz. No special Windows tuning for HW beyond having no paging file(s) (a large-memory unload work-around for a Windows bug/feature). Internet connected and fully loaded: antivirus, net-based backups, etc. Idle CPU usage about 1%, idle memory about 8GB, an estimated >200 processes running. The MOBO supports the latest PCIe4 standard.
- Using a 7200RPM SATA magnetic HDD as a performance baseline of “1x” for its highest sustained sequential read/write rates, my SATA SSD peaks at about “2.5x”, my PCIe3 Samsung 970 at “12-15x”, and the PCIe4 980Pro at “25-30x”. For random I/O, the fastest SSD is 60-1000x faster than the HDD. Unlike the HDD, the SSDs’ random I/O performance scales very effectively with multi-threading – lots of parallel threads can be supported; with the HDDs, not so much.
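To make those multipliers concrete, here is the normalization, using ballpark sequential-read figures for each device class (assumed for illustration; not my exact CrystalDiskMark results):

```python
# Ballpark sequential-read figures (MB/s) for each device class -
# illustrative assumptions, not my exact CrystalDiskMark numbers.
drives = {
    "7200RPM SATA HDD": 230,   # the "1x" baseline
    "SATA SSD":         550,
    "970EVO (PCIe3)":  3400,
    "980Pro (PCIe4)":  6900,
}
baseline = drives["7200RPM SATA HDD"]
for name, mbps in drives.items():
    print(f"{name:17s} {mbps:5d} MB/s = {mbps / baseline:4.1f}x")
```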
Results:
- Upgrading the primary device from a fast PCIe3 SSD to a faster PCIe4 one (measured as 20-100% faster, depending on the operation) made no measurable improvement in either HW cached instrument-load times or cache-build times (holding the #2 drive constant as the SATA SSD).
- Subsequently upgrading the secondary device from the SATA SSD to the far faster 970EVO yielded a modest 10% reduction in cache-build times. As expected, it had no impact on normal (cached) load times.
- Loading Alessandria from cache took 70 (+/-3) seconds for any of my three configs, with no discernible difference between them. A full cache-build & load took 610 seconds in the fastest config, about a repeatable 10% improvement over the slowest of the three variants. Almost all of this improvement correlated with replacing the SATA source drive with the NVME drive.
- When building Alessandria’s cache, HW CPU usage hovered at about 4% – roughly 1/24th of this 24-thread CPU, i.e., about one fully busy thread – lending credence to this function being dominated by a single CPU thread. In the fastest config, busy time for the #2 drive was <30%, and for the #1 output drive (writing the cache), <4%. The actual data transfer rates were far below those percentages of the devices’ maximums (see the back-of-envelope arithmetic after this list).
- The faster system drive produced no subjective improvement in OS (Windows 10 Pro) boot times or application launch times.
- Synthetic benchmarks clearly identify areas of significant difference between the various devices and interfaces, but these differences are highly non-linear: once into the moderately fast class of devices, real-world performance differences will be rare and highly load-dependent. For example, some of my own software sees a noticeable improvement with the faster SSD, but it reads/writes very large files sequentially with large buffers.
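A quick back-of-envelope supports the single-thread reading of those drive-busy numbers. Treating the 100+GB RAM image as a rough proxy for the data volume moved during the 610-second build (an assumption; the on-disk sample data isn’t exactly the RAM footprint):

```python
# Back-of-envelope: is the cache build anywhere near drive-limited?
# Assumption: the ~100GB RAM image approximates the bytes read from the
# source drive; the on-disk volume may differ somewhat.
data_gb  = 100           # approximate data volume, GB
build_s  = 610           # measured full cache-build time, seconds
avg_mbps = data_gb * 1000 / build_s
print(f"average throughput ~{avg_mbps:.0f} MB/s")                # ~164 MB/s

# Compare against ballpark sequential capabilities of the source drives:
print(f"SATA SSD source: {avg_mbps / 550:.0%} of ~550 MB/s")     # ~30%
print(f"970EVO source:   {avg_mbps / 3400:.0%} of ~3400 MB/s")   # ~5%
```

The ~30% figure lines up with the observed #2-drive busy time in the SATA config, and either way the drives are loafing – consistent with the load path being CPU-bound on one thread.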
Conclusions and Opinions:
- Presently, there seems little utility in spending much extra money on PCIe4-capable SSDs over their last-gen PCIe3 equivalents – not for HW, and not for the general system experience. That may change as they become more common and software is refactored to take advantage, or as pricing changes – but today, not so much! Note that a PCIe4 MOBO is perfectly fine hosting down-level versions of the PCIe standard. Prices do fluctuate wildly these days; the current price spread between PCIe3 and PCIe4 SSDs seems to be about (USD) $40-50 per TB, but was a $90/TB spread a few weeks ago.
- Hypothesis: with an M.2-NVME SSD (of any type), there may not be much performance to be gained by splitting between multiple devices as I did. The device utilization, both read and write, is so far below the measured capabilities of the devices that I’m guessing performance would be indistinguishable. I still split across devices to gain storage capacity, as larger drives can get pricey.
- If building a machine today, I would not purchase any SATA SSD unless I had already filled all available M.2-NVME slots – in my area, the price of a SATA SSD and of a vastly better-performing PCIe3 M.2-NVME drive are about the same. For my own PC, built late 2019, I only used a SATA SSD for my #2 drive because I had one lying around! Note that M.2 MOBO connectors commonly support both M.2-SATA and M.2-NVME SSDs, and the SATA versions’ performance is about the same as a standard SATA SSD’s. M.2 is just the "gum-stick" form factor and connector, not the interface protocol. The slower M.2-SATA devices have two notches on the connector end; the faster NVME variants have only a single notch.
Cheers, Bob