Impact of PCIe4 SSDs vs PCIe3 on HW6 load/cache build times

Buying or building computers for Hauptwerk, recommendations, troubleshooting computer hardware issues.

bobhehmann

Member

  • Posts: 74
  • Joined: Sun Dec 01, 2019 6:08 pm
  • Location: Moorpark, California

Impact of PCIe4 SSDs vs PCIe3 on HW6 load/cache build times

Posted: Mon Aug 16, 2021 5:17 pm

TLDR Version:
- For loading and cache-building, with SSDs your drives are likely not your bottleneck. A 20-100% faster M.2-NVME drive made no measurable difference in loading, and going from a SATA to M.2-NVME (100-600% faster device) as a data source improved cache build time by only 10%. Note – I’ve never compared a strictly SATA implementation against an NVME/SATA mix or a pure NVME solution, and have no feel for how large that change would be.

Background:
I just upgraded my system SSD from a fast 1TB part (Samsung 970EVO, M.2-NVME, PCIe3) to a faster 2TB Samsung 980Pro, M.2-NVME, PCIe4, then replaced a much slower 1TB SATA SSD with the 970. In this two-drive mix, the fastest device is my Windows “C:” drive, hosting the OS, all software, most of my daily files, and most importantly, the HW Cache directory. The slower #2 device is my temp/working drive, and also contains the HW directory “/HauptwerkSampleSetAndComponents/”, which includes installed instruments’ numerous “detail” files, such as individual samples, ODFs, temperaments, et al. I use several old-school magnetic hard-disks (SATA) for bulk storage, archival, and backup – for example, the HW vendors’ distribution packages.

I upgraded in two distinct steps, first the system drive, then my secondary drive. I tested the impact of each step on the load-from-cache and rebuild-cache/load times of Alessandria, my largest sample set. I also ran a series of synthetic disk benchmarks (CrystalDiskMark 8) to get a feel for the best-possible theoretical performance of these various SSDs.
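For anyone wanting a rough, scriptable version of the kind of sequential-read test CrystalDiskMark runs, here is a minimal sketch (my own illustration, not CrystalDiskMark's method; note that OS file caching will inflate the result unless the file is much larger than RAM or caching is bypassed):

```python
import time

def sequential_read_mbps(path, block_size=1024 * 1024, max_bytes=1024**3):
    """Read a file sequentially in large blocks and report throughput in MB/s.

    A rough, single-threaded, queue-depth-1 measurement -- similar in spirit
    to CrystalDiskMark's SEQ1M Q1T1 test, not a substitute for it.
    """
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while total < max_bytes:
            chunk = f.read(block_size)
            if not chunk:  # end of file reached before max_bytes
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return (total / (1024 * 1024)) / elapsed
```

Pointing it at a large file on each drive gives a crude A/B comparison, though a real benchmark tool is still the better instrument.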

Some time back, Martin indicated that while HW makes excellent use of multiple cores/threading for its audio processing, its load process was simpler, and probably didn’t come close to taking full advantage of the faster SSDs now available to us. My memory is hazy as to whether loading was single-threaded, or had some (very) limited multi-threading. My experience with this upgrade is in-line with Martin’s note.

For I/O, I believe loading a cached instrument is mostly reading from my fastest (#1) device, while building a cache is mostly reading from the #2 drive, and writing to the #1 device. The HDDs didn’t take part in this test.

Config:
- I’m deploying Alessandria in its largest memory configuration: loading all perspectives and ranks as stereo 24-bit uncompressed, all samples and loops, no truncation. This config consumes 100+GB of RAM.
- CPU is a Ryzen 3900X, 12 physical/24 virtual cores. An X570 MOBO, DDR4 RAM at 3200MHz. No overclocking beyond enabling the memory’s stock XMP profile to clock it at its rated 3200MHz. During testing, active CPU cores were clocking around 4.2GHz. No special Windows tuning for HW beyond not having any paging file(s) (a large-memory unload work-around for a Windows bug/feature). Internet connected, fully loaded, antivirus, net-based backups, et al. Idle CPU usage about 1%, idle memory about 8GB, an estimated >200 processes running. The MOBO supports the latest PCIe4 standard.
- Using a 7200RPM SATA magnetic HDD’s highest sustained sequential read/write rate as a baseline of “1x”, my SATA SSD peaks at about “2.5x”, my PCIe3 Samsung 970 at “12-15x”, and the PCIe4 980Pro at “25-30x”. For random I/O, the fastest SSD is 60-1000x faster than the HDD. Unlike the HDD, the SSDs' random I/O performance scales very effectively with multi-threading – lots of parallel threads can be supported; with the HDDs, not so much.

Results:
- Upgrading the primary device from a fast PCIe3 SSD to a faster PCIe4 one (measured as 20-100% faster, depending on the operation) made no measurable improvement in either HW instrument cached-load times or cache-build times (while holding the #2 drive constant as the SATA SSD).
- Subsequently upgrading the secondary device from the SATA SSD to the far faster 970EVO made a modest 10% reduction in cache build times. Of course, no impact on normal load times.
- Loading Alessandria from cache took 70 (+/-3) seconds for any of my three configs, with no discernible difference between them. A full cache-build & load took 610 seconds in the fastest config, about a repeatable 10% improvement over the slowest of the three variants. Almost all of this improvement correlated with replacing the SATA source drive with the NVME drive.
- When building Alessandria’s cache, HW CPU usage hovered at about 4%, lending credence to it being dominated by a single CPU thread for this function. In the fastest config, busy time for the #2 drive was <30%, and for the #1 output drive (writing the cache), <4%. The actual data transfer rates were far less than those percentages of the devices’ maximums.
- The faster system drive led to no subjective improvement in OS (Windows-10 Pro) nor in application load-start times.
- Synthetic benchmarks clearly identify areas of significant difference between the various devices and interfaces; these differences are highly non-linear, and real-world differences in performance, once into the moderately fast class of devices, will be rare and highly load dependent. For example, some of my own software sees a noticeable improvement with the faster SSD, but it is reading/writing very large files sequentially with large buffers.
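The single-thread observation above can be checked with simple arithmetic. A quick back-of-envelope sketch using the round numbers from this thread (the ~55GB on-disk size for Alessandria comes from a later post):

```python
# On a 12-core/24-thread CPU, one fully saturated thread shows up in
# Task Manager as roughly 1/24 of total CPU:
logical_cores = 24
one_thread_pct = 100.0 / logical_cores        # ~4.2%, matching the ~4% observed

# Effective cached-load throughput implied by the measured times
# (~55 GB on disk, loaded in ~70 s):
cache_gb = 55
load_seconds = 70
load_gb_per_s = cache_gb / load_seconds       # ~0.79 GB/s, well below NVMe limits
```

Either way you slice it, the achieved transfer rate sits far below what the NVMe drives can deliver, which is consistent with the CPU, not the drive, being the bottleneck.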

Conclusions and Opinions:
- Presently, there seems little utility in spending much extra money to acquire PCIe4-capable SSDs over their last-gen PCIe3 equivalents. Not for HW, and not for the general system experience. That may change as they become more common and software is refactored to take advantage, or as pricing changes – but today, not so much! Note that a PCIe4 MOBO is perfectly fine hosting down-level versions of the PCIe standard. Prices do fluctuate wildly these days. The current price spread between PCIe3 and 4 SSDs seems to be about (USD) $40-50 per TB, but was listing as $90/TB spread a few weeks ago.
- Hypothesis: with an M.2-NVME SSD (any type), there may not be much performance to be gained by splitting between multiple devices as I did. The device utilization, both read and write, is so far below the measurable capabilities of the device that I’m guessing performance would be indistinguishable. Now, I still split devices to gain storage capacity, as larger drives can get pricey.
- If building a machine today, I would not purchase any SATA SSD unless I had already filled all available M.2-NVME slots – in my area, a SATA SSD and a vastly better performing PCIe3 M.2-NVME drive cost about the same. For my own PC, built in late 2019, I only used a SATA SSD for my #2 drive because I had one lying around! Note that M.2 MOBO connectors commonly support both M.2-SATA and M.2-NVME SSDs, and the SATA versions’ performance would be about the same as a standard SATA SSD’s. M.2 is just the "gum-stick" form factor and connector, not the interface protocol. The slower M.2-SATA devices have two notches on the connector end; the faster NVME variants have only a single notch.
Cheers, Bob

engrssc

Member

  • Posts: 7283
  • Joined: Mon Aug 22, 2005 10:12 pm
  • Location: Roscoe, IL, USA

Re: Impact of PCIe4 SSDs vs PCIe3 on HW6 load/cache build times

Posted: Mon Aug 16, 2021 5:46 pm

While my testing was not up to that level, the conclusions reached are the same. My reason for doing the upgrades in my case(s) was attempting to compete, especially in loading times, with both Rodgers and Allen. Basically it was a no-contest situation where both Rodgers and Allen won easily. Really, the only edge Hauptwerk had was price. I must admit both "competitors" sounded great, all things otherwise equal. The other factor is the time it takes to change from one sample set to another. There, too, both Rodgers and Allen won easily. As the client remarked, "Rodgers and Allen must be using super computers."

Rgds,
Ed

mdyde

Moderator

  • Posts: 15444
  • Joined: Fri Mar 14, 2003 1:19 pm
  • Location: UK

Re: Impact of PCIe4 SSDs vs PCIe3 on HW6 load/cache build times

Posted: Tue Aug 17, 2021 3:48 am

Hello Bob,

Thanks for the detailed benchmarks.


bobhehmann wrote:Some time back, Martin indicated that while HW makes excellent use of multiple cores/threading for its audio processing, its load process was simpler, and probably didn’t come close to taking full advantage of the faster SSDs now available to us. My memory is hazy as to whether loading was single-threaded, or had some (very) limited multi-threading. My experience with this upgrade is in-line with Martin’s note.


Here's one topic on that:

viewtopic.php?f=4&t=18931&p=146309#p146309

mdyde wrote:Here's a topic (with links to others) that cover loading speeds:

viewtopic.php?f=16&t=18771&p=145233#p145233

mdyde wrote:Here are some topics on loading speeds from cache:

http://forum.hauptwerk.com/viewtopic.php?f=4&t=14624
http://forum.hauptwerk.com/viewtopic.ph ... 61#p119622
http://forum.hauptwerk.com/viewtopic.ph ... 777#p74773
http://forum.hauptwerk.com/viewtopic.ph ... 56#p108156
http://forum.hauptwerk.com/viewtopic.php?f=16&t=14683

In brief:

- Hauptwerk also needs to do other processing on the cache data when loading (such as decrypting it).
- After loading from cache, Hauptwerk's log ('Help | View activity log', and look at INF:2157) will show loading performance stats.
- With extremely fast (PCIe etc.) SSDs the per-core CPU performance is likely to be the bottleneck.
- When loading from cache Hauptwerk is currently (v5) able to take advantage of about 5-6 cores (but more won't hurt).
- Hauptwerk is tested with and optimised for Intel CPUs, but I know of no reason that an AMD CPU with AVX2 shouldn't perform well, and Hauptwerk v5 should detect it as having AVX2 (check the 'INF:4165 ... Processor build type' in the activity log after launching Hauptwerk) and use that capability.
- Hauptwerk v5 should load caches a bit faster than v4 did, but making it take full advantage of the performance capabilities of very fast PCIe SSDs would be a fairly big project -- one for the future.


As covered in those posts, we do fully appreciate that some people these days have the more recent extremely high-performance PCIe/NVMe/etc. SSDs, and further optimising Hauptwerk to take better advantage of such SSDs is logged as a high-priority enhancement request (but it isn't a small amount of work, and we have finite resources). I hope that helps to clarify, and thanks for your patience.


Here's another one (from 2017):

viewtopic.php?f=16&t=16473&p=123818#p123818
Best regards, Martin.
Hauptwerk software designer/developer, Milan Digital Audio.

bobhehmann

Member

  • Posts: 74
  • Joined: Sun Dec 01, 2019 6:08 pm
  • Location: Moorpark, California

Re: Impact of PCIe4 SSDs vs PCIe3 on HW6 load/cache build times

Posted: Tue Aug 17, 2021 5:08 am

Thanks Martin. I did have at least one brain-cramp in my post: my comment about appearing to have minimal multi-threading (even if totally false) was based on a cache build. For a simple load-from-cache, it is clearly highly multi-threaded. In my config, Alessandria is about 55GB on disk, 105GB in RAM (24-bit uncompressed). From Task Manager, which averages over time, peak read rates were about 1GB/sec, average around 800MB/sec, which jibes with the load time. Several cores regularly spiked to 100% CPU but averaged closer to 50% (a roughly sinusoidal pattern ranging from 0 to 100%, average 50%, 10-second wavelength, pretty regular); other cores were stable at 40-50%, and most were nominal, near zero. Net CPU usage was slightly under 20%. In the HW logs (I may be reading them incorrectly!), it looks like a reader thread got about 1.8GB/sec when busy, at a ~50% duty cycle, which matches the calculated transfer rates. Of the other 5 load processes (Stage Busy?), #2 was at 99%, two were around 50%, one at 30%, and one at 17%.
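That duty-cycle reading can be sanity-checked with simple arithmetic. A sketch using Bob's numbers (my interpretation of the log figures, not an official formula):

```python
# Reader thread: ~1.8 GB/s while busy, busy about half the time (50% duty cycle)
burst_gb_per_s = 1.8
duty_cycle = 0.5
avg_gb_per_s = burst_gb_per_s * duty_cycle    # 0.9 GB/s average

# Implied load time for the ~55 GB on-disk cache:
on_disk_gb = 55
implied_load_s = on_disk_gb / avg_gb_per_s    # ~61 s, in the ballpark of the measured ~70 s
```

The small gap between the implied ~61 s and the measured ~70 s would be consistent with the extra per-load work (decryption etc.) Martin mentions above.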

When benching the drive with Crystal (sequential read, large blocks, limited to queue depth 1 and 1 process, i.e. no parallel reads), I get from 5.5 to 6GB/second depending on block size. With threading, I can get near the rated 7GB/second for sequential reads.

Those drive read numbers match my experience that the faster drive didn't matter, as the older PCIe3 970 would still be faster than what this CPU is capable of requesting with this design (I can read the 970 at 5GB/second). But a SATA SSD would be slower than what my HW can consume: mine benched at about 540MB/second peak sustained read, and a Seagate Barracuda HDD came in at 180MB/second - so going from HDD to SATA SSD to NVME SSD should show a palpable improvement at each step.

So your mission, should you choose to accept it, is to squeeze 7GB/second out of 6.0.3 (alright, 7.0 and I'll pay for it :) )
Cheers, Bob

mdyde

Moderator

  • Posts: 15444
  • Joined: Fri Mar 14, 2003 1:19 pm
  • Location: UK

Re: Impact of PCIe4 SSDs vs PCIe3 on HW6 load/cache build times

Posted: Tue Aug 17, 2021 6:18 am

Thanks, Bob!
Best regards, Martin.
Hauptwerk software designer/developer, Milan Digital Audio.

vpo-organist

Member

  • Posts: 306
  • Joined: Wed Apr 29, 2020 6:49 am

Re: Impact of PCIe4 SSDs vs PCIe3 on HW6 load/cache build times

Posted: Tue Aug 17, 2021 9:24 am

bobhehmann wrote:So your mission, should you choose to accept it, is to squeeze 7GB/second out of 6.0.3 (alright, 7.0 and I'll pay for it :) )

I guess you can load the samples directly (while playing) from the storage medium with a very fast M.2 SSD. That would be a big step forward.

I'll buy a new PC, and I don't really know what the future requirements will be.
Currently I would have to buy a PC with 128 GB of RAM, a fast M.2 SSD (is there M.2 RAID?), DDR4-3200, and high-clocked cores.

How many cores are used by Hauptwerk? Should it be a Ryzen XY or an Intel?
Is AVX-512 also supported by Hauptwerk? Currently the only CPU I know of with AVX-512 is the Intel Core i9-10980XE: turbo 4.8 GHz, 18 cores/36 threads, costing around 1000 EUR. CPU benchmark 33989.

Or should it be a Ryzen 9 5900X, 4.8 GHz, 12Core/24Threads, AVX-2, Bench 39518 for 544 EUR?

That is not easy for me to answer.

bobhehmann

Member

  • Posts: 74
  • Joined: Sun Dec 01, 2019 6:08 pm
  • Location: Moorpark, California

Re: Impact of PCIe4 SSDs vs PCIe3 on HW6 load/cache build times

Posted: Tue Aug 17, 2021 5:43 pm

A couple of random (and too lengthy :twisted: ) thoughts:

1) As a level set, my 3900X (12/24) maxed out Hauptwerk's Polyphony Test at 30,000+ simultaneous pipes at ~50% CPU utilization. I had to use two large dowels, stand up, and use my body weight to depress all my black and white keys at the same time - a nice reminder of the limitations on the number of coupled manuals on the unassisted actions of old!

However, some of the most modern sample sets are starting to push processors: PiotrG's Alessandria is a great example - I can easily push towards my CPU limits with this instrument, where HW is probably truncating some sounds in the background due to polyphony/reverb (or at least is thinking about it) - I can see it, but not hear it, so no worries. A few months back, Piotr told me he also uses an R3900x for his work.

When needed, HW does a fantastic job of multi-threading and using all those virtual cores. When I push with Alessandria, I can see 4 worker threads consistently at about 40% each, and 20 other (sound-rendition?) threads at up to 80% each. Given the parallel nature of the problem, and the headroom remaining in the worker threads that I suspect are the controlling bottleneck, I'd confidently guess that HW would scale up nicely to a 16/32 processor like a 3950 or 5950. Threadrippers and XEONs probably do great also, but that class is overkill for me.

Other sample sets I have use far less CPU (all are "everything" loads, all perspectives/ranks, all stops pulled, rapid iteration of large chords for extended time, realistic reverb lengths): HW's own Silbermann, hard to get any thread much over 10%; Schyven-Laeken, 8%; Friesach 20%; Tihany 20%; Laurenskerk 22%...

2) When I first built my PC in late 2019, I purchased only 64GB of RAM, with the intent to expand to 128GB. Doing it over, I'd purchase all 128GB as a single kit (likely 4x32GB sticks for a consumer MOBO). The reason: vendors sometimes change the underlying parts (memory chips, controller chips) without changing the part-number of the stick, and later memory may not be compatible with the overclocking params you might have set up for the first batch, if you're into that. That happened to me: I'd tweaked my first 64GB and gotten about a 5-6% performance improvement for some other (non-HW) work I do on my PC - when I expanded, that didn't work. I found the manufacturer had changed the vendor of the memory chips inside. Just using the new 64GB, to better my speed, I had to select settings that wouldn't work on the first 64GB alone. I later found articles from the cognoscenti noting that manufacturers did this a lot, without telling people. However, the entire mixed 128GB kit works fine at the factory (XMP) overclock to the rated 3200. I would not recommend mixing two different part-numbers, not at these speeds - best to keep all memory sticks the same when at the high-end.

3) DDR4-3200 was benchmarked to be an optimal choice for 3900/3950 CPUs; faster clocks can be used, but do not significantly improve performance on these specific CPUs. I believe the sweet spot for 5900/5950 chips is thought to be DDR4-3600/3800, but I cannot verify that personally. My money, I'd target DDR4-3800 for a 59XX.

4) The Ryzen line does not presently support AVX-512, but does support the previous AVX instruction sets. Martin would know, but I believe HW takes advantage of some of the AVX instructions - only those supported by Ryzen, though. Rumor is AMD will add AVX-512 to the Zen-4 design (their next big generational leap). Benchmarks of Intel chips found that heavy use of the AVX-512 instructions generated enormous amounts of excess heat, to the point of causing overall thermal throttling. Of course, that could change over time as designs improve.

5) RAID with this class of SSDs is tricky. PCIe bandwidth is limited, shared, and often asymmetric in performance (M.2 slots farther from the CPU, with longer signal paths, may run slower - I can actually measure that difference with my 970 SSD on my MOBO, let alone the 980Pro). Very expensive high-end MOBOs take steps to mitigate this. Use cases that would see a palpable improvement from RAID are likely rare, and the HW of today is probably not one of them. RAID-0 (striping) also reduces reliability, as a single device failure pretty much trashes everything. Note that RAID on SATA-class devices absolutely can produce perceptible performance improvements. My money, I'd spend it on a larger, good-performing NVME SSD. Also, you'll want Windows 10 Pro if you want to do Windows RAID - I don't think the Home edition supports it.

6) AMD has pretty well closed the gap with Intel on core-level performance, generally with more cores for the money. And single-core performance can be surprisingly good. Comparing clock-rates outside a narrow family with the same architecture doesn't help much. Modern AMD has been getting more instructions per clock cycle than Intel, but that is subject to change with every new product. In the same class, AMD will often have a higher base clock than the equivalent Intel, but the Intel can boost by a greater percentage. However, max boosting tends to be thermally limited to a few cores for a while - you are not likely to see a consumer Intel reach or sustain its boost clock on most/all cores at the same time, at least not without exotic cooling. If you were willing to entertain that Intel at 1000EUR, I'd consider the AMD 5950X - in the USA, that AMD costs about 80% of the Intel, is a 16/32 processor, and often benches about 30-40% better than the Intel on a single core, perhaps 10% better overall when maxing all cores.

Intels can often use (or require!) faster memory to get their max performance; AMDs of this class generally have a narrower sweet-spot for their ideal memory clock, and a lower one than their Intel brethren. Intels often have lots of headroom to gain extra performance via customized overclocking, if you want to do that; AMDs generally leave less headroom for overclocking improvements, I never bothered with mine. Note that Intel has dropped their "1 free replacement CPU if you damage it by overclocking" warranty, and AMD never gave grace if you custom overclocked and hurt the CPU. AMD no longer supplies free coolers in this class, so you'll need to figure in a cooling solution with either manufacturer.

Many Intels have a rep for running hotter than similar-class AMDs when pushed, but I can't say from experience. However, I ran my 3900X full-out, 24x7, for over a year, running Folding-at-Home during the Covid crisis. For six months, I ran it using only the little cooling tower supplied with the CPU, without ever thermally throttling. I finally replaced that cooler with an inexpensive large 2-fan cooler (Scythe FUMA-II), which runs nearly silently and dropped my temps another 6-7C. The little fan cooled adequately, but was very noisy and constantly changing speed, so it was hard to ignore - a VTH (Virtual Turbine Helicopter) it was... However, I never needed anything exotic for cooling, even when running the CPU and a fairly high-end video card full-out, full-time. (It did keep the room warm, though!) As an aside, I could play most any mid-size organ while FAH was running full-out, without any audio glitching. Modern CPUs are fast enough to keep up and context-switch without lots of tweaking.

Summary - I doubt you'd be disappointed with the HW performance of any of these devices - it may come down to local price and availability.
- Research the web for ideal memory for your CPU choice - sweet-spot speeds differ widely.
- If only using 2 of 4 MOBO memory slots, pay attention to the instructions for optimal memory placement. Any placement will technically work, but the correct placement will visibly perform better.
- Plan to use <=80% of your SSD space; SSDs have both performance and lifespan issues when filled too full. With that in mind, though, high-quality SSDs have excellent life spans given the real amount of data written by most users. I've tended to go Samsung due to excellent warranties in the USA (5 years for these SSDs), near-topline performance, and wide compatibility - but there are many quality vendors out there. I don't find Samsung grossly over-priced, but they are seldom deeply discounted, either.
- After first building your PC, go into your BIOS and enable the XMP profile for your memory. DDR4 clocks at 2666 - that's the DDR4 spec. Anything faster is technically overclocked, and is generally disabled by default. XMP is a standard way for the memory's manufacturer to place config-data on the memory stick that the BIOS can use to correctly overclock to get to the memory's rated speed, but you have to tell the BIOS to turn it on.
- Leave some power-supply headroom, perhaps 50% greater than your expected full load: they're more efficient when not fully loaded, and they draw based on power used, not the power-supply's wattage rating.
- For major MOBO manufacturers, pay attention to their memory compatibility list: it gives the precise memory part-numbers they have validated to work with the MOBO. If you purchase off-list, you may get into a two-vendor argument over who is at fault. Real-world, I got trapped between a major MOBO vendor and a major SSD vendor with mainstream parts from each, where all other disks worked fine with the MOBO, and all other PCs worked fine with the disk, but the mix failed.
- Consider your backup strategy: I use a large (6TB) SATA HDD solely as a local repository for daily automated backups. Experience says that if your backup is not fully automated, it won't be there when you need it. FYI - my HW plays through a running backup just fine on this class of PC. Also, at least consider offsite (disaster) backup if you have substantial irreplaceable data - fire, theft, or a power supply going down in a blaze of glory and taking out all surrounding electronics (had that actually happen once!) can all destroy both the original and the local backup copies. As my PC is not just for HW, I use a continuous network-based backup service that keeps a near-realtime cloud copy of everything other than the OS and temp data, starting the transfer as soon as the data is written. Again, fully automated. I'm colored by my commercial technical career, where important stuff has at least 3 copies: production; onsite backup for normal recovery; offsite backup for disasters.
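The power-supply headroom rule of thumb in the list above can be sketched in a couple of lines (the component wattages here are hypothetical, purely for illustration):

```python
def recommended_psu_watts(expected_load_w, headroom=0.5):
    """Size the PSU about 50% above expected full load, per the rule of thumb."""
    return expected_load_w * (1 + headroom)

# Hypothetical build: 105 W CPU + 220 W GPU + 75 W for drives, fans, and MOBO
expected_load = 105 + 220 + 75                    # 400 W expected full load
psu_rating = recommended_psu_watts(expected_load) # 600 W recommended rating
```

The point of the headroom is efficiency and longevity: the supply draws only what the components use, so an over-rated PSU costs little beyond the purchase price.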

Best wishes, and have fun! Also, listen to Ed (engrssc) and Martin - they know what they're doing. Seriously so.
Cheers, Bob

mdyde

Moderator

  • Posts: 15444
  • Joined: Fri Mar 14, 2003 1:19 pm
  • Location: UK

Re: Impact of PCIe4 SSDs vs PCIe3 on HW6 load/cache build times

Posted: Wed Aug 18, 2021 5:57 am

Hello vpo-organist,

A few minor points to add to Bob's very comprehensive reply (thanks very much, Bob):

vpo-organist wrote:How many cores are used by Hauptwerk?


- Hauptwerk's audio and convolution reverb engines should run distributed across any number of logical cores up to 64.

- If I recall correctly, the most I've ever heard of anyone using with Hauptwerk is 32 physical cores, although more commonly up to 16 or 24 (and even then, more than 12 is rare).

- More CPU cores do potentially benefit polyphony and the convolution reverb engine, but per-core CPU performance is still extremely important, since threads running on different cores inevitably sometimes need to communicate with each other (thread synchronisation, exchanging data, etc.), and the more cores there are, the more the overheads in keeping them synchronised. Hence with huge numbers of cores there may eventually be a point beyond which more cores even reduce overall performance. Although I can't give advice based on benchmarks, my inclination would be to be wary of going much beyond about 8 physical cores if doing so also involved a significant trade-off in per-core performance (base clock speed, etc.).

- Base clock speed, CPU cache, and memory bandwidth are also extremely important, as is making sure that any candidate CPU has support for the AVX2 instruction set.

- If it doesn't involve much of a trade-off in per-core performance (base clock speed, etc.), then I would expect that up to something like 16 physical CPU cores (32 logical) would scale well with minimal additional synchronisation overheads. (Numbers of cores beyond that may well perform well too, but we don't have feedback on that.)

- Hauptwerk's sample set cache loading mechanism (which determines how fast organs are loaded into memory) can currently take advantage of up to 6 CPU cores (but more won't hurt).

- The MIDI/relay event processing (organ switches, pipe on/off events, etc.) are necessarily serialised to ensure that the state of the organ relay remains consistent.

- Each of the background models (wind supply models, pipework modulation, tremulants) runs in its own thread (one thread per model). That's done since the wind model's time-slices need to be extremely short/frequent (sub-millisecond) and serialised. (It might possibly be able to gain some performance in the future from further multi-threading the wind model within the time-slices, but possibly not, since the overheads of so many extremely frequent thread context switches might exceed any potential performance gain.)

- The first few logical cores are kept free of audio/convolution engine loads, and are instead used for the MIDI/relay, background models, and other threads. If there are 12 or more logical CPU cores then the first 4 logical cores will be reserved for those purposes.

- Hence per-core performance may well become a bottleneck for the achievable polyphony, even if you had a huge number of cores for the audio engine.

So in summary, to emphasize again: per-core performance (base clock speed, etc.) is still very, very important, even with a lot of cores.
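Martin's core-reservation description can be sketched as follows. This is purely illustrative pseudologic based on the points above; the function name and the sub-12-core behavior are my assumptions, not Hauptwerk's actual code:

```python
def partition_logical_cores(logical_cores):
    """Illustrative split per the description above: audio/convolution threads
    run on up to 64 logical cores; with 12 or more logical cores, the first
    4 are reserved for MIDI/relay, background models, and other threads."""
    usable = min(logical_cores, 64)
    # Below 12 logical cores the exact reservation isn't stated above;
    # assume at least one core stays free for the non-audio threads.
    reserved = 4 if usable >= 12 else 1
    return {"reserved": reserved, "audio": usable - reserved}
```

For example, on Bob's 24-thread 3900X this split would leave 20 logical cores for the audio/convolution engines, which matches why per-core performance, not core count, tends to be the limiting factor.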

vpo-organist wrote:Is AVX-512 also supported by Hauptwerk?


AVX-512 might possibly benefit convolution reverb performance (which relies on a third-party library for the DFFT function), but Hauptwerk (v6.0.2) won't currently benefit from it above AVX2 in other respects. (AVX2 is definitely worthwhile.)

(We did originally make a dedicated AVX-512 build for testing, but there seemed to be strange compatibility problems with at least one AVX-512 CPU, and it gave negligible performance/polyphony gain anyway, so we abandoned it for now, to be on the safe side. It might be resurrected in the longer-term if it proves beneficial.)

vpo-organist wrote:I currently only know the Intel Core i9-10980XE with AVX-512, Turbo 4.8 GHz, 18 cores/36 threads and costs around 1000 EUR. CPU benchmark 33989.

Or should it be a Ryzen 9 5900X, 4.8 GHz, 12Core/24Threads, AVX-2, Bench 39518 for 544 EUR?


Bob's reply covers that in great depth, and the only thing I would add is that Hauptwerk's convolution reverb engine might possibly perform better on Intel CPUs than AMD, all else being equal (same per-core performance, etc.), but in other respects Hauptwerk should be optimised equally for instruction sets (AVX2, etc.) on Intel vs. AMD CPUs.
Best regards, Martin.
Hauptwerk software designer/developer, Milan Digital Audio.

johnstump_organist

Member

  • Posts: 547
  • Joined: Wed Mar 25, 2009 1:15 pm
  • Location: San Antonio, Texas

Re: Impact of PCIe4 SSDs vs PCIe3 on HW6 load/cache build times

Posted: Wed Aug 18, 2021 7:08 am

engrssc wrote:There, too, both Rodgers and Allen won easily. As the client remarked, "Rodgers and Allen must be using super computers."

Rgds,
Ed


Allen & Rodgers load totally dry samples and do not load a sample for every note, therefore making the size of the files much smaller. There may also be fewer loops. With dry samples there is no need for multiple release samples. And they have proprietary computers doing nothing but the organ software - all contributing factors to fast loading times.
John

engrssc

Member

  • Posts: 7283
  • Joined: Mon Aug 22, 2005 10:12 pm
  • Location: Roscoe, IL, USA

Re: Impact of PCIe4 SSDs vs PCIe3 on HW6 load/cache build times

Posted: Wed Aug 18, 2021 8:45 am

johnstump_organist wrote:Allen & Rodgers load totally dry samples and do not load a sample for every note, therefore making the size of the file much smaller. There may also be fewer loops. With dry samples no need for multiple release samples. And they have proprietary computers doing nothing but the organ software, all contributing factors to fast loading times.
John


I've speculated, without being able to test the theory, whether it would be possible in effect to run (just) the samples on one computer and all of the auxiliary controls, features, etc. on another computer, to increase load-speed efficiency. The question being: could the "structure" of Hauptwerk (separating out the items common to every sample set from the variables) be changed to do as Allen and Rodgers do in that regard? From Martin's recent answer, I conclude that a great many of these so-called possibilities have been considered and even tested. And, yes, there are patent properties to be regarded also.

Rgds,
Ed

mdyde

Moderator

  • Posts: 15444
  • Joined: Fri Mar 14, 2003 1:19 pm
  • Location: UK

Re: Impact of PCIe4 SSDs vs PCIe3 on HW6 load/cache build ti

PostWed Aug 18, 2021 10:07 am

engrssc wrote:I've speculated, without being able to test the theory, whether it would be possible in effect to run just the samples on one computer and all of the auxiliary controls, features, etc. on another, to increase loading efficiency. The question is whether the "structure" of Hauptwerk (separating the items common to every sample set from the variables) could be changed to do as Allen and Rodgers do in that regard?


Hello Ed,

Redeveloping Hauptwerk in that way (with the sample playback engine running on separate custom DSP hardware) would be years of work -- not something we can entertain for now, I'm afraid.
Best regards, Martin.
Hauptwerk software designer/developer, Milan Digital Audio.
Offline
User avatar

engrssc

Member

  • Posts: 7283
  • Joined: Mon Aug 22, 2005 10:12 pm
  • Location: Roscoe, IL, USA

Re: Impact of PCIe4 SSDs vs PCIe3 on HW6 load/cache build ti

PostWed Aug 18, 2021 10:21 am

Understood, and that answer comes as no surprise.

While not within the present concept of Hauptwerk, consider if you will a (customized) modular "Smart Home" version of HW, with multiple microcomputers each doing their own task (wind model, reverb, stop control, etc.), all under the control of a smaller, faster, lower-overhead computer, in place of a single larger, more expensive machine. Allowing these individual microcontrollers (Arduinos, maybe?) to do their own complete processing (one Arduino is approximately $15 USD) should lessen the overall cost, yes?

Consider that just a few years ago there were no Smart Homes, no self-driving cars, etc. I've always been convinced we can be limited by our own lack of imagination, and at other times by a lack of $$$. :roll:

While benchmarks can be useful in making purchasing decisions, real-world results in applications such as Hauptwerk matter more. Gaining a few milliseconds of improvement might be important to gamers, but it isn't what drives a HW sample set. Then, too, there is the law of diminishing returns.

Rgds,
Ed
Offline
User avatar

mdyde

Moderator

  • Posts: 15444
  • Joined: Fri Mar 14, 2003 1:19 pm
  • Location: UK

Re: Impact of PCIe4 SSDs vs PCIe3 on HW6 load/cache build ti

PostWed Aug 18, 2021 10:25 am

Thanks, Ed.
Best regards, Martin.
Hauptwerk software designer/developer, Milan Digital Audio.
Offline
User avatar

engrssc

Member

  • Posts: 7283
  • Joined: Mon Aug 22, 2005 10:12 pm
  • Location: Roscoe, IL, USA

Re: Impact of PCIe4 SSDs vs PCIe3 on HW6 load/cache build ti

PostWed Aug 18, 2021 10:43 am

BTW, the above theory was part of the reason I bought 3 Hauptwerk licenses: to evaluate whether different portions of HW could each function on separate computers.

Les Deutsch's latest project gives an interesting insight into that involvement. 8)

http://www.nightbloomingjazzmen.com/Custom_4M_Rodgers.html

Takeaway summary: customization isn't for everyone. :roll: And, depending on the specific circumstance, it may require replacing your front door. :wink:

Consider tho:
ldeutsch wrote:The organ takes about ten seconds between turning on the power switch and playing. The Hauptwerk computer is always on, needing only to wake from sleep mode. From past experience, this takes only about three seconds. The same is true for the powered monitors I use with Hauptwerk. The Rodgers console itself takes about five seconds to boot. The slowest component to turn on turns out to be the Behringer X32 mixer, accounting for the approximate ten-second warm-up.

Rgds,
Ed
Offline
User avatar

vpo-organist

Member

  • Posts: 306
  • Joined: Wed Apr 29, 2020 6:49 am

Re: Impact of PCIe4 SSDs vs PCIe3 on HW6 load/cache build ti

PostWed Aug 18, 2021 4:10 pm

Many thanks for your comprehensive responses!
I've been building my own PCs for years. However, since six or seven years have passed, I first need to update my knowledge. You have helped me with that.

About the "supercomputer" from Allen/Rodgers:
I don't know those organs personally. These manufacturers probably work with techniques that cannot be implemented on a PC. Possibly these organs use flash memory, so that the sound libraries are immediately available after switching the organ on.
Or maybe the samples are not loaded at all, but played back directly from fast SSD storage. There are many possibilities.

For Hauptwerk, the next stage should be extensive support for convolution reverb (I own HW VI).
I would like to ask Martin Dyde to push this topic. Gernot Wurst from Prospectum in particular would be an excellent contact, because he creates sample sets based on IRs and has professional knowledge of them. At the moment it is probably not possible to deliver an IR configuration, is that right?
The use of IRs would solve several problems for the user: much less memory consumption, much shorter loading times, much lower polyphony requirements, etc.
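As a sketch of why IRs can save memory and loading time: instead of storing long wet samples for every pipe, convolution reverb applies a measured impulse response to a short dry sample at playback, typically via FFT. A minimal illustration in Python (synthetic toy data, not Hauptwerk's actual engine):

```python
import numpy as np

def fft_convolve(dry, ir):
    """Linear convolution of a dry signal with an impulse response via FFT.

    FFT-based convolution costs O(n log n) rather than O(n*m) for the
    direct method, which is why fast (SIMD-accelerated) FFTs matter
    for real-time convolution reverb.
    """
    n = len(dry) + len(ir) - 1
    nfft = 1 << (n - 1).bit_length()        # round up to a power of two
    spectrum = np.fft.rfft(dry, nfft) * np.fft.rfft(ir, nfft)
    return np.fft.irfft(spectrum, nfft)[:n]

# Toy data: a short "dry" burst and a decaying "room tail" IR.
rng = np.random.default_rng(0)
dry = rng.standard_normal(1024)
ir = np.exp(-np.linspace(0.0, 8.0, 256))

wet = fft_convolve(dry, ir)
assert np.allclose(wet, np.convolve(dry, ir))   # matches direct convolution
```

Only the short dry sample and the IR need to be stored and loaded; the reverberant tail is computed on the fly, which is where the memory and load-time savings come from.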

We will also see, with the upcoming Ansbach/Wiegleb sample set, how fantastic a sample set with convolution reverb can sound.

The better and faster Hauptwerk supports IRs, the better the sound experiences we may enjoy.

As for long load times:
These can be related to differently sized samples, many channels, and complex configurations. The 3-release technique also contributes a lot to long loading times compared to IR.

An example is the PAB from Inspired Acoustics, which includes complex configurations. With my six-core i7 3990K CPU, I have a load time of three and a half minutes for a relatively dry stereo set! Billerbeck's load time is uncomfortably high due to its many channels.

AVX-512 sounds very good on paper, because it processes data in 512-bit blocks to further speed up FFT calculations. However, AVX-512 reduces the clock speed, and Linus Torvalds rants about AVX-512 and wants to see the technology die, so I will buy an AVX2 CPU instead.

I'll process your experiences first and hope to put together a future-proof PC.

Best regards