

Hauptwerk across multiple NUMA nodes on Windows Server 2022


Don_prince

Member

  • Posts: 8
  • Joined: Sat Oct 28, 2017 5:57 pm

Hauptwerk across multiple NUMA nodes on Windows Server 2022

Mon May 15, 2023 4:11 pm

Hi,

I recently bought a Lenovo x3950 X6, for which I had a use case for a client.
It has eight Xeon CPUs (E7-8880 v4, 22 cores each).

I run Windows Server 2022 on it because Windows 11 (Enterprise or Workstation) does not support more than four CPUs.

I also tried Hauptwerk on it and the results are awesome. However, I noticed something odd.

Unlike on my Fujitsu Celsius with dual Xeons, only one CPU gets loaded here, so Hauptwerk does not seem to cross NUMA nodes. On the Fujitsu Celsius, I know that both CPUs get fully loaded.

Is this a Windows Server issue? Or is it related to the server having so many cores that Hauptwerk will not cross NUMA nodes?

I will test this in Windows 11 Enterprise too, and will report the findings back...

mdyde

Moderator

  • Posts: 15133
  • Joined: Fri Mar 14, 2003 1:19 pm
  • Location: UK

Re: Hauptwerk across multiple NUMA nodes on Windows Server 2022

Tue May 16, 2023 3:52 am

Hello Don,

Hauptwerk currently supports a maximum of 64 logical CPU cores. I see that each of your Xeon E7-8880 v4 CPUs has 22 physical cores and 44 logical cores ( https://ark.intel.com/content/www/us/en ... 0-ghz.html ). As far as I know, nobody has ever tried to run it on a system with more than 64 logical cores before (or maybe even with more than 32), but I'll log potentially adding future support for it as an enhancement request.

Whilst Hauptwerk doesn't officially support Windows Server 2022 (and I'm not aware of anyone who has tried it previously), I think it's likely to work, but only the first 64 logical CPU cores will be used by Hauptwerk (on any operating system).

Since an extra virtual (Hyper-Threaded) core gives much less benefit than an extra physical core, if you can disable Hyper-Threading in the BIOS then you would halve the total logical CPU count (to 8x22=176) but Hauptwerk would be using more of the physical cores, which I would expect to give a significant performance benefit.
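As a rough illustration of the arithmetic above (not Hauptwerk code), the sketch below counts physical and logical cores for the eight-socket E7-8880 v4 system and applies the 64-logical-core limit. It assumes two hardware threads per core with Hyper-Threading and, when Hyper-Threading is on, that the 64 usable logical cores come in sibling pairs; it also ignores how Windows splits processors into groups. The function name `core_counts` is hypothetical.

```python
# Hauptwerk's limit as stated in this thread.
MAX_LOGICAL_CORES = 64


def core_counts(sockets: int, cores_per_socket: int, hyper_threading: bool) -> dict[str, int]:
    """Count cores for a multi-socket system and apply the 64-logical-core limit.

    Assumes 2 hardware threads per physical core when Hyper-Threading is on.
    """
    physical = sockets * cores_per_socket
    logical = physical * 2 if hyper_threading else physical
    usable_logical = min(logical, MAX_LOGICAL_CORES)
    # Assumption: with Hyper-Threading, the usable logical cores are sibling
    # pairs, so they sit on half as many physical cores.
    usable_physical = usable_logical // 2 if hyper_threading else usable_logical
    return {
        "physical": physical,
        "logical": logical,
        "usable_logical": usable_logical,
        "usable_physical": usable_physical,
    }


# Eight Xeon E7-8880 v4 CPUs with 22 physical cores each.
for ht in (True, False):
    print(f"Hyper-Threading {'on' if ht else 'off'}:", core_counts(8, 22, ht))
# Hyper-Threading on: {'physical': 176, 'logical': 352, 'usable_logical': 64, 'usable_physical': 32}
# Hyper-Threading off: {'physical': 176, 'logical': 176, 'usable_logical': 64, 'usable_physical': 64}
```

Under these assumptions, 64 logical cores with Hyper-Threading on correspond to only 32 physical cores, whereas with it off all 64 are physical, which is the reason for suggesting disabling it.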

That said, since synchronising each additional logical core adds some performance overheads, there's likely to be a point beyond which additional cores won't actually benefit performance overall.

See also: viewtopic.php?f=16&t=20719&p=155080&hilit=cores#p155080

mdyde wrote: I'm afraid I don't have any specific benchmarks to offer for the hardware you're looking for, or much more to add beyond what was already covered in the previous discussions on it in this thread:

https://forum.hauptwerk.com/viewtopic.php?f=16&t=19945

mdyde wrote: A few minor points to add to Bob's very comprehensive reply (thanks very much, Bob):

vpo-organist wrote: How many cores are used by Hauptwerk?


- Hauptwerk's audio and convolution reverb engines should run distributed across any number of logical cores up to 64.

- If I recall correctly, the most I've ever heard of anyone using with Hauptwerk is 32 physical cores, although more commonly up to 16 or 24 (and even then, more than 12 is rare).

- More CPU cores do potentially benefit polyphony and the convolution reverb engine, but per-core CPU performance is still extremely important, since threads running on different cores inevitably sometimes need to communicate with each other (thread synchronisation, exchanging data, etc.), and the more cores there are, the more the overheads in keeping them synchronised. Hence with huge numbers of cores there may eventually be a point beyond which more cores even reduce overall performance. Although I can't give advice based on benchmarks, my inclination would be to be wary of going much beyond about 8 physical cores if doing so also involved a significant trade-off in per-core performance (base clock speed, etc.).

- Base clock speed, CPU cache, and memory bandwidth are also extremely important, as is making sure that any candidate CPU has support for the AVX2 instruction set.

- If it doesn't involve much of a trade-off in per-core performance (base clock speed, etc.), then I would expect that up to something like 16 physical CPU cores (32 logical) would scale well with minimal additional synchronisation overheads. (Numbers of cores beyond that may well perform well too, but we don't have feedback on that.)

- Hauptwerk's sample set cache loading mechanism (which determines how fast organs are loaded into memory) can currently take advantage of up to 6 CPU cores (but more won't hurt).

- The MIDI/relay event processing (organ switches, pipe on/off events, etc.) is necessarily serialised to ensure that the state of the organ relay remains consistent.

- Each of the background models (wind supply models, pipework modulation, tremulants) runs in its own thread (one thread per model). That's done because the wind model's time-slices need to be extremely short/frequent (sub-millisecond) and serialised. (It might possibly be able to gain some performance in the future from further multi-threading the wind model within the time-slices, but possibly not, since the overheads of so many, extremely frequent, thread context switches might exceed any potential performance gain.)

- The first few logical cores are kept free of audio/convolution engine loads, and are instead used for the MIDI/relay, background models, and other threads. If there are 12 or more logical CPU cores then the first 4 logical cores will be reserved for those purposes.

- Hence per-core performance may well become a bottleneck for the achievable polyphony, even if you had a huge number of cores for the audio engine.
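The core-reservation rule in the list above can be sketched as follows. This is a hypothetical model, not Hauptwerk's implementation: it encodes only what is stated (at most 64 logical cores are used; with 12 or more logical cores, the first 4 are reserved for the MIDI/relay, background models, and other threads) and deliberately rejects smaller systems, because only "the first few" cores are said to be reserved there. The name `partition_cores` is made up for illustration.

```python
# Values stated in the thread; everything else here is a hypothetical model.
MAX_LOGICAL_CORES = 64
RESERVED_CORES = 4  # reserved when there are 12 or more logical cores


def partition_cores(logical_cores: int) -> tuple[list[int], list[int]]:
    """Split logical core indices into (reserved, audio/convolution) lists."""
    if logical_cores < 12:
        # Only "the first few" cores are said to be reserved in this case.
        raise ValueError("reservation below 12 logical cores is not specified")
    usable = min(logical_cores, MAX_LOGICAL_CORES)  # cores beyond 64 are unused
    reserved = list(range(RESERVED_CORES))  # MIDI/relay, background models, etc.
    audio = list(range(RESERVED_CORES, usable))
    return reserved, audio


reserved, audio = partition_cores(176)
print(reserved, len(audio), audio[0], audio[-1])  # [0, 1, 2, 3] 60 4 63
```

On the 176-core system discussed in this thread, this model leaves 60 logical cores (indices 4 to 63) for the audio and convolution engines.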

So in summary, to emphasize again: per-core performance (base clock speed, etc.) is still very, very important, even with a lot of cores.

vpo-organist wrote: Is AVX-512 also supported by Hauptwerk?


AVX-512 might possibly benefit convolution reverb performance (which relies on a third-party library for the DFFT function), but Hauptwerk (v6.0.2) won't currently benefit from it above AVX2 in other respects. (AVX2 is definitely worthwhile.)

(We did originally make a dedicated AVX-512 build for testing, but there seemed to be strange compatibility problems with at least one AVX-512 CPU, and it gave negligible performance/polyphony gain anyway, so we abandoned it for now, to be on the safe side. It might be resurrected in the longer-term if it proves beneficial.)

vpo-organist wrote: I currently only know the Intel Core i9-10980XE with AVX-512, Turbo 4.8 GHz, 18 cores/36 threads, which costs around 1000 EUR. CPU benchmark 33989.

Or should it be a Ryzen 9 5900X, 4.8 GHz, 12 cores/24 threads, AVX2, Bench 39518 for 544 EUR?


Bob's reply covers that in great depth, and the only thing I would add is that Hauptwerk's convolution reverb engine might possibly perform better on Intel CPUs than AMD, all else being equal (same per-core performance, etc.), but in other respects Hauptwerk should be optimised equally for instruction sets (AVX2, etc.) on Intel vs. AMD CPUs.

...

As above:

- Hauptwerk's audio and convolution engines will certainly take advantage of more than 8 physical cores, but since per-core performance is also extremely important I would be cautious of going much beyond 8 cores if doing so also involves a significant compromise in per-core performance (otherwise, the more cores the better, within reason).
Best regards, Martin.
Hauptwerk software designer/developer, Milan Digital Audio.

Don_prince


Re: Hauptwerk across multiple NUMA nodes on Windows Server 2022

Tue May 16, 2023 6:33 pm

Hi Martin,

Thanks for the reply. I needed the server for something else and thought, well, for science, let's try. I am aware of those threads you quoted :)

And I admit that it runs better than my i9-11900K, even on just one of those 8880 v4s.

So I tested it on Windows 11 Enterprise, and I encountered worse performance, and still only one CPU got used.

Next I disabled Hyper-Threading, and then it split across two CPUs, up to a maximum of 44 cores. This struck me as interesting and leads me to believe the issue is with Windows affinity. I will investigate whether I can do something there.
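One possible explanation for the 44-core figure, offered as an assumption rather than a diagnosis: Windows divides systems with more than 64 logical processors into processor groups of at most 64, generally keeping each NUMA node inside a single group, and a thread restricted by an ordinary affinity mask runs within only one group. The simplified model below greedily packs whole NUMA nodes (one per socket here) into groups. It is a hypothetical sketch; real Windows group assignment may differ in detail, and how Hauptwerk sets affinity isn't described here.

```python
from dataclasses import dataclass, field

GROUP_SIZE = 64  # maximum logical processors in one Windows processor group


@dataclass
class ProcessorGroup:
    nodes: list[int] = field(default_factory=list)
    logical_processors: int = 0


def pack_numa_nodes(node_count: int, lps_per_node: int) -> list[ProcessorGroup]:
    """Greedily place whole NUMA nodes into processor groups (simplified model)."""
    if lps_per_node > GROUP_SIZE:
        raise ValueError("this model does not split a NUMA node across groups")
    groups = [ProcessorGroup()]
    for node in range(node_count):
        # Start a new group when the next whole node would not fit.
        if groups[-1].logical_processors + lps_per_node > GROUP_SIZE:
            groups.append(ProcessorGroup())
        groups[-1].nodes.append(node)
        groups[-1].logical_processors += lps_per_node
    return groups


# One NUMA node per E7-8880 v4 socket: 44 logical processors with HT, 22 without.
for label, lps in (("HT on", 44), ("HT off", 22)):
    groups = pack_numa_nodes(8, lps)
    first = groups[0]
    print(label, len(groups), "groups; group 0:", first.nodes, first.logical_processors)
# HT on 8 groups; group 0: [0] 44
# HT off 4 groups; group 0: [0, 1] 44
```

In this model, with Hyper-Threading on, each 44-processor node fills its own group, so one group spans exactly one CPU; with it off, two 22-core nodes share a 44-processor group, which matches the two-CPU, 44-core behaviour reported above.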

It is an interesting platform to experiment with, and if you want to do some tests on it I will keep it available for that.

Currently it has no issues running two or more instances of Hauptwerk using the alternate configs, so even without that it's already a ton of fun.

I will report after more testing, thanks!

mdyde


Re: Hauptwerk across multiple NUMA nodes on Windows Server 2022

Wed May 17, 2023 3:27 am

Thanks, Don. Glad to hear it performs well anyway.
Best regards, Martin.

Don_prince


Re: Hauptwerk across multiple NUMA nodes on Windows Server 2022

Wed May 17, 2023 7:10 pm

For anyone searching for the trick:
Download Process Lasso, and in the dropdown menu choose More, then Processor Group Extender. This unlocks loads across NUMA nodes on high core count systems.

I hit the theoretical limit of 64 cores; it does indeed not go past it. So far I have not found the polyphony cap; I will investigate that.

Dudelange is now set to a polyphony limit of 20000 and does not hit the cap. I will try the same with Esztergom.

I do not have access to a system with fewer cores and a higher clock speed, but I am available for benchmarking. So far everything seems to work great.

Looking forward to Hauptwerk supporting more than 64 cores.

Keep up the development!

Don_prince


Re: Hauptwerk across multiple NUMA nodes on Windows Server 2022

Fri Aug 18, 2023 6:42 am

Hi Martin,

For these old CPUs:

Does HW 8 bring that benefit of going past the 64 cores? (did that make it in?)

I read about increased poly performance; does that apply to these old CPUs as well?

I am quite happy with the system as it works now, as I can run multiple instances of Hauptwerk quite stably at once, and would hate to break that with HW8. But I am absolutely willing to upgrade to HW8 if it functions better.

mdyde


Re: Hauptwerk across multiple NUMA nodes on Windows Server 2022

Fri Aug 18, 2023 9:05 am

Hello Don,

Thanks for the interest in Hauptwerk v8.

Don_prince wrote: Does HW 8 bring that benefit of going past the 64 cores? (did that make it in?)


No -- the limit of 64 virtual cores is unchanged, I'm afraid. (All of the changes included are covered in the release notice: https://www.hauptwerk.com/documentation/ .)

Don_prince wrote: I read about increased poly performance; does that apply to these old CPUs as well?


Also no -- Hauptwerk v8 can give significantly higher polyphony than previous versions specifically on CPUs that have cores with different (or varying) performances relative to each other, such as the 'performance' and 'efficiency' cores available in 12th-generation and later Intel CPUs. However, performance should be unchanged on CPUs whose cores are all equal, such as the older Intel Xeons that you have.

The only change that could conceivably affect polyphony on your Xeons is a new general preference, "Bind audio engine threads to CPU cores on Windows?". If it is ticked, as it is by default, Hauptwerk will bind audio engine threads to CPU cores, as previous Hauptwerk versions did. If that preference were unticked, you still wouldn't get any additional audio engine threads, but Windows would be free to move the threads to any CPU cores it wanted. [If turning that preference off, having the "Try to run Hauptwerk at real-time priority on Windows" preference ticked (and launching it 'as Administrator') would usually be important; otherwise audio glitches would be likely.]
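For background on what binding threads to CPU cores involves on Windows: the Win32 `SetThreadAffinityMask` function takes a `DWORD_PTR` bitmask with one bit per logical processor in the thread's processor group, which is 64 bits wide on 64-bit Windows. The sketch below only builds such masks in Python to show why a single mask cannot name more than 64 cores; it does not call any Windows API. The helper names and the default starting core of 4 (taken from the core-reservation rule described earlier in this thread) are assumptions, not a description of Hauptwerk's code.

```python
MASK_BITS = 64  # width of an affinity mask (DWORD_PTR) on 64-bit Windows


def affinity_mask_for_core(core_index: int) -> int:
    """Return a one-bit mask selecting a single logical core within a group."""
    if not 0 <= core_index < MASK_BITS:
        raise ValueError(f"core {core_index} does not fit in a {MASK_BITS}-bit mask")
    return 1 << core_index


def binding_plan(thread_count: int, first_core: int = 4) -> list[int]:
    """Hypothetical plan pinning each audio thread to its own core, in order."""
    return [affinity_mask_for_core(first_core + i) for i in range(thread_count)]


print([hex(m) for m in binding_plan(3)])  # ['0x10', '0x20', '0x40']
```

Starting at core 4, at most 60 threads can be pinned this way (cores 4 to 63); a 61st thread would need core 64, which no single 64-bit mask can express.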

However, turning that preference off almost certainly wouldn't benefit polyphony on your Xeons anyway, provided that HyperThreading is disabled (and keeping HyperThreading disabled should give best performance anyway, given that you have more than 64 physical cores.)

Hence I wouldn't expect you to get any polyphony gain from Hauptwerk v8 on your Xeons. There are, though, plenty of other enhancements in v8 (including faster organ loading/unloading), and v8 certainly shouldn't perform any worse for you than v7 in terms of polyphony.
Best regards, Martin.
