

Harnessing the power of many cores.

Buying or building computers for Hauptwerk, recommendations, troubleshooting computer hardware issues.

Antoni Scott

Member

  • Posts: 982
  • Joined: Fri Sep 24, 2004 5:18 pm

Harnessing the power of many cores.

Posted: Sun Mar 05, 2017 10:00 am

This post is more wishful thinking than reality, but with all the computer experts in the Hauptwerk community, maybe someone has also thought of the following.
I make no apology for my lack of computer skills or knowledge, but fortunately the Hauptwerk community, including Mr. Dyde and Milan, have helped me through those times when my computer was not working properly (which in the last nine years has been fewer times than I can count on one hand!). It was recommended in the very beginning that I use a Mac Pro (2008), and I had a dual quad-core (8 cores) machine assembled for me.
A few years ago (2014) I was disappointed when Apple would not even look at my Mac Pro because it was "vintage". So after a month without my Hauptwerk, and suffering from severe Hauptwerk withdrawal symptoms, a forum member gave me a link to a Mac website for new spare parts. The tech told me that parts were plentiful and kindly walked me through a diagnostic process to zero in on the problem, which turned out to be nothing more than one bad RAM module. The replacement cost of that module was nothing compared to a new 2014 Mac Pro!
I decided right away to get a back-up Mac Pro dual quad-core exactly like mine, which was inexpensive (who wants an old Mac Pro with a lousy 8 cores!). The cost of the computer, without any RAM or hard drives, was less than a power supply from Apple! Now I could rest comfortably with the knowledge that I had a computer to swap out while fixing the other. The company I got the back-up computer from (IBuildMacs) was extremely helpful, and most of the internal parts were brand new (not the processors). I was relieved and impressed.

Then I decided to get another Mac Pro, this time with 12 cores (3.46 GHz) and 96 GB RAM. Overkill, maybe, but sample sets are getting bigger and bigger, and this time I really wanted to future-proof the computer. Since the old Mac Pros are built like M1 Abrams tanks, and almost as heavy, it wasn't worth selling one since the shipping would cost half the value of the computer. At this point I have three Mac Pros, two with 8 cores each and one with 12 cores, with a total of 150 GB of RAM. Has someone out there ever considered harnessing the power of 28 cores? Is it even possible?
Antoni

mdyde

Moderator

  • Posts: 15441
  • Joined: Fri Mar 14, 2003 1:19 pm
  • Location: UK

Re: Harnessing the power of many cores.

Posted: Sun Mar 05, 2017 10:48 am

Hello Antoni,

Hauptwerk should theoretically run very well indeed on 28 cores, but they all need to be inside a single computer -- Hauptwerk doesn't have native functionality to run distributed across multiple computers. (Doing so properly would be complex, involving synchronising MIDI control, wind pressures and audio across the computers, handling/routing the audio between them while aiming to add minimal but constant latency, etc., and even if we were one day to develop such functionality, in many cases it would still likely be cheaper just to buy a single, more powerful computer, especially if multiple audio and MIDI interfaces were required.)

12-core CPUs (for current PCs and Mac Pros) are readily available these days, and 'workstation'-type PCs are also available that can accept several 12-core (or even 16-core) CPUs (for 24+ cores in total).
Best regards, Martin.
Hauptwerk software designer/developer, Milan Digital Audio.

nrorganist

Member

  • Posts: 64
  • Joined: Mon Sep 26, 2016 6:22 pm
  • Location: Northern Colorado, USA

Re: Harnessing the power of many cores.

Posted: Mon Mar 06, 2017 12:47 am

Funny that 28 cores are being discussed, albeit across multiple machines. I am about to test Hauptwerk on a workstation with 28 cores for real.

In the past year, I have run Hauptwerk on an i7 quad-core with 32 GB memory and a 7200 rpm hard drive, accepting that sample sets would be limited to this memory size and that load times would be slower than with SATA SSDs and much slower than with NVMe SSDs.

Recently, I was able to obtain an HP z840 PC workstation with dual 14-core Xeon E5-2683 v3 processors with 35 MB cache each, minimal memory and a 7200 rpm hard drive. HP z-series workstations were attractive mostly due to their generous maximum memory sizes (z440: 128 GB; z640: 256 GB with 2 CPUs; z840: 1 TB with 2 CPUs), and partly because the z640 and z840 support dual processors with up to 22 cores each, with 2.5 MB cache per core.

I replaced the z840's memory with 4 x 32 GB DDR4-2400 ECC memory modules (leaving 12 open slots) and added a Samsung 960 EVO NVMe SSD (for the Hauptwerk cache) on an ASUS Hyper M.2 x4 (to PCIe) adapter (to connect to a PCIe x4 slot on the motherboard). Yes, maximizing the years of available capacity in as many dimensions as possible for Hauptwerk, and any other extreme computing needs in my future.

Initial Passmark benchmark runs on the components look promising: CPU and disk results (with the 960 EVO) are in Passmark's 99th percentile. A bit less exciting: the St. Anne's Moseley sample set (with its 1.1 GB cache file on the 960 EVO) loads in less than 4.7 seconds (compared to over 10.5 seconds with all files on a 7200 rpm hard drive). During the 960 EVO load, 4 threads were active.
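(As a rough back-of-envelope check of my own -- not something from a Hauptwerk log -- the effective cache read rate implied by those load times is well below what the 960 EVO can sustain, which already hints that the drive isn't the limit:)

Code:
# Back-of-envelope only: effective cache read rate implied by the load times
# above (1.1 GB cache file; ~4.7 s from the 960 EVO, ~10.5 s from the HDD).
cache_gb = 1.1
for label, seconds in [("960 EVO (NVMe)", 4.7), ("7200 rpm HDD", 10.5)]:
    rate_mb_s = cache_gb * 1024 / seconds
    print(f"{label}: ~{rate_mb_s:.0f} MB/s effective")
# Prints roughly 240 MB/s and 107 MB/s respectively -- both far below the raw
# sequential read speed of an NVMe SSD, suggesting something other than the
# drive (e.g. per-core CPU work in the loader) dominates small-set load times.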

Loading larger sample sets will hopefully show greater speedups when cache files are much larger on the 960 EVO, compared with the other sample-set files on the hard drive.

Also, it will be interesting to see how many threads are active (and to what percentages) during periods of greatest polyphony.

If there is interest in large sample set statistics on such a system, as I collect them, I would be happy to share.

Mark

chr.schmitz

Member

  • Posts: 374
  • Joined: Sun Aug 25, 2013 11:49 pm

Re: Harnessing the power of many cores.

Posted: Mon Mar 06, 2017 3:25 am

Very powerful machine! Wow! What about noise?

If you need even faster storage, you could go for an SSD RAID. Some time ago I tested a system containing 4 SSDs, which achieved read speeds of up to 5,000 MB/s.

Chris

mdyde

Moderator

  • Posts: 15441
  • Joined: Fri Mar 14, 2003 1:19 pm
  • Location: UK

Re: Harnessing the power of many cores.

Posted: Mon Mar 06, 2017 6:29 am

nrorganist wrote: Funny that 28 cores are being discussed, albeit across multiple machines. I am about to test Hauptwerk on a workstation with 28 cores for real. [...]


Hello Mark,

Hauptwerk's organ-loading mechanism is currently only able to take advantage of about 5-6 cores. With 6 or more cores available and a fast SSD, per-core CPU speed is likely to be the performance bottleneck for loading. Looking at Intel's specifications for the Xeon E5-2683 v3, I see that the base clock speed is 2.0 GHz, with a max. turbo clock speed of 3.0 GHz:

https://ark.intel.com/products/81055/Intel-Xeon-Processor-E5-2683-v3-35M-Cache-2_00-GHz

... which is certainly not bad, but some of the 4-core and 6-core CPUs (e.g. some of the top end of the i7 range) have base clock speeds above 4 GHz, so they're likely to give considerably higher loading speeds (everything else being equal).

I would expect the huge core count of the Xeon really to come into its own in terms of polyphony, since Hauptwerk will distribute its audio engine across any number of cores.
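(Purely as an illustrative toy model -- this is not Hauptwerk's real code, and the 4.2 GHz quad-core figures below are simply assumed for comparison -- the contrast between loading and polyphony scaling looks something like this:)

Code:
# Toy model only (NOT Hauptwerk's actual loader/audio engine): loading uses a
# small, fixed number of pipeline stages (assumed 5 here, per the "about 5-6
# cores" note above), so it scales with per-core clock up to that limit; the
# audio engine spreads voices over all cores, so a crude polyphony figure
# scales with cores * clock.
LOADER_STAGES = 5

def relative_loading_speed(clock_ghz, cores):
    return clock_ghz * min(cores, LOADER_STAGES)

def relative_polyphony(clock_ghz, cores):
    return clock_ghz * cores

for name, clock_ghz, cores in [("fast quad-core i7 (4.2 GHz)", 4.2, 4),
                               ("2 x Xeon E5-2683 v3 (2.0 GHz)", 2.0, 28)]:
    print(f"{name}: loading ~{relative_loading_speed(clock_ghz, cores):.1f}, "
          f"polyphony ~{relative_polyphony(clock_ghz, cores):.1f} (arbitrary units)")
# The 28-core Xeon box comes out far ahead on the polyphony figure, while the
# fast quad-core is competitive or better on the loading figure.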
Best regards, Martin.
Hauptwerk software designer/developer, Milan Digital Audio.

nrorganist

Member

  • Posts: 64
  • Joined: Mon Sep 26, 2016 6:22 pm
  • Location: Northern Colorado, USA

Re: Harnessing the power of many cores.

Posted: Mon Mar 06, 2017 1:16 pm

Chris,

About noise: during system selection it was definitely a concern. I started looking at servers before I realized that they only support Windows Server operating system releases, not the Windows 7/8/10 etc. client releases on which Hauptwerk is supported, and I definitely wanted to avoid driver issues. Server noise levels generally turned out to be excessively high anyway (e.g. DL580 Gen9 noise is > 51 dB); noise is probably a lower design priority because servers are typically located in machine rooms away from the people who use them. Both issues steered me away from servers.

Looking at this workstation, for example, the z840 QuickSpecs claim noise levels of:
    23 dB @ Idle
    26 dB with Hard Drive Operating (random reads)
    29 dB with DVD-ROM Operating (sequential reads)

On first power-on, I found the z840 quieter than my laptop; I had to look to confirm it was still on. The reason is documented in the z-series Maintenance and Service Guide: 8 fans:
    2 fans above and next to the processor heatsinks
    2 fans along the processor/memory "duct" channels oriented towards the back
    2 fans exiting at the top of the back panel
    2 fans next to the SAS/SATA hard drive tray stack
During a physical inspection I actually counted 10 fans (I have momentarily forgotten where the extra two were).

It appears to be designed to move hot air out with many fans and slow-moving blades, to minimize noise.

About RAIDing for faster storage: agreed that it is definitely a great strategy! I remember running across Ernst's RAIDing of three 7200 rpm hard drives for the Hauptwerk cache (in lieu of SSD availability during his build) and thinking it is a great way to achieve more speed and extend the life of existing hardware/technology.

Since higher-capacity NVMe SSDs are currently cheaper than multiple lower-capacity ones, for now I decided to try a larger one first and see whether there might be any bottlenecks in the PCIe interface, memory channels, L1-L3 caches, or the slower 2 GHz processor cores of the E5-2683 v3s (as Martin pointed out).

Could you describe the components of the system with the 4 RAIDed SSDs reading at 5 GB/s, and any bottlenecks you might have noticed?

Thanks,
Mark

mdyde

Moderator

  • Posts: 15441
  • Joined: Fri Mar 14, 2003 1:19 pm
  • Location: UK

Re: Harnessing the power of many cores.

Posted: Mon Mar 06, 2017 1:58 pm

One gentleman reported last year that he was getting an overall average data read rate in Hauptwerk of about 1100 MB/s (which I'd regard as exceptionally high) with two SSDs in RAID 0, and an i7-5820K CPU, if loading all ranks with memory compression disabled: viewtopic.php?f=4&t=14624#p113137 .
Best regards, Martin.
Hauptwerk software designer/developer, Milan Digital Audio.

nrorganist

Member

  • Posts: 64
  • Joined: Mon Sep 26, 2016 6:22 pm
  • Location: Northern Colorado, USA

Re: Harnessing the power of many cores.

Posted: Mon Mar 06, 2017 2:04 pm

Martin,

Thanks for providing feedback on Hauptwerk organ loading. Good to know that per-core speed very likely will be the next bottleneck.

A quick crude test I just did was to look at Task Manager:
    1) idle after Hauptwerk was started, prior to loading
    2) traces of Hauptwerk loading St. Anne's in less than 5 seconds
Idle shows:
    threads 1-54 at 0%
    threads 55-56 having minimal activity (0-5% with a few up to 12% peaks)
Loading (during the < 5 seconds of actual load) shows:
    threads 1, 3 with 10-20% peaks
    thread 5 with > 25% peak
    thread 13 with ~25% peak
    thread 21 with ~60% peak
    thread 29 with < 25% peaks
    thread 31 with < 15% peak
    thread 33 with ~30% peak
    thread 35 with ~40% peak
    thread 37 with ~70% peak
    thread 39 with ~50% peak
    thread 41 with ~80% peak
    thread 45 with ~25% peak
    thread 47 with ~10% peak
    thread 50 with ~20% peaks
    thread 52 with ~25% peaks
    thread 55 with ~22% peaks
    thread 56 with ~1% peak
I am happy to see only 3 threads with > 50% peaks, but I am surprised that so many (16) threads were active (other than threads 55-56, which I assume are handling base OS activity).

I double checked what Idle looked like after the load. Surely there were other non-Hauptwerk-load processes active? But the most thread activity I saw over several minutes of post-load Idle was 4 active threads:
    threads 51, 53, 55-56 having minimal activity (0-5%)
I am curious which more accurate CPU-logging tool might be useful for identifying the Hauptwerk load threads.
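(One simple option I might try myself -- just a sketch using the third-party psutil package, nothing Hauptwerk-specific, and the 15-second sampling window is an arbitrary choice -- is to sample per-core utilisation at short intervals while the organ loads:)

Code:
# Sketch only: sample per-core CPU utilisation every 0.2 s for ~15 s (start the
# organ load once this is running), then report the peak seen on each logical core.
# Requires the third-party psutil package (pip install psutil).
import time
import psutil

samples = []
t_end = time.time() + 15
while time.time() < t_end:
    samples.append(psutil.cpu_percent(interval=0.2, percpu=True))

peaks = [max(core) for core in zip(*samples)]
for i, peak in enumerate(peaks, start=1):
    print(f"logical core {i:2d}: peak {peak:5.1f} %")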

If interested, I could PM the screenshots.

Thanks,
Mark

chr.schmitz

Member

  • Posts: 374
  • Joined: Sun Aug 25, 2013 11:49 pm

Re: Harnessing the power of many cores.

Posted: Mon Mar 06, 2017 3:03 pm

Hello Mark,

The SSD RAID was installed in a 5,1 Mac Pro. I returned that computer for other issues. Regarding the SSD RAID, you can find more information here: https://ibuildmacs.com/products/macpro- ... 1-12-core/ (Hard Drive / SSD - Bay 1 > more info).

Unfortunately I do not have more details.

Chris

nrorganist

Member

  • Posts: 64
  • Joined: Mon Sep 26, 2016 6:22 pm
  • Location: Northern Colorado, USA

Re: Harnessing the power of many cores.

Posted: Mon Mar 06, 2017 3:27 pm

Martin,

Thanks for referencing the viewtopic.php?f=4&t=14624#p113137 thread. Looking at the Hauptwerk log of my St. Anne's load shows exactly what you previously described:

    Sample loader: pct. data loading time each stage busy: disk I/O: 21.47.
    Sample loader: pct. sample loading time stage busy: 1st proc.: 33.31.
    Sample loader: pct. sample loading time stage busy: 2nd proc.: 60.23.
    Sample loader: pct. sample loading time stage busy: 3rd proc.: 49.98.
    Sample loader: pct. sample loading time stage busy: 4th proc.: 96.14.
    Sample loader: pct. sample loading time stage busy: 5th proc.: 39.34.
As you said: The CPU *is* the bottleneck.
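(As a quick way of spotting the limiting stage automatically, this rough sketch of mine just parses those "stage busy" log lines and reports the busiest one:)

Code:
# Rough sketch: find the busiest loader stage from the Hauptwerk log lines quoted above.
import re

log = """Sample loader: pct. data loading time each stage busy: disk I/O: 21.47.
Sample loader: pct. sample loading time stage busy: 1st proc.: 33.31.
Sample loader: pct. sample loading time stage busy: 2nd proc.: 60.23.
Sample loader: pct. sample loading time stage busy: 3rd proc.: 49.98.
Sample loader: pct. sample loading time stage busy: 4th proc.: 96.14.
Sample loader: pct. sample loading time stage busy: 5th proc.: 39.34."""

stages = {}
for line in log.splitlines():
    m = re.search(r"busy: (.+): ([\d.]+)\.$", line)
    if m:
        stages[m.group(1)] = float(m.group(2))

bottleneck = max(stages, key=stages.get)
print(f"Busiest stage: {bottleneck} at {stages[bottleneck]}% -- the likely bottleneck")
# Here that reports the 4th processing stage at 96.14%.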

Moral of the story: don't trust Task Manager to be anywhere close to accurate for this. It's probably more of an average over longer time intervals (e.g. each second?).

Thanks again,
Mark

murph

Member

  • Posts: 727
  • Joined: Fri Mar 02, 2012 5:45 pm

Re: Harnessing the power of many cores.

Posted: Mon Mar 06, 2017 4:24 pm

Seeing as this has gone slightly off topic, I'll add my 2c.

I have a Z800, currently with 2 x X5675s (hex-core, 3.06 GHz/core) and 96 GB RAM. The cache is on 3 x 1G 7200 rpm SATA drives in RAID 0.
Previously there were E5520s (quad-core, 2.26 GHz) with 48 GB RAM. Polyphony was about 4400 when I really tried to push things (my main test is to turn everything on, all couplers (+subs/supers), and quickly do up/down glissandos till things break up; if there's no break-up, turn up the polyphony and repeat).
I quickly replaced the processors with E5645s (hex-core, 2.4 GHz). Polyphony hit 5560.
With Goerlitz on the way, the RAM went up to 96 GB (this actually ran cooler than the original! Fewer chips/ranks per module helps fan speed, more of which later).
Last summer I got the X5675 pair. Polyphony only went up to about 5800. (This does depend on the organ: lots of enclosures/on-the-fly voicing will reduce it to this number. Unenclosed, unvoiced, no compression increases it to about 6500. However, most of the sets I have with no boxes use tiny amounts of the available power/polyphony, so that doesn't really count.)
Because of these figures, I am assuming the bottleneck is the RAM speed (1333 MHz with 8 GB modules).
The 3-disk RAID array loads at about 840 MB/s. I had almost got 4 x 2 TB 7200 rpm SAS drives last week to try (I think the seller realized they were worth more than €50 each new...). Pity it didn't work out. A 4-disk RAID 0 on the SAS controller SHOULD load at about 1200 MB/s, which is close enough to the RAM limit.
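(Rough arithmetic only, assuming near-linear RAID 0 read scaling, as a sanity check on that ~1200 MB/s estimate:)

Code:
# Back-of-envelope: assume RAID 0 sequential read speed scales roughly linearly
# with the number of drives.
measured_rate_mb_s = 840            # current 3-disk array
per_disk = measured_rate_mb_s / 3   # ~280 MB/s per drive
print(f"Estimated 4-disk RAID 0: ~{4 * per_disk:.0f} MB/s")
# ~1120 MB/s -- in the same ballpark as the ~1200 MB/s figure above.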
The Z series have a useful feature in the BIOS where the base fan speed can be set. BUT, as mentioned earlier, specific RAM can cause heat generation to increase (a lot!). A good rule of thumb: fewer chips/ranks per module = less heat. Processors are what they are: faster = hotter.
A good rule with lots of cores: NO hyperthreading. Once you go over about 4 physical cores, with HW the OS spends more cycles allocating tasks to the processors and then to the cores than the cores spend executing (hence single processors with faster cores will give higher scores than dual systems at lower clock speeds, even though the overall power is greater). Turbo is a no-no (it mucks with audio timing and maxes processor temperature, which maxes the fans...).
Good luck with the z840 and keep us informed. I'll probably get one of them when 96 GB of RAM is too little.

mdyde

Moderator

  • Posts: 15441
  • Joined: Fri Mar 14, 2003 1:19 pm
  • Location: UK

Re: Harnessing the power of many cores.

Posted: Tue Mar 07, 2017 5:14 am

[Topic moved here.]

nrorganist wrote:As you said: The CPU *is* the bottleneck.


Thanks, Mark.

Yes -- currently, with the fastest recent PCIe SSDs, and with more than 4 CPU cores, CPU will definitely be the bottleneck for loading speeds.
Best regards, Martin.
Hauptwerk software designer/developer, Milan Digital Audio.

MrNhanduc

Member

  • Posts: 51
  • Joined: Thu Feb 12, 2015 6:08 am
  • Location: The Netherlands

Re: Harnessing the power of many cores.

Posted: Wed Mar 08, 2017 1:48 am

What Martin describes is on par with my own observations of building computers specifically for Hauptwerk users.

1) More cores give you more polyphony (I'm not sure how well it scales, though, as the number of cores increases).
2) Higher clock speeds give you better performance when loading sample sets.
3) The more cores on a processor, the lower the clock speeds. This is especially true for Xeons, which is why I don't like them. I have never built a new computer using a Xeon, because it would be hard to get decent loading times.
4) PCIe NVMe SSDs like the Samsung 960 EVO/PRO are limited by the processor. This means that even faster NVMe SSDs, or putting 2 NVMe SSDs in RAID 0, will likely not perform better in loading times (and hence would be a waste of money). They are faster than SATA SSDs, which is not true for all NVMe SSDs (Intel 600p :evil: ).
5) If you want better sample loading times, have as much RAM as possible. If you can load uncompressed, this helps performance greatly (at least in some sets). I recently built a computer with a 6-core processor, an NVMe SSD and 128 GB of RAM (it looks awesome, by the way :D ). It boots and loads the Haarlem organ uncompressed (20-bit) in 1 minute and 40 seconds; after that time you have loaded close to 100 GB of data into RAM.

A six-core processor at high clock speeds is, for me, the sweet spot. You can take full advantage of the five-core sample-loading mechanism at high clock speeds, and at the same time you are not likely to run into polyphony limits. More expensive processors are, in my view, a waste of money, as are many-core, low-clocked Xeons.

By the way, the launch of AMD's Ryzen processors could change things a little, since they offer affordable 8-core processors at still-decent clock speeds. In the future, when Hauptwerk (v5?) can take advantage of more than 5 cores when loading sets (I sincerely hope so, for all end users!), these 8-core processors might become a new sweet spot regarding price and performance.

I hope this post is helpful for anyone who is considering a new computer.

(Sorry for my English, I'm Dutch :wink: )

Timo

Member

  • Posts: 17
  • Joined: Thu Sep 14, 2006 4:17 pm
  • Location: Finland, Helsinki

Re: Harnessing the power of many cores.

Posted: Wed Mar 08, 2017 4:52 pm

I've got these numbers:

Code:
Sample loader: loaded from data cache: Y.
Sample loader: data cache total disk size: 21373.98 MB.
Sample loader: buffers: 9.
Sample loader: approx. loader peak mem. usage during audio loading: 134.14 MB.
Sample loader: loader def. mem. usage during audio loading: 131.47 MB.
Sample loader: approx. loader mem. usage during trem. loading: 22.57 MB.
Sample loader: approx. avg. overall data read rate: 646.36 MB/s.
Sample loader: approx. avg. data read rate during disk reader activity: 1183.37 MB/s.
Sample loader: pct. data loading time each stage busy: disk I/O: 54.62.
Sample loader: pct. sample loading time stage busy: 1st proc.: 59.33.
Sample loader: pct. sample loading time stage busy: 2nd proc.: 51.85.
Sample loader: pct. sample loading time stage busy: 3rd proc.: 46.93.
Sample loader: pct. sample loading time stage busy: 4th proc.: 90.44.
Sample loader: pct. sample loading time stage busy: 5th proc.: 63.54.
Sample loader: pct. thread activity due to stage: disk I/O: 14.89.
Sample loader: pct. thread activity due to stage: 1st proc.: 16.18.
Sample loader: pct. thread activity due to stage: 2nd proc.: 14.14.
Sample loader: pct. thread activity due to stage: 3rd proc.: 12.80.
Sample loader: pct. thread activity due to stage: 4th proc.: 24.66.
Sample loader: pct. thread activity due to stage: 5th proc.: 17.33.

All this is with a single SSD (a Samsung 950 PRO connected to the M.2 slot on the motherboard), no RAID. The Samsung 960 is supposed to be even faster.
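(Incidentally, those figures are self-consistent: the overall read rate should simply be the rate during disk reader activity scaled by the fraction of time the disk I/O stage was busy. A quick check:)

Code:
# Consistency check on the logged figures above.
rate_during_activity = 1183.37      # MB/s
disk_busy_fraction = 54.62 / 100    # disk I/O stage busy 54.62% of the time
print(rate_during_activity * disk_busy_fraction)
# ~646 MB/s, matching the logged overall rate of 646.36 MB/s.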

Timo.
