There’s a great white paper by Dell on BIOS settings for optimal performance on 11G servers (definitely worth a read if this post is relevant to you), but it doesn’t tell the whole story. The paper’s thorough testing was done with High Performance Computing applications in mind, which is a good indicator of which settings to choose, but if you’re running a virtualisation workload there’s more to consider.
Detailed here are the BIOS settings which gave me the best VM performance with the XCP-NG hypervisor. I’ll break it down into the sections seen in the BIOS. Keep in mind this is for a server which is considerably over a decade old, and may not align with current best practices. If in doubt, do your own testing specific to your workload. It’s a pain in the ass to do, but worth it.
Memory Settings
There’s only one setting here that affects performance (only applicable for dual+ CPU systems).
Node Interleaving - Disabled
As far as I understand, leaving node interleaving disabled presents the OS with the underlying NUMA architecture, leaving the OS in charge of allocating memory to the best location. Setting it to enabled presents a UMA architecture instead, which can increase memory latency but is apparently useful when an application needs more memory than is locally attached to a single CPU. I haven’t benchmarked this, but basically no one on the internet claims there’s a benefit to enabling it. Further testing to come so I can confirm this, if I can be bothered.
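If you want to confirm what topology the OS is actually being presented with, you can inspect it from dom0. Here’s a minimal sketch, assuming a Linux-style sysfs layout (on XCP-NG, xl info -n will also show you Xen’s own view of the NUMA layout):

```python
#!/usr/bin/env python3
"""Print the NUMA topology the OS sees (sketch; assumes Linux sysfs)."""
import glob
import os

for node in sorted(glob.glob("/sys/devices/system/node/node[0-9]*")):
    name = os.path.basename(node)
    # CPUs local to this node
    with open(os.path.join(node, "cpulist")) as f:
        cpus = f.read().strip()
    # Memory attached to this node (the MemTotal line is in kB)
    with open(os.path.join(node, "meminfo")) as f:
        kb = int(next(l for l in f if "MemTotal" in l).split()[-2])
    print(f"{name}: CPUs {cpus}, {kb // 1024} MiB local memory")

# More than one node listed means the BIOS is exposing NUMA,
# i.e. node interleaving is disabled.
```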
Processor Settings
There are a few things here which helped with performance:
Logical Processor - Enabled
Adjacent Cache Line Prefetch - Disabled
Hardware Prefetcher - Disabled
Data Reuse - Disabled
Intel QPI Bandwidth Priority - Compute (dual+ CPU systems)
Turbo Mode - Enabled
C1E - Enabled
C States - Enabled
Logical Processor = hyperthreading. You need as many cores as you can get for virtualisation, and disabling it doesn’t make the individual cores any more powerful; you just end up with fewer cores and worse performance.
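To check that hyperthreading actually took effect, compare the logical CPU count against physical cores. A quick sketch, again assuming a Linux-style sysfs layout:

```python
#!/usr/bin/env python3
"""Compare logical CPUs to physical cores (sketch; assumes Linux sysfs)."""
import glob

logical = 0
cores = set()
for path in glob.glob(
        "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"):
    logical += 1
    with open(path) as f:
        cores.add(f.read().strip())  # siblings share one list: one entry per core

print(f"logical CPUs: {logical}, physical cores: {len(cores)}")
print("hyperthreading is ON" if logical > len(cores) else "hyperthreading is OFF")
```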
Adjacent Cache Line Prefetch, Hardware Prefetcher and Data Reuse are designed to make the CPU caches more efficient, either by prefetching the next cache line or by keeping frequently used lines in the cache. In practice, the white paper found they had no measurable impact but recommended keeping them enabled just in case, since they didn’t use noticeably more power. I found no measurable performance impact either, but I did find that leaving them enabled increased network latency by ~2μs, so I’m disabling them. It might afford some energy savings too. It makes sense to me that since virtualisation runs many different workloads, the next line often isn’t sequential or what the algorithm expects, so the real data has to be fetched anyway; there’s not much point prefetching stuff that isn’t needed and just causes congestion. That’s my thought process at least. These CPUs were also the first implementation of the technology, so perhaps the prefetch algorithms have improved since, but on this generation keep them disabled.
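You can verify the prefetcher state from the OS without rebooting into the BIOS. The sketch below assumes a Nehalem/Westmere-era Xeon (what these 11G servers shipped with), the msr kernel module loaded, and root; the bit positions in IA32_MISC_ENABLE (MSR 0x1A0) are from Intel’s documentation for that era, so verify them against your own CPU’s manual:

```python
#!/usr/bin/env python3
"""Read prefetcher state from IA32_MISC_ENABLE (MSR 0x1A0) -- sketch.

Needs root and the msr module (`modprobe msr`). Bit positions are per
Intel's docs for Nehalem/Westmere: bit 9 set = hardware prefetcher
DISABLED, bit 19 set = adjacent cache line prefetch DISABLED. Newer
CPUs moved these controls to MSR 0x1A4, so check before trusting this.
"""
import struct

with open("/dev/cpu/0/msr", "rb") as f:
    f.seek(0x1A0)                           # the MSR number is the file offset
    value, = struct.unpack("<Q", f.read(8))

print("hardware prefetcher:         ",
      "disabled" if value & (1 << 9) else "enabled")
print("adjacent cache line prefetch:",
      "disabled" if value & (1 << 19) else "enabled")
```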
Bandwidth priority – if in doubt, leave this on Compute. Contrary to what you might expect, setting it to I/O actually hurt virtualised network performance. I don’t know exactly why, but it’s something about the QPI link between CPUs giving priority to different types of traffic. I don’t know how it determines which traffic is which, but since the I/O is virtualised it probably counts as computational traffic (I would think).
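If you want to A/B settings like this yourself, even a crude round-trip probe will show up relative differences between BIOS configurations. A minimal sketch (the port and payload are arbitrary choices of mine; Python’s overhead swamps microsecond effects, so compare medians between runs rather than trusting the absolute numbers, or reach for a proper tool like sockperf or qperf):

```python
#!/usr/bin/env python3
"""Crude UDP round-trip latency probe (sketch).

Run `python3 udp_rtt.py server` in one VM, then
`python3 udp_rtt.py client <server-ip>` in another. A dropped packet
will raise socket.timeout; this is deliberately bare-bones.
"""
import socket
import statistics
import sys
import time

PORT = 9999  # arbitrary port for this sketch

def server():
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.bind(("0.0.0.0", PORT))
    while True:
        data, addr = s.recvfrom(64)
        s.sendto(data, addr)  # echo straight back

def client(host, count=1000):
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.settimeout(1.0)
    rtts = []
    for _ in range(count):
        t0 = time.perf_counter_ns()
        s.sendto(b"ping", (host, PORT))
        s.recvfrom(64)
        rtts.append((time.perf_counter_ns() - t0) / 1000)  # ns -> us
    print(f"median RTT: {statistics.median(rtts):.1f} us, "
          f"p99: {statistics.quantiles(rtts, n=100)[98]:.1f} us")

if __name__ == "__main__":
    server() if sys.argv[1] == "server" else client(sys.argv[2])
```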
Turbo mode – free performance, leave it enabled unless you want power savings.
C1E and C States – provide up to 30% idle power savings and also (apparently) increase the chance of the CPU reaching turbo speeds. They do increase latency and make it less consistent, so leave them disabled for high-performance applications, but in my case disabling them wasn’t worth the increase in power consumption.
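To confirm what Xen actually ends up doing with C-states and turbo, xenpm in dom0 reports both. A minimal sketch wrapping it from Python (assumes an XCP-NG/Xen dom0 where the xenpm utility is present):

```python
#!/usr/bin/env python3
"""Dump Xen's view of C-states and turbo via xenpm (sketch; assumes
an XCP-NG/Xen dom0 where xenpm is available)."""
import subprocess

# Idle (C-)state residencies for CPU 0, as managed by Xen
print(subprocess.run(["xenpm", "get-cpuidle-states", "0"],
                     capture_output=True, text=True).stdout)

# Frequency/turbo parameters for CPU 0 (includes turbo mode status)
print(subprocess.run(["xenpm", "get-cpufreq-para", "0"],
                     capture_output=True, text=True).stdout)
```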