Zarathustra[H]
Extremely [H]
- Joined
- Oct 29, 2000
- Messages
- 39,093
Hey everyone,
I figured I'd post about this as I have been having some issues myself.
For the longest time I've been running a Kaby Lake Core i3-7100 (2C/4T, 3.9GHz Base, No Turbo) in a bare metal router install, and it worked very well.
I ran OpenVPN directly on the router, and I was able to reach upwards of 700Mbit/s through OpenVPN while only loading the CPU to 10-12%.
I wanted to move towards more enterprise-like hardware, so I got a good deal on a Rocket Lake Xeon E-2314 (4C/4T, 2.8GHz Base, 4.5GHz Turbo).
Because I knew this hardware to be seriously overkill for my routing needs, I decided to install Proxmox on it, make it part of a cluster with my main virtualized server, and run the router as a guest. I used IOMMU to pass both the WAN and LAN NICs directly through to the guest, so I could still benefit from hardware offloading and didn't have to expose the host to the WAN, for security.
I also took the opportunity while I was at it to transition from pfSense to OPNsense, and since I had to set everything up again, I decided to go with WireGuard instead of OpenVPN.
Everyone keeps talking about how fast and lightweight WireGuard is, so I figured it would be a huge improvement.
Oh boy, was I wrong.
We are talking MASSIVE CPU load increases.
I did some reading through the OPNsense performance tweaking guide, and found that it applies mitigations for Spectre & Meltdown by default regardless of which CPU is installed. I disabled these (as Rocket Lake is unaffected) and while it helped a little, we are still dealing with massive CPU load when loading up WireGuard.
I even upped my initial 2 core assignment to the guest to 3 cores, and it still results in crazy high load.
Where OpenVPN on pfSense never pushed it beyond at most 12%, now with three cores of a CPU that is much faster I am hitting initial loads of over 90% on the CPU when I start a speed test, which then settles down to about 50-75% CPU load for the remainder of the run.
A quick back-of-the-envelope calculation: each core of the Rocket Lake Xeon is ~20% faster than a Kaby Lake Core i3-7100 core, there are 3 of them instead of 2, and I am going from 12% to 90% CPU load. That works out to an overall CPU load increase of 13.5 times (7.5 x 1.5 x 1.2) in going from OpenVPN on pfSense on bare metal to WireGuard on OPNsense under Proxmox/KVM.
That is insane.
A couple of notes on that though:
1.) OpenVPN got me to 650-700Mbit/s most of the time. WireGuard gets me to ~880Mbit/s so there is a throughput increase to factor in here, but only ~25% - ~35%, not nearly enough to explain the massive CPU load increase.
2.) There are obviously going to be some overhead/efficiency losses from running virtualized vs bare metal, but it shouldn't be that much.
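For anyone who wants to check the math, here is a rough sketch of the back-of-the-envelope figure above, plus the throughput adjustment from note 1. It assumes CPU load scales linearly with core count and per-core speed, which is a simplification, and the function names are made up for illustration.

```python
def relative_cpu_cost(old_load, new_load, old_cores, new_cores, per_core_speedup):
    """Estimate how much more CPU work the new setup burns than the old one.

    Assumes load scales linearly with core count and per-core speed,
    which is only a rough approximation.
    """
    load_ratio = new_load / old_load                          # e.g. 90% vs 12%
    capacity_ratio = (new_cores / old_cores) * per_core_speedup
    return load_ratio * capacity_ratio


def per_mbit(cost_ratio, old_mbit, new_mbit):
    """Normalize the cost ratio by the throughput gain (note 1)."""
    return cost_ratio / (new_mbit / old_mbit)


if __name__ == "__main__":
    # Figures from the post: 12% -> 90% load, 2 -> 3 cores, ~20% faster cores.
    raw = relative_cpu_cost(12, 90, 2, 3, 1.2)
    print(f"raw increase: {raw:.1f}x")
    # OpenVPN managed 650-700 Mbit/s, WireGuard ~880 Mbit/s.
    for old_mbit in (650, 700):
        adjusted = per_mbit(raw, old_mbit, 880)
        print(f"per Mbit/s vs {old_mbit} Mbit/s baseline: {adjusted:.1f}x")
```

Feeding in the settled 50-75% load instead of the 90% peak gives a lower ratio, so the 13.5x figure is the worst case.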
Theories as to the cause thus far:
1.) KVM Configuration issues (I've gone through them, passing host CPU through to guest to make sure it sees all CPU features, etc. I'm no beginner to virtualization, but I guess I could have missed something)
2.) Rocket Lake (despite being launched in 2021) not being fully recognized/supported in the FreeBSD kernel yet?
3.) OPNsense performance optimization. OPNsense is obviously very conservative when it comes to security at the expense of performance. I've already disabled Spectre/Meltdown mitigations, but I wonder what else could be tweaked.
4.) Hardware acceleration. OpenVPN used AES ciphers and was utilizing AES-NI acceleration on the CPU. WireGuard apparently uses some strange cipher called ChaCha20-Poly1305 for which there is no hardware acceleration as of yet*. Is this just the result of crunching the ciphers in software instead of using AES-NI?
*There is one exception. This patch (sketchy) reportedly allows Intel QAT to at least partially accelerate ChaCha20-Poly1305, but I don't have a QAT card...
https://patches.dpdk.org/project/dpdk/patch/[email protected]/
There is also some talk of AVX-512 potentially speeding up ChaCha20-Poly1305, but that appears to be a future-oriented conversation.
So, I adopted WireGuard because everyone keeps saying how fast it is, but my reigning theory right now is that it is only fast if you compare software-only OpenVPN/AES against WireGuard/ChaCha20-Poly1305. If your system has AES-NI hardware acceleration (which just about every x86 CPU released in the last 10-15 years does), OpenVPN with AES is going to be overwhelmingly lighter on the CPU.
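On theory 1, a quick way to sanity-check which instruction set extensions are actually visible is to look at the CPU flags. The sketch below assumes Linux-style /proc/cpuinfo text, such as on the Proxmox host. The FreeBSD-based guest reports CPU features differently, and this sketch does not handle that. The function names and the list of flags to check are my own picks for illustration.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU flags from Linux-style /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            # The first "flags" line is enough; all cores normally match.
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text, wanted=("aes", "ssse3", "avx2", "avx512f")):
    """List the wanted flags that are not present, in the order given."""
    present = cpu_flags(cpuinfo_text)
    return [flag for flag in wanted if flag not in present]


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as fh:
            text = fh.read()
    except OSError:
        text = ""  # not a Linux system, or file not readable
    print("missing flags:", missing_flags(text))
```

An empty result only shows the flags are exposed, not that the VPN code actually uses them.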
It makes you wonder why anyone uses WireGuard at all...
Is that a reasonable assessment?
Appreciate any input and/or thoughts.