Added a further 15% to AFIO's performance :) I noticed that the standard spin lock was being stupid: it looped on compare-exchange, and every compare-exchange, even a failing one, demands exclusive ownership of the cache line. That sends a ton of cache line invalidations to all the other CPUs. I added a simple read check before the compare-exchange in my custom spin lock implementation, so waiting CPUs just read their shared copy of the line until the lock looks free. That eliminates those invalidations, and voilà, up goes performance again. My new custom spin lock class is very nice: you compose its behaviour out of templated policy types, and it is written using C++11 atomics. Check out the link if you're interested.
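For anyone who wants the gist without following the link, here is a minimal sketch of the "read before compare-exchange" idea, usually called test-and-test-and-set. It is not AFIO's actual class. The names `ttas_spinlock`, `yield_policy` and `counter_demo` are hypothetical, and the single `SpinPolicy` template parameter is only an assumed stand-in for the policy-based design described above.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical spin policy: what a waiting thread does between attempts.
struct yield_policy {
    static void pause() noexcept { std::this_thread::yield(); }
};

// Test-and-test-and-set spin lock built on C++11 atomics (illustrative sketch).
template <class SpinPolicy = yield_policy>
class ttas_spinlock {
    std::atomic<bool> locked_{false};

public:
    void lock() noexcept {
        for (;;) {
            // Test: a plain load keeps the cache line in shared state,
            // so spinning waiters do not invalidate each other's copies.
            if (!locked_.load(std::memory_order_relaxed)) {
                bool expected = false;
                // Test-and-set: only now request exclusive ownership of the line.
                if (locked_.compare_exchange_weak(expected, true,
                                                  std::memory_order_acquire,
                                                  std::memory_order_relaxed))
                    return;
            }
            SpinPolicy::pause();
        }
    }

    bool try_lock() noexcept {
        if (locked_.load(std::memory_order_relaxed))
            return false;  // looks taken: skip the expensive compare-exchange
        bool expected = false;
        return locked_.compare_exchange_strong(expected, true,
                                               std::memory_order_acquire,
                                               std::memory_order_relaxed);
    }

    void unlock() noexcept { locked_.store(false, std::memory_order_release); }
};

// Usage: several threads bump a plain counter under the lock.
inline long counter_demo(int nthreads, int iters) {
    ttas_spinlock<> lock;
    long counter = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < nthreads; ++t)
        threads.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                lock.lock();
                ++counter;  // protected by acquire on lock / release on unlock
                lock.unlock();
            }
        });
    for (auto& th : threads)
        th.join();
    return counter;
}
```

The only change from a naive spin lock is the relaxed load guarding the compare-exchange. Correctness still comes from the acquire/release pair on the compare-exchange and the unlocking store.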
While I was at it, I replaced all spin lock usage in AFIO with memory transactions :) My CPU lacks the necessary hardware support, so I can't try them for real yet, but in theory it ought to work and might add another 20-40% to performance. I also wrote an emulation of memory transactions using my new spin lock, so on older CPUs like mine it all still works nicely.
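The post doesn't show the transaction API, so the following is only a rough sketch of what a lock-based emulation could look like. `transact()` and `spinlock` are hypothetical names, not AFIO's. On hardware with transactional memory, a real implementation would first attempt a hardware transaction and fall back to the lock only on abort. This sketch shows only that fallback path, which serialises every transaction that shares the same lock.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Minimal test-and-test-and-set spin lock used as the emulation's fallback.
class spinlock {
    std::atomic<bool> locked_{false};

public:
    void lock() noexcept {
        for (;;) {
            bool expected = false;
            if (!locked_.load(std::memory_order_relaxed) &&
                locked_.compare_exchange_weak(expected, true,
                                              std::memory_order_acquire,
                                              std::memory_order_relaxed))
                return;
            std::this_thread::yield();
        }
    }
    void unlock() noexcept { locked_.store(false, std::memory_order_release); }
};

// Hypothetical transaction helper. With hardware support, one would try a
// hardware transaction here first. This sketch always takes the emulated
// path: run the body while holding the fallback lock.
template <class Lock, class F>
auto transact(Lock& fallback, F&& body) -> decltype(body()) {
    std::lock_guard<Lock> guard(fallback);
    return body();
}

// Usage: move units from a to b atomically, so a + b never changes.
inline std::pair<long, long> transfer_demo(int nthreads, int iters) {
    spinlock fallback;
    long a = 0, b = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < nthreads; ++t)
        threads.emplace_back([&] {
            for (int i = 0; i < iters; ++i)
                transact(fallback, [&] { --a; ++b; });
        });
    for (auto& th : threads)
        th.join();
    return {a, b};
}
```

Because every emulated transaction holds the same lock, the body behaves as one indivisible step relative to other transactions. The cost is that the optimistic concurrency that hardware transactions promise is lost.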
I'm up to 1.5m ops/sec now, which is fast enough for mid-range PCIe-based SSDs like the ones hedge funds use. I'm happy with performance now, so all that's left before I can push to master is some Linux testing and verification, plus backporting all this lovely new code to older compilers.
Anyway, bedtime now. It's 5am, and I have jobs to apply for tomorrow. Sigh.