Sunday 29th September 2013: 7.03pm. Bleh I'm feeling rough today. Last night whilst watching TV with Megan I had this crazy idea of how to make proposed Boost.AFIO - yes I know it's supposed to be finished as of last Monday - lock free, even wait free apart from the pesky std::unordered_map<> which would need locks around it. And it sufficiently excited me that I just had to get to work on it right now i.e. straight after Megan went to bed. Cue hence a late night refactoring session as I ripped out and replaced the core engine ... I feel today like I've just come out of a two day rave, elated but emotionally spent.
One reason I'd been holding back on sending AFIO in for peer review is because its performance was not great, and it was a bit embarrassing. Where ASIO - the thing AFIO is based upon - might be able to push 2.8m ops/sec, AFIO was doing about 140k/sec, enough for a SSD, but not great. That bothered me, so last few days after I spent my day job trying to force myself to apply for jobs in Europe I was tinkering with various strategies to improve the situation.
My first round involved refactoring the design such that I could remove the recursive mutex in AFIO's core, and that doubled performance. For my second round, I noted from profiling that most of the time was going on waiting on the mutex during ops completion handling, so I added a single producer single consumer thread-local queue which had completion handling push wait free into its own queue, while a dedicated thread consumed and executed all the completions. Performance dropped by 10%. Arse.
I had been thinking that was as good as it could get until I had that brainstorm last night. The essence of that brainstorm was to replace the hash table of ops with a hash table of shared pointers to ops, such that each thread could hold open op state until it was finished with it, and therefore allowing me to remove all locking and serialisation completely (apart from when touching the hash table).
As much as that really shouldn't be faster with all those millions of extra atomic instructions being executed, and indeed in the single threaded case it isn't, it scales with extra CPUs like crazy. AFIO can now push 1.4m ops/sec, or about half what ASIO can do which is bananas crazy given how much extra work AFIO does. And I am personally very, very pleased with half the performance of ASIO - AFIO is no longer embarrassing at that sort of speed. And so I can now release it for peer review without grimacing internally.
It's that sort of stuff which tech interviews for jobs have a very hard time in capturing. There is no way, even for an international expert in the field, to know that throwing lots of big heavy shared pointers at a scalability problem is the optimal empirical solution to a problem presented in a 45 minute long interview. If you had a few hours, and enough time to trace out memory dependency graphs in order to take some probability estimates of cache line contention rates, then maybe said international expert could do so. But would it be showing that international expert in their native field of practice? Of course not! Experienced people, especially really experienced people, let their subconscious mind do the dependency graphing and optimisation. Their conscious mind is pretty much just an avatar whose sole job it is to keep the subconscious mind fed and rested :)
I'm not an international expert in that area, and hence it did take me four days and one aborted attempt to come up with the solution I did. But my point is that tech interviews reward those recently graduated from a theory based environment, and by the time you get to my age and experience level I no longer think in terms of pure theory - in fact, the key to my eureka moment yesterday was thinking in terms of cache lines and throwing away all notion of algorithmic complexity. Doing that in a Google interview would mean getting instantly booted out the door at lunch unless you were extraordinarily lucky with your interviewer, because they'd read that as you not answering the question they asked because you don't know, and you're trying to futz the situation.
Which is fair enough - the skills for passing tech interviews don't have much in common with the skills for software engineering, and that fact is very well known (the problem being what would be better?). And an advantage, I suppose, of the algorithm based tech interview is a bias towards the young and recently graduated, and without that bias many of them would struggle to ever get onto the ladder.
Still, I guess where I'm going with this is this: I'm not really a software engineer any more. I've turned into a complex systems engineer who just happens to do software from time to time. And maybe fully realising what I am now might be very useful in getting me the right kind of job.
#gsoc2013 #boostcpp #boostafio #cpp11 #asio