{{Quickfixn}} QF/n Threads and Locks

Martin Adams martinadamsgh at gmail.com
Wed Nov 26 06:31:25 PST 2014


Comment from gbirchmeier on 05 Nov, after making a few observations about
the current QF/n code:

 

> The need for backward compatibility is starting to feel like a
> straightjacket, and our velocity is zero.
>
> It'd be refreshing if we could be a little reckless for a while.

 

I'm going to throw in some reckless, compatibility-breaking ideas here, to
do with locks and threads, inspired in part by the concurrency issues
(performance problems, and also out-of-order sending caused by a race
condition) others have been raising. Any feedback is, of course, welcome.

 

Starting with locks, QF/n has a fair few of them: 7, by my count. Once you
get to a certain level of locking complexity, it can be difficult to be
sure you don't have lurking race conditions, waiting for their chance to
manifest as hard-to-diagnose runtime issues. Deadlocks too, though I'm not
aware that any have been found in QF/n(?). Locking activity in QF/n also
imposes unnecessary serialisation: in particular, there is one global lock
(Session.sessions_) that every receive and every send operation on every
connection competes for. It's true that this lock is held only very
briefly, but it ought not to be necessary to synchronise across connections
like that. (Actually, in the case of that particular lock, I'm not sure
that it *is* necessary, even with the current code structure, as I don't
think the collection it protects changes at all after initialisation(?))
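
To illustrate: if the sessions collection really is fixed once
initialisation completes, the lock around lookups could simply go away. A
minimal sketch, assuming initialisation finishes before any connection
threads start (SessionRegistry and its methods are names I've made up;
SessionID and Session are the existing QF/n types):

    using System.Collections.Generic;
    using System.Collections.ObjectModel;
    using QuickFix;

    static class SessionRegistry
    {
        // Published once at start-up, before any connection threads run,
        // and never mutated afterwards, so reads need no lock at all.
        private static IReadOnlyDictionary<SessionID, Session> sessions_;

        public static void Initialise(IDictionary<SessionID, Session> sessions)
        {
            sessions_ = new ReadOnlyDictionary<SessionID, Session>(
                new Dictionary<SessionID, Session>(sessions));
        }

        public static Session Lookup(SessionID sessionID)
        {
            Session session;
            sessions_.TryGetValue(sessionID, out session);  // lock-free read
            return session;
        }
    }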

 

And while it should be possible to have receive and send operations on a
connection happen in parallel, with QF/n there's enforced serialisation
caused by contention for the session-level locks (Session.sync_,
SessionState.sync_). It's this contention that led to someone on here
complaining a short while ago about the performance hit of holding the
session lock over a blocking send operation. An alternative approach would
be to make all connection-related activity asynchronous, including sending,
receiving and even connecting. And if a timeout is needed, for heartbeat
generation for instance, then a timer callback could be used. The essential
thing here is to make all the code *non-blocking*, so you have no
synchronous I/O or synchronous timing calls at all. You can then get rid of
many of your locks by serialising *all* activity on a particular connection,
so that, for example, an attempt to initiate a send will never pre-empt
receive completion processing, and one attempt to initiate a send will never
pre-empt another. Yes, doing this means that an attempt to initiate a send
could queue up behind receive processing, or vice versa, but because
everything is now non-blocking, any queueing delays should be rare and
short-lived; and because you are treating request initiation and request
completion as separate operations, you can quite happily interleave send
and receive activity.
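
To make that concrete, here's a sketch of the non-blocking style (all the
names are hypothetical, not the existing QF/n reader code): a receive loop
built on ReadAsync never parks a thread in a blocking read, and a Timer
callback stands in for the blocking wait that heartbeat generation would
otherwise need.

    using System;
    using System.Net.Sockets;
    using System.Threading;
    using System.Threading.Tasks;

    class ConnectionReader
    {
        private readonly NetworkStream stream_;
        private readonly Timer heartbeatTimer_;
        private readonly byte[] buffer_ = new byte[4096];

        public ConnectionReader(NetworkStream stream, TimeSpan heartbeatInterval)
        {
            stream_ = stream;
            // The timer fires on a pool thread and merely queues "send a
            // heartbeat" onto the connection's serial queue; no thread
            // ever sits in a blocking wait-with-timeout.
            heartbeatTimer_ = new Timer(_ => QueueHeartbeat(), null,
                                        heartbeatInterval, heartbeatInterval);
        }

        public async Task ReadLoopAsync()
        {
            for (;;)
            {
                // ReadAsync holds no thread while waiting for data; the
                // continuation runs on a pool thread when bytes arrive.
                int n = await stream_.ReadAsync(buffer_, 0, buffer_.Length);
                if (n == 0) break;            // peer closed the connection
                ProcessBytes(buffer_, n);     // parse/dispatch, non-blocking
            }
        }

        private void QueueHeartbeat() { /* enqueue on the serial queue */ }
        private void ProcessBytes(byte[] buffer, int count) { /* ... */ }
    }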

 

Currently, QF/n has a thread per connection, which it uses to perform
blocking socket reads and various housekeeping functions. You could maintain
a per-connection thread to enforce the serialisation described above, but if
you have, say, 500 active connections, then having 500 threads won't buy you
anything in performance terms over having a number of threads equal to the
number of physical/hyper-threaded cores you have. So there's an argument for
using the thread pool, and letting .NET do its tuning thing to optimise the
number of pool threads for performance. It's trivial to write a scheduler
that farms work out to pool threads while still ensuring that activity on
a particular connection is serialised: if you go with TPL, you can do it
in a few lines of code by subclassing TaskScheduler (see the sketch
below), but even without TPL, it's easy to do. You do need a lock in your
scheduler, for enqueueing and dequeueing tasks, but it's just a
per-connection lock, not a global one, and this is arguably the best
place to do your locking, abstracted away from your actual socket-related
activity.
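
Here's roughly what that subclass could look like; a sketch only
(SerialTaskScheduler is my name for it), but it shows the shape: one
instance per connection, with the only lock confined to the task queue.

    using System.Collections.Generic;
    using System.Threading;
    using System.Threading.Tasks;

    // One instance per connection: queued tasks run on pool threads, but
    // never more than one at a time, so all activity on the connection
    // is serialised without any session-wide lock.
    sealed class SerialTaskScheduler : TaskScheduler
    {
        private readonly Queue<Task> queue_ = new Queue<Task>();
        private bool draining_;  // is a pool thread working the queue?

        protected override void QueueTask(Task task)
        {
            bool startDrain = false;
            lock (queue_)  // per-connection lock, held only briefly
            {
                queue_.Enqueue(task);
                if (!draining_) { draining_ = true; startDrain = true; }
            }
            if (startDrain)
                ThreadPool.QueueUserWorkItem(_ => Drain());
        }

        private void Drain()
        {
            for (;;)
            {
                Task task;
                lock (queue_)
                {
                    if (queue_.Count == 0) { draining_ = false; return; }
                    task = queue_.Dequeue();
                }
                TryExecuteTask(task);  // run it on this pool thread
            }
        }

        // Inlining would let a task jump the queue and break the
        // ordering guarantee, so refuse it.
        protected override bool TryExecuteTaskInline(Task task,
                                                     bool taskWasPreviouslyQueued)
        {
            return false;
        }

        protected override IEnumerable<Task> GetScheduledTasks()
        {
            lock (queue_) return queue_.ToArray();
        }
    }

Usage is then just a TaskFactory constructed over the per-connection
scheduler. (As of .NET 4.5, ConcurrentExclusiveSchedulerPair's
ExclusiveScheduler gives you much the same behaviour off the shelf.)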

 

So what's the impact on the App interface? This is probably a separate
discussion, but for best performance, App code should embrace the
non-blocking paradigm, and do its stuff (logging, storing, retrieving for
resend, processing received messages, etc.) using asynchronous I/O (or
delegation to another thread if blocking can't be avoided) where relevant:
with TPL, you could expose the subclassed TaskScheduler for use by the App
code to facilitate this. On the Send side, the sending operation becomes
asynchronous, but if this proved unpopular, it'd be simple enough to provide
an optional synchronous Send facade, with the marshalling of the send
operation onto the thread pool (to ensure serialisation with read and other
operations on the connection) taking place under the hood. Once again, just
a couple of lines of code with TPL, but still easy enough to do without TPL.
(Note that while the App thread that called the synchronous Send would
block, the actual send on a pool thread would still be asynchronous,
allowing receive processing to continue on the connection.)
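
By way of illustration, a sketch of that facade (the names are
assumptions of mine, apart from QuickFix.Message; serial_ is a
TaskFactory built over the per-connection scheduler). The chaining on
lastWrite_ is there so that two socket writes can never overlap.

    using System.Text;
    using System.Threading.Tasks;

    partial class Connection
    {
        private readonly System.Net.Sockets.NetworkStream stream_;
        private readonly TaskFactory serial_;          // per-connection scheduler
        private Task lastWrite_ = Task.FromResult(0);  // completed placeholder

        public Task SendAsync(QuickFix.Message message)
        {
            byte[] bytes = Encoding.ASCII.GetBytes(message.ToString());
            // The initiation runs on the serial scheduler, so it can never
            // pre-empt receive processing or another send initiation; the
            // chain on lastWrite_ stops socket writes from overlapping.
            return serial_.StartNew(() =>
                lastWrite_ = lastWrite_
                    .ContinueWith(_ => stream_.WriteAsync(bytes, 0, bytes.Length))
                    .Unwrap()
            ).Unwrap();
        }

        // The optional synchronous facade: only the calling App thread
        // blocks; the write itself stays asynchronous, and receive
        // processing on the connection carries on regardless.
        public void Send(QuickFix.Message message)
        {
            SendAsync(message).Wait();
        }
    }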

 

As for using TPL - there's no doubt it would cut down on the spade-work
needed, but its use wouldn't be mandatory. Also, I think there'd be more Gen
1 garbage with TPL, as Tasks are a one-shot thing, and so you'd be creating
one every time you wanted to marshal an asynchronous operation onto the
thread pool. The marshalling itself would also imply more context-switching,
but then, the number of threads would be going down. (Some very quick
testing suggests that with non-blocking operations on pool threads, the
number of pool threads maxes out at or slightly above the number of cores,
and that piling on work at a rate faster than all cores running at 100%
can handle just results in the extra work being queued, rather than the
number of threads being increased, which makes sense.)

 

Incorporating serialisation of operations on a connection into your
scheduler obviates the need for session/connection-level locks elsewhere
in the code, but there are also locks that are more global in scope, such
as the one protecting the collection of all active/pending connections, a
collection which might be referenced by code across multiple connections.
There are (safe) tricks here too that can minimise locking (one is
sketched below), and there's much else that could be said as well, but
that's probably enough rambling for now; and of course, there's little
point in contemplating big changes if the current threading/locking
structure is sufficient (or at least can be made so) in terms of
performance and reliability.
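
The trick sketched here is copy-on-write, under made-up names
(ConnectionTable is hypothetical): readers take a lock-free snapshot of
the collection, and the (rare) add/remove operations copy it and publish
the copy atomically.

    using System.Collections.Generic;
    using System.Threading;

    sealed class ConnectionTable<TKey, TConnection> where TConnection : class
    {
        private Dictionary<TKey, TConnection> map_ =
            new Dictionary<TKey, TConnection>();

        public TConnection Find(TKey key)
        {
            // Readers never lock: they just read the latest snapshot.
            TConnection connection;
            Volatile.Read(ref map_).TryGetValue(key, out connection);
            return connection;
        }

        public void Add(TKey key, TConnection connection)
        {
            for (;;)
            {
                var current = Volatile.Read(ref map_);
                var copy = new Dictionary<TKey, TConnection>(current);
                copy[key] = connection;
                // Publish the new snapshot only if no other writer raced
                // us; otherwise loop and retry against the fresh snapshot.
                if (Interlocked.CompareExchange(ref map_, copy, current)
                        == current)
                    return;
            }
        }
    }

Adds pay for a copy, but they happen only at connect/disconnect time;
every receive and send gets its lookup without touching a lock.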
