DragonFly commits List (threaded) for 2005-07
cvs commit: src/sys/kern lwkt_ipiq.c
dillon 2005/07/23 00:17:42 PDT
DragonFly src repository
Fix a critical bug in the IPI messaging code, affecting SMP systems. In
order to avoid placing a load fence in the FIFO scanning loop the FIFO write
index is cached and the code then loops the read index until it reaches
the cached write index, rather than the real-time write index. However, if a
FIFO full condition occurs during the callback AND additional IPIQ messages
are queued to the current cpu by a remote cpu at the same time, a recursive
call to lwkt_process_ipiq*() can occur and advance the read index past
the cached write index of the parent processing loop.
An exact comparison against the cached write index was being used which
resulted in the parent processing loop blowing past the actual write
index and re-executing stale IPI messages. Fix the comparison.
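The failure mode can be reproduced in a small user-space model. The following is only a sketch under stated assumptions, not the kernel code or the committed diff: struct fifo, enqueue(), callback(), process(), simulate(), FIFO_SIZE and RUN_LIMIT are all hypothetical names, the indices are assumed to increase monotonically and be masked on access, and the recursion is triggered directly from the callback rather than by a real FIFO full condition. The fixed variant is assumed to use a signed difference test in place of the exact comparison.

```c
#include <assert.h>

#define FIFO_SIZE 8                 /* hypothetical size, power of two */
#define FIFO_MASK (FIFO_SIZE - 1)
#define RUN_LIMIT 32                /* safety bound so the buggy variant stops */

struct fifo {
    int rindex;                     /* read index, only ever incremented */
    int windex;                     /* write index, only ever incremented */
    int msg[FIFO_SIZE];
};

static int runs;                    /* callbacks executed so far */
static int nested_done;             /* makes the nested scenario fire once */

static int process(struct fifo *q, int exact_compare);

static void enqueue(struct fifo *q, int m)
{
    assert(q->windex - q->rindex < FIFO_SIZE);   /* model never overfills */
    q->msg[q->windex & FIFO_MASK] = m;
    q->windex++;
}

static void callback(struct fifo *q, int m, int exact_compare)
{
    runs++;
    if (m == 0 && !nested_done) {
        nested_done = 1;
        /* Stand-in for a remote cpu queueing more messages ... */
        enqueue(q, 2);
        enqueue(q, 3);
        /* ... while this cpu recursively drains its own queue. */
        process(q, exact_compare);
    }
}

static int process(struct fifo *q, int exact_compare)
{
    int wi = q->windex;             /* cached once, no reload in the loop */
    int ri;

    for (;;) {
        ri = q->rindex;
        if (exact_compare ? (ri == wi) : (wi - ri <= 0))
            break;                  /* reached (or passed) the cached index */
        if (runs >= RUN_LIMIT)
            break;                  /* only the buggy variant gets here */
        q->rindex = ri + 1;
        callback(q, q->msg[ri & FIFO_MASK], exact_compare);
    }
    return runs;
}

/* Queue two messages, process them, and return how many callbacks ran. */
int simulate(int exact_compare)
{
    struct fifo q = {0};

    runs = 0;
    nested_done = 0;
    enqueue(&q, 0);
    enqueue(&q, 1);
    return process(&q, exact_compare);
}
```

In this model simulate(0) runs each of the four queued messages exactly once, while simulate(1) returns to the parent loop with the read index already past the cached write index, never sees them equal, and keeps re-executing stale slots until RUN_LIMIT stops it.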
The nature of this bug combined with other bugs in the token code and the
sockbuf code (which were causing crashes far more often) made this a
particularly nasty problem to find, with it taking upwards of a week to
generate a crash and the crash occurring at the worst place imaginable
(a hard IPI interrupt) and doing terrible things (re-executing a stale
IPI message). It took KTR logging on both the sending and receiving
side of the IPI code to nail the problem.
Very special thanks to Peter Avalos and David Rhodus for their debugging
help. And, most especially, David Rhodus for helping track this down over
the last *THREE* months.
Reported-by: David Rhodus, Peter Avalos, YONETANI Tomokazu, Tomaz Borstnar
Special-thanks-to: David Rhodus and Peter Avalos
  Revision  Changes    Path
  1.15      +5 -1      src/sys/kern/lwkt_ipiq.c