DragonFly kernel List (threaded) for 2008-04
Re: altq spinlock and if_start dispatch
:In an experiment I conducted ~one weeks ago, I found out that ifnet
:serializer contention (if_output and NIC's txeof) had negative effect
:on network forwarding performance, so I created the following patch:
:I considered dispatching outgoing mbuf to NIC's interrupt/polling CPU
:to do the enqueue and if_start thus to avoid the spinlock, but upper
:layer, like TCP, processes ifq_handoff error (e.g. ENOBUFS);
:dispatching outgoing mbuf will break original semantics. However, the
:only source of error in ifq_handoff() is from ifq_enqueue(), i.e. only
:ifq_enqueue() must be called directly on the output path. Spinlock is
:chosen to protect ifnet.if_snd, so ifq_enqueue() and ifnet.if_start()
:could be departed. '1)' has one implication that
:ifq_poll->driver_encap->ifq_dequeue does not work, but I think driver
:could be easily converted to do ifq_dequeue+driver_encap without
:packet lossing. '1)' is the precondition to make '2)' and '3)' work.
:'2)' and '3)' together could avoid ifnet serializer contention. '4)'
:is added, since my another experiment shows that serious ipiq overflow
:could have very bad impact on overall performance.
:I have gathered following data before and after the patch:
:boxa --+----> em0 -Routing box- em1 ----> msk0
:boxa (2 x PIII 1G) and boxb (AthlonXP 2100+) each has one 32bit PCI
:Fast forwarding is improved a lot in BGL case, probably because the
:time consumed on input path is greatly reduced by if_start
:This patch does introduce regression in MP safe case when em0/em1 is
:on different CPU than the packet's target CPU on Routing box, may be
:caused by ipi bouncing between two CPUs (I didn't find the source of
:the problem yet).
:Fast forwarding perf drops a lot in MP safe case, if em0 and em1 are
:on different CPUs; reducing serializer contention does help (~40Kpps
:improvement). Something still needs to be figured out.
:So please review the patch. It is not finished yet, but the major
:part has been done and I want to call for reviewing before I strand
:too far away.
:Live Free or Die
Ok, I think one of the biggest issues is the MP lock. I'm assuming
you were running with kern.intr_mpsafe set to 1 for these tests?
The way to determine whether the MP lock is creating a problem or
not is to use the KTR logging facility and log all lock contention
(and maybe even add additional KTR facilities to track exactly what
the packet path is doing.
None of the tests are going to give us real results unless the MP
lock is completely out of the test path. That should be the #1
issue before we start splitting the serializer locks. I suspect
some of your IPI bouncing issues may be due to the MP lock.
:The ideas behind this patch are:
:1) altq is protected by its own spinlock instead of ifnet's serializer.
I like the idea.
:2) ifnet's serializer is pushed down into each ifnet.if_output
:implementation, i.e. if_output is called without ifnet's serializer
I agree that having just one serializer for a network device driver
is not enough. Ultimately I think we do want two or three
serializers: One for the interrupt (RX packet handling and TX
packet cleanup after xmit has completed), one for transmit packet
queueing, and possibly one for the control path (ifconfig etc).
This last one doesn't really need to exist, the kernel could
instead acquire both of the other two serializers when doing control
So, yes, the TX packet queue to interface mechanic would definitely
have its own serializer.
But I think the only way to do this cleanly is to have the callers
into the API acquire the appropriate serializers instead of forcing
each individual network driver to acquire them. Remember how easy it
was for me to implement the original serializer plan? I want things
to continue to be that easy.
:3) ifnet.if_start is dispatched to the CPU where NIC's interrupt is
:handled or where polling(4) is going to happen
This shouldn't be necessary. This sounds like an issue that the MP
lock might have been effecting. Acquiring the TX packet interface
enqueue serializer ought to be enough and then simply directly call
the entry point (and also using your if_start trick below).
:4) ifnet.if_start's ipi sending is avoided as much as possible, using
:the same mechanism as avoiding real ipi sending in ipiq implementation
I like this part of the patch. Definitely a good idea.