DragonFly users List (threaded) for 2006-11
DragonFly BSD

Re: dual port EM nic wedging under load

From: "Sepherosa Ziehau" <sepherosa@xxxxxxxxx>
Date: Sun, 26 Nov 2006 17:03:48 +0800

On 11/26/06, Matthew Dillon <dillon@xxxxxxxxxxxxxxxxxxxx> wrote:
:After staring at the extremely low interrupt rate, I think something
:must be wrong with the interrupt processing in our code.  The loop
:mainly handles things that happen while we are processing the TX/RX
:descriptor rings.  I think it is necessary when an RX overrun happens;
:the extra ICR reading/processing may restore the RX engine, and I
:don't think the extra ICR reading will hurt if the ICR has nothing
:left.
:mmm, I forgot to take polling failure into account; polling does not
:even touch the ICR ...
:The above patch is updated a little bit:
:Best Regards,

    I don't know.  I don't think this addresses the reason why polling
    failed to work after the overrun occurred.  That is, we don't know why
    polling failed to work.

    I can think of two possibilities.  First, the EM device is skipping an
    entry in the receive ring (E1000_RXD_STAT_DD is not getting set), and
    that is stopping all receive ring processing cold.  Second, when a
    receiver overrun occurs the receive ring processes all the packets,
    but a bug in the handling of the ring index confuses the firmware
    into thinking that we did not clear all the ring buffers when we
    did, for the case where the entire ring was full and the entire ring
    is then cleared.

    My suspicion, because the polling stops working, is the second case.

    RDH and RDT (receiver head and tail descriptor pointers) are range
    inclusive.  On initialization the head is set to 0 and the tail is
    set to num_rx_desc - 1.  When we update it during receive packet
    processing we set it to the index of the last processed descriptor
    (which is i - 1, because i has been advanced one past the last
    processed entry).
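    That index arithmetic can be sketched as follows (a toy model for
    illustration; the function name and ring size are made up here, this
    is not the actual em(4) code):

	#include <assert.h>

	#define NUM_RX_DESC 256                 /* hypothetical ring size */

	/*
	 * i is the index of the next descriptor to check, i.e. it has
	 * already been advanced one past the last processed entry, so
	 * the RDT value to program is i - 1 modulo the ring size.
	 */
	static int
	rdt_from_next_index(int i)
	{
		return (i - 1 + NUM_RX_DESC) % NUM_RX_DESC;
	}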


    I think your patch may have a problem: if we do not process *ANY*
    receive frames in the loop, your patch will end up adjusting RDT
    anyway, to an incorrect value.  It will set it to (the original)
    next_rx_desc_to_check - 1.  Oops!
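    One way to guard against that, sketched under the assumption that the
    loop counts how many frames it actually processed (the function and
    its parameters are hypothetical, not the driver's real interface):

	#include <assert.h>

	#define NUM_RX_DESC 256                 /* hypothetical ring size */

	/*
	 * Only write a new RDT when at least one frame was processed;
	 * otherwise next_to_check - 1 would be written back, an index
	 * the driver never actually handed back to the hardware.
	 */
	static int
	maybe_update_rdt(int processed, int next_to_check, int old_rdt)
	{
		if (processed == 0)
			return old_rdt;         /* leave the register untouched */
		return (next_to_check - 1 + NUM_RX_DESC) % NUM_RX_DESC;
	}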

    The question here is what happens when a receiver overrun occurs.
    Clearly when that case occurs ALL the receive descriptors will be
    full.  Let's look at a degenerate case:

    [0 ...................... N-1]
    * RDH set to 0
    * RDT set to N-1
    * N frames come in; RDH is set to N-1 (??)
    * We process N frames
    * The frame at RING[N-1] is cleaned up
    * i = N
    * We set RDT to i-1 == N-1.  It's the same value it was set to before
      we processed all N frames.  The receive engine will think that the
      ring is still full when it is empty.

    I think what we need to do here is set RDT to N-2 (mod N, of course)
    in the case where we have processed ALL N frames.  I'll bet the
    firmware is getting confused in the overrun case because we are
    setting RDT to the same value it was set to before.  Very confused.
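    The ambiguity in the degenerate case above can be demonstrated with a
    toy model (ring size and names are invented for illustration):

	#include <assert.h>

	#define N 8                             /* toy ring size */

	/*
	 * Walk the whole ring starting at `start` and return the RDT
	 * value the usual i - 1 update would program.  After draining
	 * all N descriptors, i wraps back to `start`, so RDT comes out
	 * as start - 1 (mod N): for start == 0 that is N - 1, exactly
	 * the value RDT already held after initialization, so the
	 * hardware cannot tell "empty" from "still full".
	 */
	static int
	rdt_after_full_drain(int start)
	{
		int i = start;
		int k;

		for (k = 0; k < N; k++)         /* process all N descriptors */
			i = (i + 1) % N;
		return (i - 1 + N) % N;
	}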

After staring at the rx processing code for a long time, I think I
found the problem, in em_process_receive_interrupts():

. ..
2936:		if (em_get_buf(i, adapter, NULL, MB_DONTWAIT) == ENOBUFS) {
2937:			adapter->dropped_pkts++;
2938:			em_get_buf(i, adapter, mp, MB_DONTWAIT);
2939:			if (adapter->fmp != NULL)
2940:				m_freem(adapter->fmp);
2941:			adapter->fmp = NULL;
2942:			adapter->lmp = NULL;
2943:			break;
2944:		}
. ..

We will go into this condition when m_getcl(MB_DONTWAIT) fails.  If
this happens, then
1) we skip the rest of the RX ring processing
2) next_rx_desc_to_check is not updated
3) RDT is not updated

The RX engine would be left sitting there, faced with an almost full
RX ring after the interrupt processing.  This should lead to an RX
overrun, and I guess the hardware may behave strangely when this kind
of thing happens (e.g. stall the RX engine, which in turn stalls
interrupts x_X)
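The alternative is to drop the packet but recycle the old buffer and
keep walking the ring, so the index (and therefore RDT) still advances.
A toy model of that idea (this is a sketch with invented names, not the
actual patch; bit k of fail_mask simulates m_getcl() failing on the
k-th descriptor):

	#include <assert.h>

	#define N 8                             /* toy ring size */

	/*
	 * Walk `ndesc` descriptors starting at `i`.  On a simulated
	 * allocation failure we drop the packet and recycle the old
	 * buffer, but we do NOT break out of the loop, so the index
	 * still moves past the descriptor and the returned RDT value
	 * keeps the ring serviced.
	 */
	static int
	drain_ring(int i, int ndesc, unsigned fail_mask)
	{
		int k;

		for (k = 0; k < ndesc; k++) {
			if (fail_mask & (1U << k)) {
				/* dropped_pkts++; recycle old mbuf */
			}
			i = (i + 1) % N;        /* descriptor goes back to hw */
		}
		return (i - 1 + N) % N;         /* RDT keeps moving */
	}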

According to the output of hw.em0.debug_info=1 reported by Mike, the
above condition _was_ entered during his benching ("em0: Std mbuf
cluster failed = 2"; that counter is adapter->mbuf_cluster_failed,
which is incremented when m_getcl() in em_get_buf() fails).

Please review/test the following patch:

Best Regards,

Live Free or Die
