DragonFly bugs List (threaded) for 2007-05
Re: kernel panic
On Fri, May 11, 2007 at 12:49:20PM -0700, Matthew Dillon wrote:
> :The strange thing is I was rebooting my laptop (via icewm) when this
> :occurred. The interface is re(4) according to the kernel buffer output
> :which follows.
> I'm guessing there's an issue with re_init() or re_stop() that is
> possibly being triggered by setting the IP address.
> re_init() for the RE interface looks like it is doing some dangerous
> things... if there is DMA still operating while it is trying to
> reinitialize the device, that could be causing the NMI. It seems to be
> writing 0x00 to the command register which I guess is supposed to stop
> device operation, but it is not waiting for the device to actually stop
> operating before it begins to free the TX and RX rings.
> Most network controllers these days are actually microcontrollers,
> which means that commands do not instantaneously take effect when
> you write to the command register. Usually only the interrupt
> control registers are hardwired.
> I got two questions. First, when you ifconfig the interface with a
> new IP address does it normally pause before returning? That would
> indicate that it is in fact doing a full device reset when configuring
> an IP address. Second, can you reproduce the problem? Perhaps by
> re-configuring the device's IP address over and over again in a loop?
There is a small delay (<2s). I ran a loop that switched between two
IPs for about 15 minutes and nothing happened.
The kernel buffer output in the corefile was from months ago. I only
remembered because I did the same thing this time: shutdown now;
umount /home; ifconfig re0 ... I don't know how this can be in a dump
months after the fact, unless there is stale data in my swap partition
from my last coredump that hasn't been overwritten, since I don't do
very much swapping. This idea may be completely wrong. I am 100%
certain that I'm not looking at a stale dump as strings on the kernel
and vmcore show them as being from May 9, 2007. I am also certain
that I was not ifconfig'ing any interface when this happened.
> We may be able to 'fix' the problem simply by introducing a delay
> after writing 0x00 to RE_COMMAND, or by calling re_reset() as part
> of re_stop(), but I'd like a way to verify that doing so will actually
> fix the problem.
> Matthew Dillon
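
For illustration only, here is a rough sketch of the ordering suggested above: write 0x00 to the command register, poll with a bounded timeout until the device reports idle, and only then free the rings. This is not the re(4) driver code. The struct fake_nic type and the nic_write_command(), nic_dma_active() and nic_stop() helpers are hypothetical stand-ins that simulate a device in plain C, and a real driver would presumably insert a short delay between polls.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the NIC state; NOT the real re(4) softc. */
struct fake_nic {
    uint8_t command;     /* last value written to the command register */
    int     dma_ticks;   /* polls left until in-flight DMA has drained */
    bool    rings_freed; /* set once the TX/RX rings have been released */
};

/* Simulated register write: the device only records the request. */
static void nic_write_command(struct fake_nic *nic, uint8_t val)
{
    nic->command = val;
}

/*
 * Simulated status read. Once a stop (0x00) has been requested, each
 * poll lets the pretend microcontroller make one step of progress.
 * Returns true while DMA is still active.
 */
static bool nic_dma_active(struct fake_nic *nic)
{
    if (nic->command == 0x00 && nic->dma_ticks > 0)
        nic->dma_ticks--;
    return nic->dma_ticks > 0;
}

/*
 * Request a stop, wait (bounded) for the device to go idle, and only
 * then free the rings. Returns 0 on success, -1 if the device never
 * went idle within max_polls; in that case the rings are left alone.
 */
static int nic_stop(struct fake_nic *nic, int max_polls)
{
    nic_write_command(nic, 0x00);

    for (int i = 0; i < max_polls; i++) {
        if (!nic_dma_active(nic)) {
            nic->rings_freed = true; /* safe: no DMA can touch them now */
            return 0;
        }
        /* a real driver would delay briefly here before polling again */
    }
    return -1;
}
```

In this simulation, a device with dma_ticks = 3 and a budget of 10 polls returns 0 and marks the rings freed, while a device with dma_ticks = 50 and the same budget returns -1 and leaves the rings untouched. Whether the real hardware exposes a usable "idle" indication, or needs a fixed delay or a full reset instead, is exactly the open question above.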