Re: NATA update

From: "Thomas E. Spanjaard" <tgen@xxxxxxxxxxxxx>
Date: Fri, 15 Dec 2006 19:50:15 +0000

YONETANI Tomokazu wrote:
> On Thu, Dec 14, 2006 at 09:13:17PM +0000, Thomas E. Spanjaard wrote:
>> YONETANI Tomokazu wrote:
>>> On Tue, Dec 12, 2006 at 03:13:45PM +0000, Thomas E. Spanjaard wrote:
>>>> YONETANI Tomokazu wrote:
>>>>> If I boot a UP kernel, it proceeds to "start_init: trying /sbin/init",
>>>>> but then gets stuck there (the backtrace in DDB is from the console handler).
>>>>> The backtrace looks something like this:
>>>> I fear this panic is unrelated, as Victor Balada Diaz is seeing the same on his 1.6 system. His /sbin/init is stuck in nanosleep, and is apparently never jumped to.
>>> No, that backtrace was not from a panic; I got it by pressing
>>> ctrl+alt+esc after seeing the "start_init: trying /sbin/init" message
>>> and the system hung (ctrl+T didn't print anything). `call dumpsys'
>>> in DDB didn't start the dump either, so I think /sbin/init wasn't even read
>>> from the disk.
>>> Then I tried `set hw.ata.ata_dma=0' at the boot loader prompt, and
>>> this time it made it to the login prompt
>>> (updated: http://les.ath.cx/DragonFly/asrock-dmesg.boot).
>>> But sometimes random commands (ls, sysctl, ...) dump core and fail,
>>> or ld reports corrupted libraries when I try to build
>>> a new kernel. On an SMP kernel it happens more frequently. On a UP kernel,
>>> if I switch to a UDMAxx mode using the natacontrol command, the core dumps
>>> occur more frequently.

>> Hmm, I'm not seeing any corruption (yet?) on my SCSI test system.
>>
>> I just experienced something odd, perhaps similar to what you saw earlier. I (probably) hit a NULL dereference when trying to open acd0c, as you can see on http://deviate.fi/~tgen/mountroot_1.png . It appears si_drv1 is NULL at line 218 in sys/dev/disk/nata/atapi-cd.c, which is strange, because acd_attach() really does set si_drv1 on acd0. And on the SCSI test system, I can open, read, write, etc. /dev/acd0c without problems. The code was also able to find acd_open(), so the dev_ops have been registered and it's not simply being handed the wrong device. Therefore I suspect something somewhere is scribbling over si_drv1, but I don't know where.
> I haven't seen the panic in the acd code since your commit.

That was a different panic, due to faulty locking. This one is a new beast. It only happens when you want to use a{,c}d as the root device; otherwise there's no problem. Somehow, the si_drv1 field of my cdev_t's gets scribbled over, and even when I recover the pointers via devclass_get_device(), something is still screwed up. See http://deviate.fi/~tgen/vm_fault_1.png .
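For anyone following the si_drv1 discussion, a minimal userspace model of the pattern may help: attach stores the softc pointer in the cdev's si_drv1, and open reads it back. The struct layouts, the model_* functions and the per-unit table below are hypothetical stand-ins, not the nata code; the table only plays the role that devclass_get_device() plays in the kernel, and the unit number is passed in explicitly rather than derived from the minor number. The sketch shows where a NULL check turns a fault into ENXIO or a recovered pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical userspace stand-ins; these are NOT the real kernel
 * definitions, only enough structure to show the si_drv1 pattern.
 */
struct cdev {
    const char *si_name;
    void *si_drv1;              /* driver-private pointer, set at attach */
};

struct acd_softc {
    int unit;
};

#define MODEL_MAXUNITS 4

/* Hypothetical per-unit table, standing in for the driver's devclass. */
static struct acd_softc *model_devclass[MODEL_MAXUNITS];

/* Attach: record the softc by unit and hang it off the cdev. */
void model_attach(struct cdev *dev, struct acd_softc *sc)
{
    assert(sc->unit >= 0 && sc->unit < MODEL_MAXUNITS);
    model_devclass[sc->unit] = sc;
    dev->si_drv1 = sc;
}

/* Fallback lookup by unit number (in the spirit of devclass_get_device()). */
struct acd_softc *model_lookup_unit(int unit)
{
    if (unit < 0 || unit >= MODEL_MAXUNITS)
        return NULL;
    return model_devclass[unit];
}

/*
 * Open: check si_drv1 before dereferencing it, so a clobbered or
 * never-set pointer yields ENXIO or a recovered softc instead of a fault.
 */
int model_open(struct cdev *dev, int unit)
{
    struct acd_softc *sc = dev->si_drv1;

    if (sc == NULL) {
        fprintf(stderr, "%s: si_drv1 is NULL, trying unit %d lookup\n",
                dev->si_name, unit);
        sc = model_lookup_unit(unit);
        if (sc == NULL)
            return ENXIO;
        dev->si_drv1 = sc;      /* re-seed the cdev for later calls */
    }
    printf("%s: opened unit %d\n", dev->si_name, sc->unit);
    return 0;
}
```

Calling model_attach() and then clearing dev.si_drv1 before model_open() exercises the fallback path. As noted above, such a recovery only hides the symptom: whatever overwrites si_drv1 is still at large, and other state it touched may be corrupt too.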

        Thomas E. Spanjaard

