DragonFly BSD
DragonFly bugs List (threaded) for 2005-03

Re: failing disk, or not?


From: Bill Hacker <wbh@xxxxxxxxxxxxx>
Date: Tue, 29 Mar 2005 12:25:04 +0800

George Georgalis wrote:

I'm seeing some disk errors in dfly that I cannot reproduce when checking
the partition from another OS:

ad4: UDMA ICRC error writing fsbn 249842603 of 110387664-110387679 (ad4 bn 249842603; cn 15551 tn 250 sn 38) retrying
ad4: UDMA ICRC error writing fsbn 249842603 of 110387664-110387679 (ad4 bn 249842603; cn 15551 tn 250 sn 38) retrying
ad4: UDMA ICRC error writing fsbn 249842603 of 110387664-110387679 (ad4 bn 249842603; cn 15551 tn 250 sn 38) retrying
ad4: UDMA ICRC error writing fsbn 488278315 of 229605520-229605535 (ad4 bn 488278315; cn 30393 tn 234 sn 28) retrying
ad4: UDMA ICRC error writing fsbn 488278315 of 229605520-229605535 (ad4 bn 488278315; cn 30393 tn 234 sn 28) retrying
ad4: UDMA ICRC error writing fsbn 488278443 of 229605584-229605599 (ad4 bn 488278443; cn 30393 tn 236 sn 30) retrying

This happened while running dvdbackup, and I reproduced it by running
a dd read from the partition. However, after several attempts I cannot
reproduce it with a Linux badblocks check (read or non-destructive write)
or a Linux dd read from the partition. I know failures can be intermittent,
but not getting any errors at all from Linux so far seems odd
if the disk is really failing.

Might DFLY be attempting I/O beyond the permitted end of the assigned area? Or to an area that Linux is not trying to access?


# df -h
Filesystem    Size   Used  Avail Capacity  Mounted on
/dev/ad4s3a   248M   122M   106M    54%    /
/dev/ad4s3d   248M   1.3M   227M     1%    /var
/dev/ad4s3e   124G    94G    20G    83%    /usr
procfs        4.0K   4.0K     0B   100%    /proc

A bit of history: I did have a system lockup -- I could switch virtual
terminals but no keyboard input was accepted -- a week or two ago. I
didn't file a bug because I was experimenting haphazardly (in user space)
and couldn't explain well enough, at the time, everything I was doing; now I
don't even remember. An fsck was required, and with a 95GB /usr, that
took quite a while. (Comments welcome on why softupdates didn't help
here.)

Best case, SU just leaves data in an earlier state rather than half-committed. More transaction-oriented than journaling.

fsck -y doesn't care about the content of data - only about its
proper file indexing, so *maybe* some time saved during
a 'preen', but no savings at all with fsck -y.

> Also, the /usr partition was near or over 100% capacity, but I
never got disk full errors, i.e. I didn't *completely* run out of space.

It normally has around a 10% reserve, and will usually stand 102% before it
even throws an error message.
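
As a rough cross-check of that reserve, the df figures quoted earlier in
this message can be run through the usual Capacity arithmetic. This is a
minimal sketch, assuming df computes Capacity as used / (used + avail) and
that Size minus Used minus Avail is the reserve. The helper names are made
up for illustration, and the inputs are the rounded values df -h printed,
so the results are only approximate.

```python
# Hypothetical helpers; they only redo arithmetic on the df -h figures
# quoted in this message and do not query a real filesystem.

def capacity_percent(used, avail):
    """Capacity as df is assumed to report it: used / (used + avail)."""
    return 100.0 * used / (used + avail)

def reserved(size, used, avail):
    """Space that is neither used nor available to ordinary users."""
    return size - used - avail

# (mount point, size, used, avail), units as printed by df -h.
rows = [
    ("/",    248.0, 122.0, 106.0),   # megabytes
    ("/usr", 124.0,  94.0,  20.0),   # gigabytes
]

for mount, size, used, avail in rows:
    res = reserved(size, used, avail)
    print(mount,
          "capacity %.1f%%" % capacity_percent(used, avail),
          "reserve %.1f%% of size" % (100.0 * res / size))

# Invented numbers, for illustration only: if root keeps writing into the
# reserve, avail goes negative and Capacity climbs past 100%.
print("over-full example: %.1f%%" % capacity_percent(116.0, -2.0))
```

On these rounded figures the reserve works out nearer 8% than 10% of each
filesystem, and a Capacity above 100% simply means the reserve is being
consumed.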

At this point can I be sure my disk is failing or could there be some
driver instability? The full dmesg is below.

Don't see it in dmesg, but ad4 is a 200GB Seagate drive on an NVIDIA
SATA controller.  Disk Product Number ST3200822AS, Part Number 9W2854-301

Thanks,
// George


Cutting ...
agp0: <NVIDIA Generic AGP Controller> mem 0xe0000000-0xe3ffffff at device 0.0 on pci0
agp0: Unable to find NVIDIA Memory Controller 1.

Unable? That's odd.


device_probe_and_attach: agp0 attach returned 19
isab0: <PCI to ISA bridge (vendor=10de device=00e0)> at device 1.0 on pci0
isa0: <ISA bus> on isab0
pci0: <unknown card> (vendor=0x10de, dev=0x00e4) at 1.1 irq 10

NVIDIA - nForce3 250 SMBus Controller ?


*SNIP*

atapci0: <Generic PCI ATA controller> port 0xf000-0xf00f at device 8.0 on pci0
ata0: at 0x1f0 irq 14 on atapci0
installed MI handler for int 14

ata1: at 0x170 irq 15 on atapci0
installed MI handler for int 15
atapci1: <Generic PCI ATA controller> port 0xec00-0xec7f,0xeb00-0xeb0f,0xb70-0xb73,0x970-0x977,0xbf0-0xbf3,0x9f0-0x9f7 irq 11 at device 10.0 on pci0

ata2: at 0x9f0 on atapci1
installed MI handler for int 11

ata3: at 0x970 on atapci1

*snip*


ad0: 58644MB <Maxtor 6Y060L0> [119150/16/63] at ata0-master BIOSDMA

ad4: DMA limited to UDMA33, non-ATA66 cable or device
ad4: 190782MB <ST3200822AS> [387621/16/63] at ata2-master BIOSDMA

I'm puzzled:


- ata0-master claims /dev/ad0

- ata1-master claims /dev/acd0

- ata2-master claims /dev/ad4

- ata3 seems empty...

So how do we skip /dev/ad1, /dev/ad2, and /dev/ad3 to arrive at /dev/ad4?
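
One possible explanation, offered as an assumption rather than anything
confirmed in this thread: FreeBSD-derived ATA drivers can number ad devices
statically by position, two unit numbers per channel (master, then slave),
while ATAPI CD drives (acd) are counted separately. Under that scheme empty
positions leave gaps. The function name below is hypothetical and the sketch
only illustrates the arithmetic.

```python
def ad_unit(channel, slave=False):
    """Static unit number: two slots per ATA channel, master first.

    Assumption: unit = 2 * channel + (1 if slave else 0).
    """
    return 2 * channel + (1 if slave else 0)

# Disk positions reported in the dmesg above.
for channel, slave in [(0, False), (2, False)]:
    role = "slave" if slave else "master"
    print("ata%d-%s -> ad%d" % (channel, role, ad_unit(channel, slave)))
```

If that assumption holds, ad1 would be ata0-slave, ad2 and ad3 would be the
two positions on ata1 (where the CD drive sits as acd0), and ata2-master
lands on ad4.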

Mounting root from ufs:/dev/ad4s3a

ad4: UDMA ICRC error writing fsbn 249842603 of 110387664-110387679 (ad4 bn 249842603; cn 15551 tn 250 sn 38) retrying
ad4: UDMA ICRC error writing fsbn 249842603 of 110387664-110387679 (ad4 bn 249842603; cn 15551 tn 250 sn 38) retrying
ad4: UDMA ICRC error writing fsbn 249842603 of 110387664-110387679 (ad4 bn 249842603; cn 15551 tn 250 sn 38) retrying
ad4: UDMA ICRC error writing fsbn 488278315 of 229605520-229605535 (ad4 bn 488278315; cn 30393 tn 234 sn 28) retrying
ad4: UDMA ICRC error writing fsbn 488278315 of 229605520-229605535 (ad4 bn 488278315; cn 30393 tn 234 sn 28) retrying
ad4: UDMA ICRC error writing fsbn 488278443 of 229605584-229605599 (ad4 bn 488278443; cn 30393 tn 236 sn 30) retrying


You are on slice2, presumably well up in the cylinder count.
Might the areas above be a geometry mapping conflict?
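
The geometry question can be checked with a little arithmetic on the numbers
already in the log. This is a sketch under stated assumptions: that the
cn/tn/sn values use a translated geometry of 255 heads and 63 sectors with a
zero-based sector number (neither is stated in the log), and that "bn" is a
512-byte sector number counted from the start of the disk. The error-reporting
code may well compute it differently, so treat the result as a hint only.

```python
HEADS, SECTORS = 255, 63   # assumed translated geometry, not stated in the log

def chs_to_block(cn, tn, sn, heads=HEADS, sectors=SECTORS):
    """Convert cylinder/track/sector to a linear block number.

    Assumes sn is zero-based, so it is added without the usual -1.
    """
    return (cn * heads + tn) * sectors + sn

# (bn, cn, tn, sn) copied from the three distinct error lines.
errors = [
    (249842603, 15551, 250, 38),
    (488278315, 30393, 234, 28),
    (488278443, 30393, 236, 30),
]

# Sector count implied by "ad4: 190782MB <ST3200822AS> [387621/16/63]".
disk_sectors = 387621 * 16 * 63

for bn, cn, tn, sn in errors:
    matches = chs_to_block(cn, tn, sn) == bn   # does 255/63 reproduce bn?
    beyond = bn >= disk_sectors                # is bn past the end of the disk?
    print(bn, "chs matches:", matches, "beyond disk:", beyond)
```

On these assumptions all three cn/tn/sn triples reproduce their bn under
255/63, and the two block numbers starting 4882784 and 4882783 lie past the
390721968 sectors implied by the probe line, while 249842603 does not.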

Bill


