From: Andrew Atrens <atrens@xxxxxxxxxxxxxxxxxx>
Date: Mon, 10 May 2004 17:22:26 -0400

On May 10, 2004 04:53 pm, Matthew Dillon wrote:
> :Hi All,
> :
> :My system was up and running from Friday (with a kernel built that day) to
> :(roughly) about 15 minutes ago when all of a sudden my Diab cross-compiler
> :(linux executable, nfs mounted fs) started segv'ing.
> :
> :At that point I figured it might be time to reboot so I cvsupped to pick
> : up the latest changes and then checked out a new kernel. A strange thing
> : happened though when I ran config... It gave me errors about files and
> : directories not being accessible. I immediately rebooted, and ran config
> : again, same errors. Ouch. Dropped into single user mode and started to
> : fsck and it doesn't look good :( ...
>
> It's possible that the corruption is new, but it's also possible that
> the corruption is old. e.g. the bitmap blocks could have gotten
> messed up prior to Friday. Did you fsck your filesystems manually
> on Thursday or Friday? Just rebooting would not have helped since a

Yes, on both days. Dropped into single user and forced a fsck - did that two
or three times.

> normal reboot doesn't cause a filesystem check to occur.

Indeed.

> I haven't had any corruption since the last major FP fix, which was
> on Wednesday.

That's good. Hopefully it's not too widespread.

> In anycase, what I recommend is that we try to nail down whether there
> is still a corruption issue or not, which means doing a manual fsck,
> making sure the kernel is totally up to date, and cleaning up the
> system. If any corruption reoccurs after that then we know we still have an
> issue.

Okay, so I'm building a new kernel now. Will back up my important data and
give it a go tomorrow. The system was up a little over two days before the
corruption happened.

Another question though. Why would an executable, residing on an nfs mounted
disk, suddenly appear to be corrupt and segv every time I run it? One minute
it works, next it doesn't. Reboot and the problem goes away.

Last time I saw something similar, it looked like the first 4k of the
executable had been replaced with zeros. This time the corruption must be
somewhere in the middle of the executable, or else it wouldn't even try to
start, right?

Andrew.
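
One rough way to tell the two cases above apart (a zeroed first 4k versus damage further into the file) is to read the executable back and look at it block by block. The following is only a minimal sketch in Python. The 4096-byte block size is an assumption taken from the "first 4k" observation, and the function names are made up for illustration. Healthy binaries can contain all-zero blocks of their own (padding, for example), so the result only means something when compared against the same scan of a known-good copy.

```python
BLOCK_SIZE = 4096        # assumed block size, from the "first 4k" observation
ELF_MAGIC = b"\x7fELF"   # the first four bytes of any ELF executable


def has_elf_magic(data):
    """True if the image still starts with the ELF magic number.

    If the first block had been zeroed, this check fails and the
    kernel would refuse to exec the file instead of starting it.
    """
    return data[:4] == ELF_MAGIC


def zero_blocks(data, block_size=BLOCK_SIZE):
    """Return the indexes of blocks that contain only zero bytes."""
    hits = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        if block.count(0) == len(block):
            hits.append(offset // block_size)
    return hits


def scan_file(path):
    """Read a file and report (magic_ok, list_of_all_zero_block_indexes)."""
    with open(path, "rb") as f:
        data = f.read()
    return has_elf_magic(data), zero_blocks(data)
```

Running `scan_file()` on the NFS-mounted binary while it is segv'ing, and again after the reboot that makes the problem go away, would show whether the client was seeing zero-filled blocks in the middle of the file.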