[Savannah-hackers-public] [gnu.org #622071] colonialone: disk 'sdd' failed
Peter Olson via RT
[Savannah-hackers-public] [gnu.org #622071] colonialone: disk 'sdd' failed
Thu, 07 Oct 2010 14:53:22 -0400
> [beuc - Wed Oct 06 15:21:47 2010]:
> On Wed, Oct 06, 2010 at 03:05:04PM -0400, Peter Olson via RT wrote:
> > > [beuc - Wed Oct 06 14:46:46 2010]:
> > >
> > > Hi,
> > >
> > > Disk 'sdd' is not available anymore at colonialone.
> > >
> > > Smartmontools detected an issue, and mdadm removed it from the
> > > array.
> > >
> > > Can you investigate and possibly replace the failed disk?
> > >
> > > Btw, did you receive the failure notifications?
> > >
> > > Thanks,
> > We took the failed disk out of the RAID array because it appears to
> > be a hard failure rather than a glitch (all partitions on the disk
> > degraded at the same time). The array contained 4 members and now
> > contains 3 members, all in service. We expect to replace it when we
> > next make a trip to the colo.
> > colonialone:~# cat /proc/mdstat
> > Personalities : [raid1]
> > md3 : active raid1 sda6 sdb6 sdc6
> >       955128384 blocks [3/3] [UUU]
> > md2 : active raid1 sda5 sdb5 sdc5
> >       19534976 blocks [3/3] [UUU]
> > md1 : active raid1 sda2 sdb2 sdc2
> >       2000000 blocks [3/3] [UUU]
> > md0 : active raid1 sda1 sdb1 sdc1
> >       96256 blocks [3/3] [UUU]
> > unused devices: <none>
> I'm worried that 'dmesg' shows lots of ext3 errors.
> How can a failed disk in a RAID1x4 array cause *filesystem*-level
> errors? Do we need a fsck or something?
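As an aside, the [n/m] counts and [UUU] flags in that mdstat output can be
checked mechanically. A rough sketch in Python (this parsing of the
/proc/mdstat text format is my own, not an mdadm interface, and the degraded
sample below is a hypothetical pre-removal state, not captured output):

```python
import re

def degraded_arrays(mdstat_text):
    """Return names of md arrays whose status line reports a missing
    member: fewer active devices than slots, or a '_' in the [UUU] flags."""
    bad, current = [], None
    for line in mdstat_text.splitlines():
        m = re.match(r'(md\d+)\s*:', line)
        if m:
            current = m.group(1)
            continue
        m = re.search(r'\[(\d+)/(\d+)\]\s+\[([U_]+)\]', line)
        if m and current is not None:
            slots, active, flags = int(m.group(1)), int(m.group(2)), m.group(3)
            if active < slots or '_' in flags:
                bad.append(current)
    return bad

# The post-removal output above parses as fully healthy:
healthy = "md3 : active raid1 sda6 sdb6 sdc6\n      955128384 blocks [3/3] [UUU]\n"
print(degraded_arrays(healthy))    # []

# Hypothetical state while sdd6 was still a failed member of a 4-slot array:
degraded = "md3 : active raid1 sda6 sdb6 sdc6\n      955128384 blocks [4/3] [UUU_]\n"
print(degraded_arrays(degraded))   # ['md3']
```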
Here are some of the errors from dmesg:
[20930306.805714] ext3_orphan_cleanup: deleting unreferenced inode 86646
[20930306.805714] ext3_orphan_cleanup: deleting unreferenced inode 85820
[20930306.822520] ext3_orphan_cleanup: deleting unreferenced inode 86643
[20930306.829335] ext3_orphan_cleanup: deleting unreferenced inode 86645
[20930306.840398] EXT3-fs: dm-5: 30 orphan inodes deleted
[20930306.840542] EXT3-fs: recovery complete.
[20930307.015205] EXT3-fs: mounted filesystem with ordered data mode.
I found some discussion on the Net that says these messages are a normal
byproduct of making an LVM snapshot. Are you doing this as part of your
backup procedure?
I wrote a script to convert dmesg timestamps to wall clock. These
messages are issued every morning between 07:58 and 08:15 (or sometimes
as late as 08:27).
FSF Senior Systems Administrator
#! /usr/bin/env python
# Convert dmesg timestamps (seconds since boot) to wall-clock time.
import sys
import datetime

# Wall-clock time and the machine's uptime in seconds, sampled together.
dt = datetime.datetime(2010, 10, 7, 14, 29, 26)
uptime = 20952599

while True:
    line = sys.stdin.readline()
    if not line:
        break
    # Pull the integer seconds out of the "[20930306.805714]" prefix.
    curtime = int(line.split('.')[0].split('[')[1])
    delta = datetime.timedelta(0, curtime - uptime)
    dt2 = dt + delta
    print dt2.isoformat(' '), line,
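As a sanity check of that conversion, the same reference pair maps the first
ext3 dmesg line above into the reported morning window (Python 3 syntax here;
the window bounds are the ones quoted above):

```python
import datetime

# Reference pair from the script above: this wall-clock time corresponds
# to this many seconds of uptime.
REF_WALL = datetime.datetime(2010, 10, 7, 14, 29, 26)
REF_UPTIME = 20952599

def to_wallclock(dmesg_seconds):
    """Map a dmesg timestamp (seconds since boot) to wall-clock time."""
    return REF_WALL + datetime.timedelta(seconds=dmesg_seconds - REF_UPTIME)

# First ext3 message above, [20930306.805714]:
when = to_wallclock(20930306)
print(when.isoformat(' '))    # 2010-10-07 08:17:53
print(datetime.time(7, 58) <= when.time() <= datetime.time(8, 27))  # True
```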