From: Sylvain Beucler via RT
Subject: [Savannah-hackers-public] Re: [gnu.org #622071] colonialone: disk 'sdd' failed
Date: Sun, 10 Oct 2010 03:42:07 -0400

Hi,

On Thu, Oct 07, 2010 at 02:53:22PM -0400, Peter Olson via RT wrote:
> > [beuc - Wed Oct 06 15:21:47 2010]:
> > Hi,
> > 
> > On Wed, Oct 06, 2010 at 03:05:04PM -0400, Peter Olson via RT wrote:
> > > > [beuc - Wed Oct 06 14:46:46 2010]:
> > > >
> > > > Hi,
> > > >
> > > > Disk 'sdd' is not available anymore at colonialone.
> > > >
> > > > Smartmontools detected an issue, and mdadm removed it from the
> > > > RAID array.
> > > >
> > > > Can you investigate and possibly replace the failed disk?
> > > >
> > > > Btw, did you receive the failure notifications?
> > > >
> > > > Thanks,
> > >
> > > We took the failed disk out of the RAID array because it appears to
> > > be a hard failure rather than a glitch (all partitions containing
> > > the disk degraded at the same time).
> > >
> > > The array contained 4 members and now contains 3 members, all in
> > > service.  We expect to replace it when we next make a trip to the
> > > colo.
> > >
> > > colonialone:~# cat /proc/mdstat
> > > Personalities : [raid1]
> > > md3 : active raid1 sda6[0] sdb6[2] sdc6[1]
> > >       955128384 blocks [3/3] [UUU]
> > >
> > > md2 : active raid1 sda5[0] sdb5[2] sdc5[1]
> > >       19534976 blocks [3/3] [UUU]
> > >
> > > md1 : active raid1 sda2[0] sdb2[2] sdc2[1]
> > >       2000000 blocks [3/3] [UUU]
> > >
> > > md0 : active raid1 sda1[0] sdb1[2] sdc1[1]
> > >       96256 blocks [3/3] [UUU]
> > >
> > > unused devices: <none>
> > 
> > 
> > I'm worried that 'dmesg' shows lots of ext3 errors.
> > 
> > How can a failed disk in a RAID1x4 array cause *filesystem*-level
> > errors?
> > 
> > Do we need a fsck or something?
> 
> Here are some of the errors from dmesg:
> 
> [20930306.805714] ext3_orphan_cleanup: deleting unreferenced inode 86646
> [20930306.805714] ext3_orphan_cleanup: deleting unreferenced inode 85820
> [20930306.822520] ext3_orphan_cleanup: deleting unreferenced inode 86643
> [20930306.829335] ext3_orphan_cleanup: deleting unreferenced inode 86645
> [20930306.840398] EXT3-fs: dm-5: 30 orphan inodes deleted
> [20930306.840542] EXT3-fs: recovery complete.
> [20930307.015205] EXT3-fs: mounted filesystem with ordered data mode.
> 
> I found some discussion on the Net that says these messages are a normal
> byproduct of making an LVM snapshot.  Are you doing this as part of your
> backup procedure?

Yes (cf. remote_backup.sh).
Good to know it's not a disk error, thanks.
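
For reference, a snapshot-based backup of that kind typically looks
something like the following -- sketch only: the VG/LV names, snapshot
size, mount point and rsync direction are illustrative, not necessarily
what remote_backup.sh actually does:

  # Sketch only: names and paths are placeholders.
  lvcreate --snapshot --size 5G --name backup-snap /dev/vg0/root
  # Mounting the snapshot replays the ext3 journal, which is where the
  # orphan-inode cleanup messages in dmesg come from.
  mount -o ro /dev/vg0/backup-snap /mnt/backup-snap
  rsync -a /mnt/backup-snap/ savannah-backup.gnu.org:/backup/colonialone/
  umount /mnt/backup-snap
  lvremove -f /dev/vg0/backup-snap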


> I wrote a script to convert dmesg timestamps to wall clock.  These
> messages are issued every morning between 07:58 and 08:15 (or sometimes
> as late as 08:27).

Yes, the backup from savannah-backup.gnu.org runs at 12:00 GMT, i.e.
08:00 local time (EDT), which matches those timestamps.
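
For reference, that conversion basically offsets the dmesg "[seconds]"
value against the boot time derived from /proc/uptime; a rough sketch of
the idea (not Peter's actual script, and only accurate to within any clock
adjustments since boot):

  now=$(date +%s)
  up=$(cut -d' ' -f1 /proc/uptime)
  ts=20930306.805714                 # seconds value from the dmesg line
  # event time = (now - uptime) + dmesg timestamp
  date -d "@$(awk -v n="$now" -v u="$up" -v t="$ts" \
              'BEGIN { printf "%d", n - u + t }')"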


Also, LVM is still looking for /dev/sdd7:
colonialone:~# lvs
  /dev/sdd7: read failed after 0 of 2048 at 0: Input/output error
[...]

I suggest we plan a reboot this week.
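
Independently of the reboot, if /dev/sdd7 turns out to be registered as a
physical volume in one of the volume groups (rather than just a stale
device node that lvs keeps scanning), something along these lines should
show and clear the reference -- sketch only, the VG name is a placeholder
and I haven't tried it on colonialone:

  pvs                               # does sdd7 show up as a missing PV?
  vgreduce --removemissing vg0      # only if it does; 'vg0' = affected VG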

-- 
Sylvain