[Savannah-hackers-public] Re: [gnu.org #498996] Hard-disk failures on colonialone
Thu, 29 Oct 2009 19:29:50 +0100
As far as the hardware is concerned, I think it is best that we do
what the FSF sysadmins think is best.
We don't have access to the computer, don't really know anything about
what it's made of, and don't understand the eSATA/internal
differences. We're even using Xen, as you do, to ease this kind of
interaction. In short, you're more often than not in a better position
to judge the hardware issues.
If you think it's safer to use 4x1.5TB RAID-1, then let's do that.
Only, we need to discuss how to migrate the current data, since
colonialone is already in production.
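One way the migration could go, sketched below with entirely
hypothetical device names and mount points: build the new 4-way RAID1
in degraded mode on two of the new drives, copy the data across, then
add the remaining drives and let them resync.

```shell
# Create the new array degraded, with two slots left empty for now
# (device names /dev/sd[ef]1 and /dev/md2 are examples only):
mdadm --create /dev/md2 --level=1 --raid-devices=4 \
    /dev/sde1 /dev/sdf1 missing missing

mkfs.ext3 /dev/md2
mount /dev/md2 /mnt/new

# Copy the live data, preserving hard links, ACLs and xattrs,
# staying on one filesystem:
rsync -aHAXx --numeric-ids /srv/ /mnt/new/

# After the cut-over, repartition the old drives and fill the
# remaining RAID1 slots; md resyncs them in the background:
mdadm --add /dev/md2 /dev/sdc1
mdadm --add /dev/md2 /dev/sdd1
```

A final rsync pass with services stopped would be needed just before
the cut-over, since the first copy runs against a moving target.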
In particular, fixing the DNS issues I reported would help if
temporary relocation is needed.
On Thu, Oct 29, 2009 at 01:20:55PM -0400, Daniel Clark via RT wrote:
> Ah I see, I was waiting for comments on this - should be able to go
> out this weekend to do replacements / reshuffles / etc, but I need to
> know if savannah-hackers has an opinion on how to proceed:
> (1) Do we keep the 1TB disks?
> > - Now that the cause of the failure is known to be a software failure,
> > do we forget about this, or still pursue the plan to remove 1.0TB
> > disks that are used nowhere else at the FSF?
> That was mostly a "this makes no sense, but that's the only thing
> that's different about that system" type of response; it is true they
> are not used elsewhere, but if they are actually working fine I am
> fine with doing whatever savannah-hackers wants to
> (2) Do we keep the 2 eSATA drives connected?
> > - If not, do you recommend moving everything (but '/') on the 1.5TB
> > disks?
> Again, if they are working fine it's your call; however the bigger
> issue is whether you want to keep the 2 eSATA / external drives
> connected, since that is a legitimate extra point of failure, and
> there are some cases where errors in the external enclosure can bring
> a system down (although it's been up and running fine for several
> months now).
> (3) Do we make the switch to UUIDs now?
> > - About UUIDs, everything in fstab is using mdX, which I'd rather
> > not mess with.
> IMHO it would be better to mess with this when the system is less
> critical; not using UUIDs everywhere tends to screw you during
> recovery from hardware failures.
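For reference, the switch being discussed would look roughly like this
(a sketch; the array name, mount point and UUID are made up):

```shell
# Read the filesystem UUID off the array (device name is an example):
blkid /dev/md1
# prints something like:
#   /dev/md1: UUID="3e6be9de-8139-11d1-9106-a43f08d823a6" TYPE="ext3"

# Then in /etc/fstab the device path
#   /dev/md1  /srv  ext3  defaults  0  2
# becomes
#   UUID=3e6be9de-8139-11d1-9106-a43f08d823a6  /srv  ext3  defaults  0  2
```

The point of the change is that a UUID follows the filesystem even if
the kernel assembles the array under a different mdX number after a
disk swap.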
> And BTW totally off-topic, but eth1 on colonialone is now connected
> via crossover ethernet cable to eth1 on savannah (and colonialone is
> no longer on fsf 10. management, which I believe we confirmed no one
> cared about)
> (4) We need to change to some technique that will give us RAID1
> redundancy even if one drive dies. I think the safest solution would
> be to not use eSATA, and use 4 1.5TB drives all inside the computer
> in a 1.5TB quad RAID1 array, so all 4 drives would need to fail to
> bring savannah down. The other option would be 2 triple RAID1s using
> eSATA, each with 2 disks inside the computer and the 3rd disk in the
> external enclosure.
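The quad-RAID1 option could be set up along these lines (a sketch
only; every device name below is hypothetical):

```shell
# One 4-way mirror across the four internal 1.5TB drives, so the
# array survives up to three simultaneous drive failures:
mdadm --create /dev/md1 --level=1 --raid-devices=4 \
    /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1

# Verify that all four mirrors are active and in sync:
mdadm --detail /dev/md1
cat /proc/mdstat
```

The trade-off versus the two triple-RAID1 alternative is capacity: the
quad mirror yields a single 1.5TB volume, while two triple mirrors
would yield two volumes at the cost of depending on the eSATA
enclosure.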