[Savannah-hackers-public] Re: [gnu.org #498996] Hard-disk failures on colonialone
Thu, 29 Oct 2009 19:29:50 +0100
As far as the hardware is concerned, I think it is best that we do
what the FSF sysadmins think is best.
We don't have access to the computer, don't really know anything about
what it's made of, and don't understand the eSATA/internal
differences. We're even using Xen, as you do, to ease this kind of
interaction. In short, you're more often than not in a better position
to judge the hardware issues.
If you think it's safer to use 4x1.5TB RAID-1, then let's do that.
Only, we need to discuss how to migrate the current data, since
colonialone is already in production.
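One way the migration could go, sketched below with entirely
hypothetical device names and mount points: build the new 4-way RAID1
in degraded mode on two of the new drives, copy the data across, then
add the remaining drives and let them resync.

```shell
# Create the new array degraded, with two slots left empty for now
# (device names /dev/sd[ef]1 and /dev/md2 are examples only):
mdadm --create /dev/md2 --level=1 --raid-devices=4 \
    /dev/sde1 /dev/sdf1 missing missing

mkfs.ext3 /dev/md2
mount /dev/md2 /mnt/new

# Copy the live data, preserving hard links, ACLs and xattrs,
# staying on one filesystem:
rsync -aHAXx --numeric-ids /srv/ /mnt/new/

# After the cut-over, repartition the old drives and fill the
# remaining RAID1 slots; md resyncs them in the background:
mdadm --add /dev/md2 /dev/sdc1
mdadm --add /dev/md2 /dev/sdd1
```

A final rsync pass with services stopped would be needed just before
the cut-over, since the first copy runs against a moving target.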
In particular, fixing the DNS issues I reported would help if
temporary relocation is needed.
On Thu, Oct 29, 2009 at 01:20:55PM -0400, Daniel Clark via RT wrote:
> Ah I see, I was waiting for comments on this - should be able to go
> out this weekend to do replacements / reshuffles / etc, but I need to
> know if savannah-hackers has an opinion on how to proceed:
> (1) Do we keep the 1TB disks?
> > - Now that the cause of the failure is known to be a software failure,
> > do we forget about this, or still pursue the plan to remove 1.0TB
> > disks that are used nowhere else at the FSF?
> That was mostly a "this makes no sense, but that's the only thing
> that's different about that system" type of response; it is true they
> are not used elsewhere, but if they are actually working fine I am
> fine with doing whatever savannah-hackers wants to
> (2) Do we keep the 2 eSATA drives connected?
> > - If not, do you recommend moving everything (but '/') on the 1.5TB
> > disks?
> Again, if they are working fine it's your call; however the bigger
> issue is whether you want to keep the 2 eSATA / external drives
> connected, since that is a legitimate extra point of failure, and
> there are some cases where errors in the external enclosure can bring
> a system down (although it's been up and running fine for several
> months now).
> (3) Do we make the switch to UUIDs now?
> > - About UUIDs, everything in fstab is using mdX, which I'd rather
> > not mess with.
> IMHO it would be better to mess with this when the system is less
> critical; not using UUIDs everywhere tends to screw you during
> recovery from hardware failures.
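For reference, the switch being discussed would look roughly like this
(a sketch; the array name, mount point and UUID are made up):

```shell
# Read the filesystem UUID off the array (device name is an example):
blkid /dev/md1
# prints something like:
#   /dev/md1: UUID="3e6be9de-8139-11d1-9106-a43f08d823a6" TYPE="ext3"

# Then in /etc/fstab the device path
#   /dev/md1  /srv  ext3  defaults  0  2
# becomes
#   UUID=3e6be9de-8139-11d1-9106-a43f08d823a6  /srv  ext3  defaults  0  2
```

The point of the change is that a UUID follows the filesystem even if
the kernel assembles the array under a different mdX number after a
disk swap.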
> And BTW totally off-topic, but eth1 on colonialone is now connected
> via crossover ethernet cable to eth1 on savannah (and colonialone is
> no longer on fsf 10. management, which I believe we confirmed no one
> cared about)
> (4) We need to change to some technique that will give us RAID1
> redundancy even if one drive dies. I think the safest solution would
> be to not use eSATA, and use 4 1.5TB drives all inside the computer
> in a 1.5TB quad RAID1 array, so all 4 drives would need to fail to
> bring savannah down. The other option would be 2 triple RAID1s using
> eSATA, each with 2 disks inside the computer and the 3rd disk in the
> external enclosure.
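The quad-RAID1 option could be set up along these lines (a sketch
only; every device name below is hypothetical):

```shell
# One 4-way mirror across the four internal 1.5TB drives, so the
# array survives up to three simultaneous drive failures:
mdadm --create /dev/md1 --level=1 --raid-devices=4 \
    /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1

# Verify that all four mirrors are active and in sync:
mdadm --detail /dev/md1
cat /proc/mdstat
```

The trade-off versus the two triple-RAID1 alternative is capacity: the
quad mirror yields a single 1.5TB volume, while two triple mirrors
would yield two volumes at the cost of depending on the eSATA
enclosure.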