[Gluster-devel] Harddisk economy alternatives

From: Magnus Näslund
Subject: [Gluster-devel] Harddisk economy alternatives
Date: Wed, 09 Nov 2011 17:50:00 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:7.0.1) Gecko/20110929 Thunderbird/7.0.1

We're a digital archive that stores digital images of old records and books. We're about to evaluate glusterfs as a solution to our main storage needs. I'm soliciting advice from both the glusterfs crew and other users with similar needs.

Today we've got about 30 million original images. For each image there is a high-quality original and a batch-processed, highly compressed copy that's used by our customers.

So this gives 30 million large files (3-12MB) plus 30 million converted copies that land at about 500KB per image.

The use cases are a bit different: the big images will be written once and batch-read once or twice a year. The small images will be written once or twice a year, but read 24/7, and are more latency sensitive.
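As a rough sizing sketch of the numbers above: the function below multiplies the file counts by the quoted per-file sizes and by the number of replicas. The decimal units (1 TB = 10^12 bytes), the function name, and the use of 3 replicas are assumptions made for illustration, and filesystem overhead is ignored.

```python
# Rough capacity estimate from the figures in this mail.
# Assumption: decimal units, no filesystem or metadata overhead.
KB = 10**3
MB = 10**6
TB = 10**12


def archive_size_tb(n_images, orig_mb_low, orig_mb_high, copy_kb, replicas):
    """Return (low, high) total size in TB across all replicas."""
    copies = n_images * copy_kb * KB            # compressed customer copies
    low = n_images * orig_mb_low * MB + copies  # all originals at the small end
    high = n_images * orig_mb_high * MB + copies  # all originals at the large end
    return low * replicas / TB, high * replicas / TB


low, high = archive_size_tb(30_000_000, 3, 12, 500, replicas=3)
print(f"total across replicas: {low:.0f} to {high:.0f} TB")
```

With these inputs a single copy of the data is somewhere between 105 and 375 TB, so the replicated total is between 315 and 1125 TB.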

We want the data replicated at least 3 times physically (box-wise), so we've ordered 3 test servers with 24x3TB "enterprise" SATA disks each, with an Areca card + BBU. We'll probably be running the tests feeding RAID volumes to glusterfs, and from what I've seen this seems to be standard practice.

Possible future:

Since our storage system will be in use for the really long term, we're looking at the total economics of the solution vs. the data safety concerns.

We've seen suggestions to let glusterfs manage the disks directly.
The way I see it, this would be a win in that:
        1) We would be using all disks, no RAID/spare storage overhead
        2) No RAID-rebuilds
        3) ...
        4) Profit
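To put a number on point 1, here is a minimal sketch comparing usable space per box with and without RAID, using the 24x3TB test servers as input. The RAID layout in the example (one RAID 6 set with a single hot spare) is only an assumed configuration for illustration; the actual layout has not been decided.

```python
# Usable capacity per box, with and without RAID overhead.
# Assumption: the RAID case is a single RAID 6 set (2 parity disks)
# plus 1 hot spare; this layout is hypothetical.
def usable_tb(disks, disk_tb, parity_disks=0, spare_disks=0):
    """Return usable capacity in TB after parity and spare disks."""
    data_disks = disks - parity_disks - spare_disks
    if data_disks <= 0:
        raise ValueError("no disks left for data")
    return data_disks * disk_tb


direct = usable_tb(24, 3)                                # every disk holds data
raid6 = usable_tb(24, 3, parity_disks=2, spare_disks=1)  # assumed RAID layout
print(f"direct: {direct} TB, RAID 6 + spare: {raid6} TB, "
      f"difference: {direct - raid6} TB per box")
```

Under that assumption the direct-disk model yields 72 TB per box against 63 TB, a gain of 9 TB per box before replication is taken into account.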

Also, we know that any long-term system we build should be planned around replacing disks continuously.

So in my mind we could buy quality boxes with 24-36 disks run by 3-4 SATA controller cards (Marvell?), using cheap and large desktop disks (maybe not the "green" variety). We could have a reporting system on top of glusterfs that reports defective disks, which would then be replaced as part of our on-duty maintenance. Since the storage is replicated over 3+ boxes, the breakage of a single disk would not compromise data safety as long as the disks are replaced in a timely manner.
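One possible shape for such a reporting system, sketched under assumptions: it polls each disk with `smartctl -H` from smartmontools and flags every disk that does not report a passing verdict. The function names and device paths are made up for illustration, the parser only recognises the verdict line printed for ATA/SATA disks, and nothing here is part of glusterfs itself.

```python
import subprocess

# Verdict line that smartctl -H prints for ATA/SATA disks.
HEALTH_PREFIX = "SMART overall-health self-assessment test result:"


def health_from_smartctl(output):
    """Return True (passed), False (failed) or None (no verdict found)."""
    for line in output.splitlines():
        if line.startswith(HEALTH_PREFIX):
            return line[len(HEALTH_PREFIX):].strip() == "PASSED"
    return None


def check_disk(device):
    """Run smartctl on one device; needs smartmontools and root access."""
    result = subprocess.run(
        ["smartctl", "-H", device],
        capture_output=True, text=True, check=False,
    )
    return health_from_smartctl(result.stdout)


def disks_to_replace(devices, check=check_disk):
    """List devices that failed or gave no verdict (treated as suspect)."""
    return [dev for dev in devices if check(dev) is not True]
```

A cron job could call `disks_to_replace(["/dev/sda", "/dev/sdb"])` on each box and mail the result to whoever is on duty. Treating a missing verdict as suspect is a deliberate choice here: a disk that no longer answers is at least worth a look.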

I would be very interested to hear other people's experiences or ideas about storing this kind of data, and in particular about the pros/cons of the pass-thru/direct disk model.

Any constructive input is welcome!

Magnus Näslund
