I'm shooting for roughly a terabyte of storage for my MythTV box. You may be thinking that a terabyte is a little extreme, and you may be a little right. But, in addition to serving as a Digital Video Recorder (DVR), the box will also hold copies of all of my DVDs, so that I have instant access to my entire collection. I'm looking into the space considerations of ripping them directly to disk versus transcoding them into a smaller video format. In either case, I'll probably forgo the DVD extras.
So, this is a lot of data. A lot of data that I don't want to lose, given the effort it's going to take to get it onto the disk. It's also going to be too big to back up in a cost-effective manner. So, the decision of how to construct the underlying storage system is one of the more important ones in the entire process of building my MythTV box. The requirements are few (in no particular order):
- Big: a terabyte is big.
- Redundant: since it's too big to back up, it has to be able to handle a drive failure or two without losing all my data.
- Fast: this one isn't as important, but it will need to be able to handle writing or reading a video fast enough to stream over the network, as well as to serve as a file server for the rest of my computers. I'm not too worried about this, since the network is likely to be the bottleneck here.
- Able to get bigger: this is more of a nice-to-have than a requirement, actually. Drives are increasing in size all the time, and over time I imagine I'll end up needing more storage. So, if it can grow by adding drives, or by replacing smaller drives with bigger ones, that'd be good.
The rest of this entry will discuss the various options that are available to me, or at least the options that I know of.
RAID -- drive arrays
My plan all along has been to get enough drives to make a terabyte-sized RAID5 array with one live failover drive. Looking at the list above, this would satisfy all of the requirements: big, redundant, and fast. It doesn't satisfy the nice-to-have feature of expandability. RAID5 arrays (as far as I know) can't be grown without redoing the entire array. So, if I wanted to grow the array, I'd buy a new drive and construct a new array using all of the disks. The problem is that I'd lose all the data, since I can't back it up elsewhere while making the new array. I haven't really been worrying about that problem, though, since a terabyte is huge, and really should be enough for anyone.
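To make the sizing concrete, here's a rough sketch of the RAID5 arithmetic. RAID5 gives up one drive's worth of space to parity, and a live failover (hot spare) drive sits idle until something fails, so it adds nothing to capacity. The function name and the 250 gig drive size are just assumptions for illustration.

```python
def raid5_usable_gb(total_drives, drive_gb, hot_spares=0):
    """Usable space in a RAID5 array built from equal-sized drives.

    One drive's worth of capacity holds parity; hot spares hold no data.
    """
    active = total_drives - hot_spares
    if active < 3:
        raise ValueError("RAID5 needs at least three active drives")
    return (active - 1) * drive_gb


# Hypothetical 250 gig drives, chosen only to illustrate the math.
print(raid5_usable_gb(5, 250))                # 1000
print(raid5_usable_gb(6, 250, hot_spares=1))  # 1000
print(raid5_usable_gb(5, 250, hot_spares=1))  # 750
```

In other words, with drives of that size, a full terabyte plus a hot spare takes six drives, not five.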
Linux Logical Volume Manager
But, recently, I've been reading a little bit about the Linux Logical Volume Manager (LVM). LVM is a way to group multiple devices (drives or RAID arrays) into a volume group, which is then carved into logical volumes that can be resized on the fly. So let's say, as a small example, that you buy a 10 gig drive and partition it into two partitions: a 5 gig / partition (/ is the top level in Linux) and a 5 gig /home partition. If you fill up your 5 gig /home partition, the only way to grow it is to put in another drive (another 10 gigger for the sake of this example), copy the data over, and tell the system to use the new space as the /home partition. This leaves you with a 5 gig / partition on the first drive, and a 10 gig /home partition on the second drive. You've also got 5 gigs of unused space on the first drive where the /home partition used to be.
If you had set up that original drive as one big LVM volume group with the same two partitions as logical volumes (5 gig / and 5 gig /home), you'd have a lot more flexibility in how you could handle /home filling up. Assuming there was empty space on the / logical volume (LV), you could simply reallocate that space to the /home LV. Or, if you added a second 10 gig drive, you could simply add that drive to the LVM volume group and give its space to the /home LV, giving /home 15 gigs.
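Here's the 10 gig example as a toy model in Python. This is only a sketch of the bookkeeping idea, not how LVM is actually implemented or driven: the VolumeGroup class and its methods are made up for illustration, and it ignores details like resizing the filesystem that sits on top of the logical volume.

```python
class VolumeGroup:
    """Toy model of a volume group: a pool of space carved into logical volumes."""

    def __init__(self):
        self.total_gb = 0
        self.volumes = {}  # logical volume name -> size in gigs

    def add_drive(self, size_gb):
        """Add a drive (or RAID array) to the pool."""
        self.total_gb += size_gb

    def free_gb(self):
        return self.total_gb - sum(self.volumes.values())

    def create(self, name, size_gb):
        if size_gb > self.free_gb():
            raise ValueError("not enough free space in the volume group")
        self.volumes[name] = size_gb

    def resize(self, name, delta_gb):
        """Grow (positive delta) or shrink (negative delta) a logical volume."""
        if delta_gb > self.free_gb():
            raise ValueError("not enough free space in the volume group")
        if self.volumes[name] + delta_gb < 0:
            raise ValueError("cannot shrink a volume below zero")
        self.volumes[name] += delta_gb


vg = VolumeGroup()
vg.add_drive(10)       # the original 10 gig drive
vg.create("/", 5)
vg.create("/home", 5)

vg.add_drive(10)       # the second 10 gig drive joins the same pool
vg.resize("/home", vg.free_gb())
print(vg.volumes)      # {'/': 5, '/home': 15}
```

The point is that /home grows to 15 gigs without copying anything, and no space is stranded on the first drive.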
So, LVM is cool and all, but scoring it against my list shows that, on its own, it's a miserable failure for the purposes of my MythTV box. Yeah, it can be big and become bigger, but it's not redundant and it's not necessarily fast. But, as I mentioned, LVM can use either single hard drives or RAID arrays as the underlying storage devices. Instead of the two plain 10 gig drives in the example, the underlying devices could be two RAID1 arrays.
Distributed file systems
On a side note, I've also thought about using a distributed file system that spreads the storage out amongst the various machines on the network. This would be really cool, but would make the usage of my MythTV box reliant on most or all of the other machines being up and running. This would also add another layer of technology that I know nothing about, adding to the complexity of the project. I do a lot of reading and tinkering with cluster technology, and this is one area that I'm really interested in, so I'll get to it eventually, just not for this project.
And the winner is...
So, the solution I think I've decided on is a combination of RAID and LVM. Starting from the bottom, I'll have several big hard drives configured into a RAID5 array. This will give me size, redundancy, and speed. I'll make that RAID5 array into an LVM volume group with one huge logical volume that encompasses the entire array. Then, in the future, if I need to add to the available storage, I can more easily do so using LVM. I'd imagine that by the time I need more space, drives will be sufficiently large that I can easily switch to one or a couple of RAID1 arrays.
Why would I switch to RAID1, you may be asking? The reason I'm going RAID5 now, instead of a couple of 500 gig RAID1 arrays made into one LV, comes down to physical space and cost. 500 gig drives are currently running around $350, and I'd need 4. I could do a series of smaller RAID1 arrays, but the problem there is that my case can't handle that many drives (I may be looking into a larger case, though). But, with RAID5, I can achieve my goals using smaller drives. Coupled with LVM, RAID5 is the perfect solution for now, with room to grow in the future.
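For the curious, here's a quick sketch of the drive-count and cost comparison. It uses the $350 price for 500 gig drives from above and the $110 price for 250 gig drives from the NewEgg deal in the further info notes. The function names are made up, a terabyte is treated as 1000 gigs, and no hot spare is included in any of the counts.

```python
import math


def raid5_cost(usable_gb, drive_gb, price):
    """Drives and dollars for one RAID5 array: data drives plus one for parity."""
    drives = math.ceil(usable_gb / drive_gb) + 1
    return drives, drives * price


def raid1_cost(usable_gb, drive_gb, price):
    """Drives and dollars for mirrored pairs: every data drive needs a twin."""
    pairs = math.ceil(usable_gb / drive_gb)
    drives = pairs * 2
    return drives, drives * price


print(raid5_cost(1000, 250, 110))  # (5, 550)
print(raid1_cost(1000, 500, 350))  # (4, 1400)
print(raid1_cost(1000, 250, 110))  # (8, 880)
```

So the RAID5 route is the cheapest of the three, and the all-RAID1 route with small drives needs eight drive bays, which is where my case becomes the problem.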
If anyone has any opinions on the matter, or information that I may not have considered, I would greatly appreciate it if you'd leave a comment. As I said above, this is one of the most important (and difficult) choices, so I'd like to get it right.
Further info
A few side notes and bits of further information:
First, I actually started this post to mention this incredible deal at NewEgg.com, but as is often the case, I digressed like a madman and ended up here. The deal is for 250 gig SATA drives for $110. If I used these drives, I could make my terabyte RAID5 array for $550. If you didn't care about RAID, you could get a terabyte of storage for $440. Absolute madness.
Second, some more info on LVM:
Third, various articles, blog entries, etc. related to setting up large storage systems using Linux: