Fixing broken LVM and Software RAID on Linux

I have been building a backup PC out of mostly spare parts, including several old disks. To get the most reliability and space out of the disks, which are all different sizes, I decided to use software RAID to mirror each chunk of data in pairs (RAID 1) and then use LVM on top of each RAID array to make all the arrays appear as one storage device to the Linux OS.

The disks are as follows: 250G, 200G, 160G and 120G, so to get the most space out of them I divided them up as below (Note: this is a simplification for the purpose of explanation; the real setup also has a RAID 1 mirrored /boot/ partition which is not mentioned below. You need this because LVM is not supported by GRUB, so you can’t boot from an LVM volume):
250G: A (200G), C (40G)
200G: A (200G)
160G: B (120G), C (40G)
120G: B (120G)

A, B and C are partitions on the disks for software RAID – I set up a software RAID mirror for each pair of A, B and C, created LVM physical volumes on top of them, and grouped those into one large volume group, giving me around 360G (actually less once you convert to base 2 rather than base 10) of mirrored data storage. Note that I can lose any one disk and still have all my data (though I wouldn’t want to hang around too long before replacing the failed disk, as a second disk failing could mean losing most of the data).
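A rough sketch of how each layer gets created (the device and partition names here are illustrative, not the exact ones from my machine; the volume group name main is the one used later in this post):

mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1   # mirror one pair of equal-sized partitions; repeat for B and C
pvcreate /dev/md1 /dev/md2 /dev/md3       # an LVM physical volume on each mirror
vgcreate main /dev/md1 /dev/md2 /dev/md3  # group them into one large volume group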

I actually created three LVM logical volumes: one for / (the root filesystem holding all the OS data), one for /home (for all the stuff I actually care about), and one for swap, which doesn’t really need to be in LVM or RAID, but it was just easier to do it that way.
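The logical volumes would then look something like this (sizes, and the home and swap names, are illustrative; the root volume name matches the mount commands further down):

lvcreate -L 20G  -n root main     # the / filesystem
lvcreate -L 300G -n home main     # /home, the data I actually care about
lvcreate -L 2G   -n swap main     # swap space
mkfs.ext3 /dev/main/root          # filesystems on the new volumes
mkfs.ext3 /dev/main/home
mkswap /dev/main/swap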

I previously wrote instructions on how to set up LVM and software RAID, but this time round I managed to completely screw the whole lot up just near the end. I forgot to define one of the RAID devices in /etc/mdadm/mdadm.conf, so the next time I rebooted that RAID device was not found in the early boot stages. As this RAID array is needed to assemble the complete volume group holding my root filesystem, the system stopped booting right at the beginning because the LVM volume group was incomplete.

Fixing it was actually not too difficult, but I mention it here in case it is of help to anyone else (or in case I manage to do it again, which is quite likely).

First I booted an Ubuntu Live CD (actually a USB stick image created from my running Ubuntu laptop and a spare Ubuntu CD), but nearly any live CD should do. Once booted, I had to install (in the live session) mdadm and lvm2, as Ubuntu does not have these installed by default. Once you have these tools you can start your recovery (note that this needs to be done with root permissions, so use sudo -s or similar to get a root shell).
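In the live session that boils down to something like this (assuming an Ubuntu live environment with network access):

sudo -s                       # get a root shell for the rest of the recovery
apt-get update                # refresh the package lists in the live session
apt-get install mdadm lvm2    # the software RAID and LVM userspace tools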

Find the RAID devices with mdadm --assemble --scan and then use pvscan and lvscan as required until your system has found your LVM config, then mount it (don’t forget to mount /boot/ too, as we will need to rebuild the initrd).
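A minimal sketch of those steps (the volume group name main matches the mount command below; vgchange -ay is an extra step that may be needed to activate the logical volumes before they appear under /dev/mapper):

mdadm --assemble --scan   # assemble every RAID array it can find from the superblocks
pvscan                    # look for LVM physical volumes on the assembled arrays
vgchange -ay              # activate any volume groups that were found
lvscan                    # list the logical volumes, which should now be active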

mkdir /tmp/root
mount /dev/mapper/main-root /tmp/root
mount /dev/md0 /tmp/root/boot
chroot /tmp/root

You should now be inside your root filesystem on your LVM volume. Next we need to make sure all the RAID arrays are configured in /etc/mdadm/mdadm.conf.

Run mdadm --detail --scan to generate the lines for the config file, and make sure each array is listed in /etc/mdadm/mdadm.conf.
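One way to do that is to let mdadm generate the entries itself (check the output first, and edit out any duplicate entries afterwards):

mdadm --detail --scan                             # prints an ARRAY line for each assembled array
mdadm --detail --scan >> /etc/mdadm/mdadm.conf    # append them to the config, then tidy up any duplicates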

From here it is pretty simple: all we have to do is rebuild the initrd so that it knows how to find all the RAID and LVM devices needed during boot, with dpkg-reconfigure linux-image-<your current kernel version>. It is important to make sure that you get the right kernel, so have a look in /boot to check you use the right version (uname -r won’t work here because you have booted your live CD image, not the kernel your box will run).
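For example (the version shown is just a placeholder; use whichever version you see on the files in /boot):

ls /boot                                          # note the version on the vmlinuz-* and initrd.img-* files
dpkg-reconfigure linux-image-2.6.27-11-generic    # hypothetical version number, substitute your own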

I got several warnings about /proc/ not being mounted, but this did not appear to be a problem.
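If the warnings bother you, mounting /proc inside the chroot before rebuilding should quieten them (an extra step I did not actually need):

mount -t proc proc /proc    # give the tools inside the chroot a /proc to read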

Reboot, and you should be OK. You may want to make a copy of your current kernel in case you accidentally break it: just copy the kernel image and initrd (the vmlinuz… and initrd.img… files) in /boot/ to new names and run update-grub, and that should give you two kernels to choose from (hopefully listed in the right order; the order should be based on the version numbers in the file names, so adjust if needed).
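A sketch of that backup step (file names are made up; use the versions actually sitting in your /boot):

cp /boot/vmlinuz-2.6.27-11-generic /boot/vmlinuz-2.6.27-11-generic.backup        # hypothetical version numbers
cp /boot/initrd.img-2.6.27-11-generic /boot/initrd.img-2.6.27-11-generic.backup
update-grub    # regenerate the boot menu so both copies are listed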


2 Responses to Fixing broken LVM and Software RAID on Linux

  1. Alexis says:

    Interesting post. I’ve yet to have a need for LVM, but it’s always seemed like a very useful feature.

    Is there any reason you discounted RAID5? As you say, the solution above gives approximately 360GB of space, whereas with a combination of RAID5 and RAID1 you could have done:

    250G: A (80), B (80), C (40), D (50)
    200G: A (80), B (80), C (40)
    160G: B (80), D (50)
    120G: A (80), C (40)

    A: RAID5 (160GB usable)
    B: RAID5 (160GB usable)
    C: RAID5 (80GB usable)
    D: RAID1 (50GB usable)

    Total: 450GB
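    (For the arithmetic: a RAID5 array of n equal partitions gives n-1 partitions’ worth of usable space, and a two-way mirror gives one partition’s worth.)

    A: (3-1) x 80GB = 160GB
    B: (3-1) x 80GB = 160GB
    C: (3-1) x 40GB = 80GB
    D: 1 x 50GB = 50GB
    giving 160 + 160 + 80 + 50 = 450GB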

    This still leaves 30GB unused on the 160GB drive, which means there may well be an even more efficient way to split this up.

    Someone ought to do a webpage to calculate this: enter your disk sizes and acceptable RAID levels, and it works out the optimal way to maximise the space whilst keeping integrity…

  2. Anton says:

    An excellent question – Yes, I thought about RAID5 and ruled it out for one reason:
    Upgrading…

    I don’t buy that many new disks, and RAID5 needs partitions of all the same size. Now, as you point out, you can slice up the disks to make RAID5 work and get good usage, but as soon as you try to add new disks it becomes very complicated.

    RAID 5 makes it more complex to remove one or two disks, and means keeping the same partition sizes when adding to the arrays. In your example I can only pull one single disk out of the system in order to add a new one (while running the RAID degraded temporarily).

    In my current setup I have 4 disks (and only 4 connectors for them). When I upgrade, I can pull two of the disks out and run degraded, slap in two new large disks, and create a new RAID across the two of them (my new disks will be significantly larger than my old ones; as you can see, I have gone from 120GB to 200GB on my last upgrade, and it was 80GB disks before that).

    So now I have something like this:
    250GB and 160GB, both degraded.
    I put in my new disks, most likely 500GB or 1TB. I can move the LVM over to the new RAIDed disks, and then the old disks are empty and can either be removed or merged into a RAID together if I decide that the space is really needed (unlikely given the size of disks now).
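    Roughly, the LVM side of that move looks like this for one old array and one new one (device names made up for illustration):

    pvcreate /dev/md3            # turn the new RAID array into an LVM physical volume
    vgextend main /dev/md3       # add it to the existing volume group
    pvmove /dev/md1 /dev/md3     # migrate all data off the old array onto the new one
    vgreduce main /dev/md1       # drop the old array out of the volume group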

    This means I can easily upgrade, as I can remove half my disks so long as the new ones are at least the same size as the largest of the old ones.

    Doing it with mirrored RAID means I can remove half my disks, which makes it much easier to set up a new pair of disks in RAID – and if the new disks are both the same size (quite likely if I buy them specifically for this) then creating a new mirrored RAID is simple and easy to get your head around.

    RAID5 could work, but I’m not sure my brain could cope with the upgrade case without having another machine or many more IDE/SATA connectors.

    Of course, if you have a SATA card or something that gives you 8 connectors then RAID5 becomes easier, though you then have to either do lots more partitioning with your new disks, or buy 3 new disks.
