Linux Software RAID and SATA Hot Swap

From Baranoski.ca

Revision as of 14:17, 22 January 2015

I know there are a million pages online about Linux Software RAID, but I wanted to record my own experience with it.

My home server has a lot of storage:

  • a 160GB RAID1 array, for my boot volume, on one of the motherboard's RAID controllers
  • a 500GB RAID1 array, for backups, again on a RAID controller
  • a 2TB RAID1 array, for home directories and virtual machines, on a RAID controller
  • a 3TB RAID1 array, for other stuff, using software RAID.
  • a single 3TB drive, for daily backups of my 3TB array

My motherboard is a number of years old now, and the onboard controllers could not do RAID for 3TB drives, as they only recognized them as 873GB. So I left these as standard drives, and set them up in software RAID.

My goal for this endeavor was to convert my 500GB and 2TB arrays over to software RAID. My reasons:

  • Actually getting notifications regarding any issues
  • Control over rebuilds, being able to add/remove disks
  • Not being tied to a specific RAID controller with a specific firmware version. If the motherboard were to die, I could easily move the drives.
  • No reboots required to work with the drives
  • Linux can do SATA hot swap, so I don't need to power down to swap a disk

The minor performance hit isn't an issue, so the pros far outweigh the cons.
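
To illustrate the "control over rebuilds" point, here is a rough sketch of what replacing a member disk looks like with mdadm's manage mode. The array and partition names are hypothetical placeholders, and the function only prints the commands rather than running them, so nothing gets touched by accident.

```shell
#!/bin/sh
# Print (not run) the mdadm commands for replacing one member of an array.
# Usage: swap_member ARRAY OLD_PARTITION NEW_PARTITION
# All three names are hypothetical placeholders.
swap_member() {
    array=$1
    old=$2
    new=$3
    # Mark the old member as failed, then detach it from the array
    echo "mdadm $array --fail $old"
    echo "mdadm $array --remove $old"
    # Once the replacement disk is in and partitioned, add it;
    # the kernel then starts the rebuild on its own
    echo "mdadm $array --add $new"
}

swap_member /dev/md1 /dev/sdh1 /dev/sdj1
```

Review the printed commands, then run them by hand as root once you are sure the device names are right.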

Fiasco #1: The 500GB Array

I decided to do the 500GB array first, since it was small and quick to work with.

I moved the data off the drive, rebooted the server to get into the BIOS, deleted the array, then booted the server back up. Then I (not showing any of these steps, you'll see why...):

  • partitioned the drives using fdisk
  • created the RAID1 array and waited for it to sync
  • formatted it
  • mounted the drive
  • put all my files back on

Then I rebooted the server, and what do I get? NO OPERATING SYSTEM FOUND

I shut down the server and unplugged the two 500GB drives, and it found the operating system just fine. The 3TB array also uses software RAID, but it never triggered the same issue. Why? Drives larger than 2.2TB need a GUID Partition Table (GPT) [1] instead of the standard msdos partition table, and my motherboard won't attempt to boot from a GPT drive. The 500GB drives had msdos partition tables, so the BIOS evidently tried to boot from one of them and found no operating system.
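
The 2.2TB figure comes from the msdos table storing sector numbers in 32 bits, assuming the usual 512-byte sectors. The sketch below works that out and adds a small helper for reading the table type out of parted's print output. The function names are made up for illustration.

```shell
#!/bin/sh
# Largest disk an msdos (MBR) table can address:
# 32-bit sector numbers times 512-byte sectors.
mbr_limit_bytes() {
    echo $(( 4294967296 * 512 ))   # 2^32 sectors * 512 bytes
}

# Pull the partition table type out of `parted ... print` output on stdin.
pt_type() {
    sed -n 's/^Partition Table: //p'
}

mbr_limit_bytes    # prints 2199023255552, i.e. about 2.2TB

# On a real system (as root), something like:
#   parted -s /dev/sdi print | pt_type
```

A drive reporting msdos here is one the BIOS may try to boot from. A drive reporting gpt is one my motherboard skips.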

Now to rebuild the array using GPT drives...

Rebuilding The Array

I can't boot the server with the drives plugged in, and running them on USB-to-SATA converters is just horrible. What to do? Linux supports SATA hot swap! I booted the server up, then just plugged the drives in. They were instantly recognized by the system and added as sd[x] devices.
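
If it isn't obvious which sd[x] names the hot-plugged drives were given, one simple approach is to compare the device listing before and after plugging them in. This is a minimal sketch. The helper name and the temporary file names are arbitrary, and it assumes GNU grep.

```shell
#!/bin/sh
# Print the names present in the "after" listing but not in the "before" one.
# Both arguments are files holding one device name per line.
new_devices() {
    # -F fixed strings, -x whole-line match, -v invert, -f patterns from file
    grep -v -x -F -f "$1" "$2"
}

# Typical use (file names are arbitrary):
#   ls /dev/sd? > /tmp/before.txt
#   ...plug the drives in...
#   ls /dev/sd? > /tmp/after.txt
#   new_devices /tmp/before.txt /tmp/after.txt
```

The names that come out are the ones to use in the mdadm and parted commands that follow.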

  • Reassemble the array
mdadm -A /dev/md1
  • Mount the drive
mount /dev/md1 /mnt/500GB-array
  • Move all the data off the drive
  • Stop the array, zero the superblocks and remove the array
mdadm -S /dev/md1
mdadm --zero-superblock /dev/sdi1
mdadm --zero-superblock /dev/sdh1
mdadm --remove /dev/md1
rm /dev/md1
  • Create a new partition table and partitions using parted on the first drive
[root@vmware dev]# parted sdi
GNU Parted 1.8.1
Using /dev/sdi
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) print

Model: ATA ST3500630AS (scsi)
Disk /dev/sdi: 500GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start   End    Size   Type     File system  Flags
 1      32.3kB  500GB  500GB  primary               raid

(parted) rm 1
(parted) print

Model: ATA ST3500630AS (scsi)
Disk /dev/sdi: 500GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start  End  Size  Type  File system  Flags

(parted) mklabel
Warning: The existing disk label on /dev/sdi will be destroyed and all data on this disk will be lost. Do you want to
continue?
Yes/No? yes
New disk label type?  [msdos]? gpt
(parted) unit GB
(parted) mkpart primary 0.00GB 500.0GB
(parted) print

Model: ATA ST3500630AS (scsi)
Disk /dev/sdi: 500GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start   End    Size   File system  Name     Flags
 1      0.00GB  500GB  500GB               primary

(parted) quit
  • Do the same thing on the second drive
  • Create the array
[root@vmware dev]# mdadm --create /dev/md1 --level=1 --metadata=1.2 --raid-devices=2 /dev/sdh1 /dev/sdi1
mdadm: metadata format 1.02 unknown, ignored.
mdadm: metadata format 1.02 unknown, ignored.
mdadm: array /dev/md1 started.
[root@vmware dev]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid1]
md1 : active raid1 sdi1[1] sdh1[0]
      488386414 blocks super 1.2 [2/2] [UU]
      [>....................]  resync =  0.1% (783360/488386414) finish=103.7min speed=78336K/sec

md0 : active raid1 sdc1[0] sdd1[1]
      5860532736 blocks super 1.2 [2/2] [UU]

unused devices: <none>
  • At any point during the resync, you can format the array, mount it, and start using it. Obviously, it will not have working redundancy until it has fully synced.
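
To keep an eye on the resync without reading the whole of /proc/mdstat each time, a small helper along these lines can pull out just the percentage. It is a sketch that assumes the mdstat layout shown above. The function name is made up.

```shell
#!/bin/sh
# Print the resync/recovery percentage for one array, reading
# /proc/mdstat-style text on stdin. Prints nothing if the array is not syncing.
# Usage: resync_pct md1 < /proc/mdstat
resync_pct() {
    awk -v dev="$1" '
        $1 == dev && $2 == ":" { in_dev = 1; next }   # header line of our array
        NF == 0                { in_dev = 0 }          # blank line ends the stanza
        in_dev && /(resync|recovery) *=/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /%$/) { sub(/%$/, "", $i); print $i; exit }
        }
    '
}
```

Running `watch -n 5 cat /proc/mdstat` works just as well if you only want to watch the progress bar move.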


Fiasco #2: The 2TB Array

Next on the agenda was to make the 2TB RAID1 array into a software array.

One of my motivations for this project was to correct the 37 bad sectors that showed up on one of my 2TB drives. I figured I would work that in between deleting the hardware array and creating the software array. I was going to use the Linux program badblocks to verify and fix the drive.

Using badblocks

Fiasco #3: The Surprise Array