Drive failure imminent
Recently one of my hard drives started throwing errors:
Jun 4 22:51:48 system kernel: [182083.672985] ata7.00: exception Emask 0x0 SAct 0x60 SErr 0x0 action 0x0
Jun 4 22:51:48 system kernel: [182083.673073] ata7.00: irq_stat 0x40000008
Jun 4 22:51:48 system kernel: [182083.673123] ata7.00: failed command: READ FPDMA QUEUED
Jun 4 22:51:48 system kernel: [182083.673191] ata7.00: cmd 60/00:28:50:b0:96/01:00:01:00:00/40 tag 5 ncq 131072 in
Jun 4 22:51:48 system kernel: [182083.673191] res 41/40:00:48:b1:96/00:00:01:00:00/40 Emask 0x409 (media error) <F>
Jun 4 22:51:48 system kernel: [182083.673361] ata7.00: status: { DRDY ERR }
Jun 4 22:51:48 system kernel: [182083.673409] ata7.00: error: { UNC }
Jun 4 22:51:48 system kernel: [182083.674899] ata7.00: configured for UDMA/133
Jun 4 22:51:48 system kernel: [182083.674924] sd 6:0:0:0: [sdc] tag#5 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jun 4 22:51:48 system kernel: [182083.674932] sd 6:0:0:0: [sdc] tag#5 Sense Key : Medium Error [current] [descriptor]
Jun 4 22:51:48 system kernel: [182083.674939] sd 6:0:0:0: [sdc] tag#5 Add. Sense: Unrecovered read error - auto reallocate failed
Jun 4 22:51:48 system kernel: [182083.674947] sd 6:0:0:0: [sdc] tag#5 CDB: Read(16) 88 00 00 00 00 00 01 96 b0 50 00 00 01 00 00 00
Jun 4 22:51:48 system kernel: [182083.674951] blk_update_request: I/O error, dev sdc, sector 26653000
Jun 4 22:51:48 system kernel: [182083.675063] ata7: EH complete
Jun 4 22:51:50 system zed: eid=11 class=io pool=vpool
These types of errors started showing up during zpool scrub events.
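The device and failing sector in a blk_update_request message can be pulled out with a short sed pipeline (a sketch; the sample input is the log line shown above):

```shell
# Sample kernel message, taken from the log above.
line='blk_update_request: I/O error, dev sdc, sector 26653000'

# Print "<device> <sector>" -> sdc 26653000
echo "$line" | sed -n 's/.*dev \([a-z]*\), sector \([0-9]*\).*/\1 \2/p'
```

Knowing the absolute sector is handy when cross-checking against the drive's SMART self-test log.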
Replacing a drive in a ZFS pool
Once the new drive arrived and was attached to the system, it was time to add the drive to an existing mirror. The steps are basically: (1) partition the drive, (2) encrypt the drive, (3) add the new drive to the zpool, and (4) remove the failing drive from the zpool. ZFS can combine steps (3) and (4) via zpool replace, but I preferred to keep the steps separate, as that gave me a chance to follow the process closely.
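The four steps can be sketched end to end as a dry-run script. Everything here is a placeholder sketch: the wwn identifier is made up, and run only prints each command instead of executing it:

```shell
#!/bin/sh
# Dry-run outline of the replacement; substitute the real /dev/disk/by-id
# path for the placeholder DISK before running anything for real.
DISK=/dev/disk/by-id/wwn-0xPLACEHOLDER
LABEL=vault5

CMDS=""
run() { CMDS="$CMDS $*;"; echo "+ $*"; }   # print, don't execute

run parted -a optimal "$DISK"                                      # (1) partition
run cryptsetup luksFormat "${DISK}-part1"                          # (2) encrypt
run cryptsetup luksOpen "${DISK}-part1" "${LABEL}_crypt"
run zpool attach vpool vault2_crypt "/dev/mapper/${LABEL}_crypt"   # (3) attach new
run zpool detach vpool vault2_crypt                                # (4) detach old
```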
Use lsblk to determine the drive letter (sd*) for the new drive. Then look up the drive’s stable (permanent) identifier in /dev/disk/by-id. For example, to find the identifier for disk sda:
$ ls -l /dev/disk/by-id/ | grep sda
lrwxrwxrwx 1 root root 9 Jun 9 20:04 wwn-0x84221ea347353432 -> ../../sda
(Multiple identifiers for the same drive may be listed; I’ve arbitrarily chosen the one with the above format.)
Now partition the new drive using parted. In this case, I’ll be using a disk label of vault5:
$ parted -a optimal /dev/disk/by-id/wwn-0x84221ea347353432
GNU Parted 3.2
Using /dev/sda
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) print
Error: /dev/sda: unrecognised disk label
Model: ATA WDC WD40EFRX-68N (scsi)
Disk /dev/sda: 4001GB
Sector size (logical/physical): 512B/4096B
Partition Table: unknown
Disk Flags:
(parted) mklabel gpt
(parted) unit MiB
(parted) mkpart vault5 1 -1
(parted) print
Model: ATA WDC WD40EFRX-68N (scsi)
Disk /dev/sda: 3815448MiB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags:
Number Start End Size File system Name Flags
1 1.00MiB 3815447MiB 3815446MiB vault5
(parted) quit
Information: You may need to update /etc/fstab.
Encrypt the entire drive using LUKS:
$ cryptsetup luksFormat --cipher aes-xts-plain64 --hash sha512 --key-size 512 \
--iter-time 5000 --use-random --verify-passphrase \
/dev/disk/by-id/wwn-0x84221ea347353432-part1
$ cryptsetup luksOpen /dev/disk/by-id/wwn-0x84221ea347353432-part1 vault5_crypt
Optionally add a secondary passphrase:
$ cryptsetup luksAddKey --iter-time 5000 /dev/disk/by-id/wwn-0x84221ea347353432-part1
Prior to adding the new drive, my ZFS setup looked like:
NAME
vpool
mirror-0
vault1_crypt
vault2_crypt
mirror-1
vault3_crypt
vault4_crypt
To add the new vault5 drive as a mirror of vault1 and vault2, I run zpool attach against one of the existing drives in the mirror. The mirror grows from a 2-way mirror to a 3-way mirror:
$ zpool attach -o ashift=12 vpool vault2_crypt /dev/mapper/vault5_crypt
While the resilvering process is underway, I might run watch zpool status -v to follow along.
Once I’m satisfied with the new drive in the mirror, I can remove one of the old drives by running zpool detach. The mirror shrinks from a 3-way mirror back to a 2-way mirror:
$ zpool detach vpool vault2_crypt
If replacing both drives in a 2-way mirror, after both new drives have been attached and the old drives detached from the pool, any increase in mirror size will not be immediately available. There are two options to make the extra capacity available:
Option 1) First export, then import the pool.
Option 2) For each new device, run zpool online -e vpool <new_device>, such as: zpool online -e vpool vault5_crypt.