The ZFS pool I run on my Express5800/GT110d had started acting up.
Here is what zpool status showed:
# zpool status
pool: dtpool
state: DEGRADED
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
see: http://zfsonlinux.org/msg/ZFS-8000-HC
scan: resilvered 766G in 2h10m with 0 errors on Sun Dec 30 19:11:52 2012
config:
NAME STATE READ WRITE CKSUM
dtpool DEGRADED 18 4 0
raidz1-0 DEGRADED 48 12 0
scsi-SATA_ST3000DM001-AAAAAAAAAAAA-part1 FAULTED 0 286 0
too many errors
scsi-SATA_ST3000DM001-BBBBBBBBBBBB-part1 ONLINE 79 12 0
scsi-SATA_ST3000DM001-CCCCCCCCCCCC-part1 ONLINE 0 0 0
scsi-SATA_ST3000DM001-DDDDDDDDDDDD-part1 ONLINE 0 0 0
errors: 20 data errors, use '-v' for a list
# zpool status -v
pool: dtpool
state: DEGRADED
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
see: http://zfsonlinux.org/msg/ZFS-8000-HC
scan: resilvered 766G in 2h10m with 0 errors on Sun Dec 30 19:11:52 2012
config:
NAME STATE READ WRITE CKSUM
dtpool DEGRADED 18 4 0
raidz1-0 DEGRADED 48 12 0
scsi-SATA_ST3000DM001-AAAAAAAAAAAA-part1 FAULTED 0 286 0
too many errors
scsi-SATA_ST3000DM001-BBBBBBBBBBBB-part1 ONLINE 79 12 0
scsi-SATA_ST3000DM001-CCCCCCCCCCCC-part1 ONLINE 0 0 0
scsi-SATA_ST3000DM001-DDDDDDDDDDDD-part1 ONLINE 0 0 0
errors: Permanent errors have been detected in the following files:
dtpool/ts:<0x0>
dtpool/ts:<0x4cf02>
dtpool/ts:<0x50c65>
dtpool/ts:<0x34d69>
dtpool/ts:<0x16271>
dtpool/ts:<0x47086>
dtpool/ts:<0x47088>
dtpool/ts:<0x4708b>
dtpool/ts:<0x4708d>
dtpool/ts:<0x4708f>
dtpool/ts:<0x47095>
dtpool/ts:<0x514a6>
dtpool/ts:<0x514a7>
dtpool/ts:<0x514a8>
dtpool/ts:<0x514ab>
dtpool/ts:<0x514ac>
dtpool/ts:<0x514b0>
dtpool/ts:<0x46fb6>
dtpool/ts:<0x330c1>
dtpool/ts:<0xffffffffffffffff>
#
The action line says to run zpool clear, so I tried it.
# zpool clear dtpool
cannot clear errors for dtpool: I/O error
I had a bad feeling about this.
# zpool status
pool: dtpool
state: UNAVAIL
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
see: http://zfsonlinux.org/msg/ZFS-8000-HC
scan: resilvered 766G in 2h10m with 0 errors on Sun Dec 30 19:11:52 2012
config:
NAME STATE READ WRITE CKSUM
dtpool UNAVAIL 0 0 0
insufficient replicas
raidz1-0 UNAVAIL 0 0 0
insufficient replicas
scsi-SATA_ST3000DM001-AAAAAAAAAAAA-part1 FAULTED 0 0 0
too many errors
scsi-SATA_ST3000DM001-BBBBBBBBBBBB-part1 FAULTED 0 0 0
too many errors
scsi-SATA_ST3000DM001-CCCCCCCCCCCC-part1 ONLINE 0 0 0
scsi-SATA_ST3000DM001-DDDDDDDDDDDD-part1 ONLINE 0 0 0
errors: 20 data errors, use '-v' for a list
#
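Did two drives just die?
After a reboot, the two HDDs marked FAULTED no longer appeared to be detected at the hardware level.
The machine is some distance away, so I shut it down and decided to go and look at it the next day.
So, the next day.
The pool is raidz, so if two drives had really failed the data would be unrecoverable. With that thought in mind I powered the machine on, and it booted to the familiar click-clack of a dying HDD.
Fortunately, only one drive was still undetected. The other one will probably fail sooner or later, though.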
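A quick way to confirm which drives the OS still sees is to check for their links under /dev/disk/by-id. The following is only a minimal sketch: the function name missing_disks is hypothetical, and it assumes the pool members are referenced by the same by-id names shown in the zpool status output.

```shell
#!/bin/sh
# Print every expected disk name that has no entry in the given directory.
# Usage: missing_disks DIR NAME...
# (missing_disks is a hypothetical helper, not part of ZFS.)
missing_disks() {
    dir=$1
    shift
    for name in "$@"; do
        # -e follows symlinks, so a dangling link also counts as missing
        if [ ! -e "$dir/$name" ]; then
            printf '%s\n' "$name"
        fi
    done
}

# Example call, using the device names from this pool:
# missing_disks /dev/disk/by-id \
#     scsi-SATA_ST3000DM001-AAAAAAAAAAAA-part1 \
#     scsi-SATA_ST3000DM001-BBBBBBBBBBBB-part1
```

An empty result would mean every listed drive is at least visible to the kernel; it says nothing about the drive's health.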
Here is what zpool status showed this time:
# zpool status -v
pool: dtpool
state: DEGRADED
status: One or more devices could not be used because the label is missing or
invalid. Sufficient replicas exist for the pool to continue
functioning in a degraded state.
action: Replace the device using 'zpool replace'.
see: http://zfsonlinux.org/msg/ZFS-8000-4J
scan: resilvered 766G in 2h10m with 0 errors on Sun Dec 30 19:11:52 2012
config:
NAME STATE READ WRITE CKSUM
dtpool DEGRADED 0 0 0
raidz1-0 DEGRADED 0 0 0
scsi-SATA_ST3000DM001-AAAAAAAAAAAA-part1 UNAVAIL 0 0 0
scsi-SATA_ST3000DM001-BBBBBBBBBBBB-part1 ONLINE 0 0 0
scsi-SATA_ST3000DM001-CCCCCCCCCCCC-part1 ONLINE 0 0 0
scsi-SATA_ST3000DM001-DDDDDDDDDDDD-part1 ONLINE 0 0 0
errors: No known data errors
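To pull the unhealthy rows out of output like the above without reading it by eye, something along these lines could work. It is a rough sketch that assumes the config section has the five-column layout shown here (NAME STATE READ WRITE CKSUM); unhealthy_rows is a hypothetical name, and the filter also reports pool and vdev rows, not just individual disks.

```shell
#!/bin/sh
# Read `zpool status` output on stdin and print "NAME STATE" for every
# config row whose state is not ONLINE.
# (unhealthy_rows is a hypothetical helper; the column layout is assumed
# to match the output shown in this post.)
unhealthy_rows() {
    awk '
        $1 == "config:" { in_config = 1; next }   # config table starts
        $1 == "errors:" { in_config = 0 }         # config table ends
        # Skip the header row and short note lines such as "too many errors"
        in_config && NF >= 5 && $2 != "STATE" && $2 != "ONLINE" { print $1, $2 }
    '
}

# Example call:
# zpool status dtpool | unhealthy_rows
```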
I promptly shut the machine down, swapped the HDD, and tried zpool replace.
# ls /dev/disk/by-id/scsi-SATA_ST3000DM001-EEEEEEEEEEEE-part1
/dev/disk/by-id/scsi-SATA_ST3000DM001-EEEEEEEEEEEE-part1
#
# zpool replace -f dtpool scsi-SATA_ST3000DM001-AAAAAAAAAAAA-part1 scsi-SATA_ST3000DM001-EEEEEEEEEEEE-part1
cannot replace scsi-SATA_ST3000DM001-AAAAAAAAAAAA-part1 with scsi-SATA_ST3000DM001-EEEEEEEEEEEE-part1: no such device in pool
I also tried offline and detach, but both failed with the same no such device in pool error.
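While trying various things I searched the web and found a report titled zpool attach throws "no such device in pool" error. Following it, I gave the old device as a full /dev/disk/by-id path instead of the short name, and that worked. The version I am using is zfs-0.6.0-rc9.
Like this: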
# zpool replace -f dtpool /dev/disk/by-id/scsi-SATA_ST3000DM001-AAAAAAAAAAAA-part1 scsi-SATA_ST3000DM001-EEEEEEEEEEEE-part1
#
# zpool status
pool: dtpool
state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Sun Sep 8 16:04:04 2013
33.7G scanned out of 6.27T at 267M/s, 6h47m to go
8.42G resilvered, 0.52% done
config:
NAME STATE READ WRITE CKSUM
dtpool DEGRADED 0 0 0
raidz1-0 DEGRADED 0 0 0
replacing-0 UNAVAIL 0 0 0
scsi-SATA_ST3000DM001-AAAAAAAAAAAA-part1 UNAVAIL 0 0 0
scsi-SATA_ST3000DM001-EEEEEEEEEEEE-part1 ONLINE 0 0 0 (resilvering)
scsi-SATA_ST3000DM001-BBBBBBBBBBBB-part1 ONLINE 0 0 0
scsi-SATA_ST3000DM001-CCCCCCCCCCCC-part1 ONLINE 0 0 0
scsi-SATA_ST3000DM001-DDDDDDDDDDDD-part1 ONLINE 0 0 0
errors: No known data errors
#
Glad it worked.
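The workaround boils down to naming the old device by its full by-id path. Below is a small sketch of a helper that only prints the resulting command so it can be reviewed before running. The name replace_cmd and the optional directory argument are my own assumptions, and I have only seen this behaviour on zfs-0.6.0-rc9; other versions may accept the short name as is.

```shell
#!/bin/sh
# Print (do not run) a zpool replace command that refers to the old
# device by its full by-id path.
# Usage: replace_cmd POOL OLD_NAME NEW_NAME [BY_ID_DIR]
# (replace_cmd is a hypothetical helper, not part of ZFS.)
replace_cmd() {
    pool=$1 old=$2 new=$3
    by_id=${4:-/dev/disk/by-id}
    case $old in
        /*) old_path=$old ;;           # already a full path, keep it
        *)  old_path=$by_id/$old ;;    # prefix the short name
    esac
    printf 'zpool replace -f %s %s %s\n' "$pool" "$old_path" "$new"
}

# Example call:
# replace_cmd dtpool \
#     scsi-SATA_ST3000DM001-AAAAAAAAAAAA-part1 \
#     scsi-SATA_ST3000DM001-EEEEEEEEEEEE-part1
```

Since -f overrides some of zpool's safety checks, printing the command first and checking the device names by eye seems safer than running it directly.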