Tuesday, 19 February 2013

How to replace a failed hard drive on a NetApp FAS

What:

NetApp FAS270, FAS2020, FAS2040, FAS2240, FAS2050

Problem:

Hard drive has failed and needs replacement.

Solution:

A disk with a fault will normally show an amber LED on its front. You will need a replacement disk of the same capacity as the one you are removing, or larger. Also make sure the disk is the correct type for the shelves that you have.
You can identify failed disks by logging on to the controller's command line via SSH and running the following command:
aggr status -f
The -f option lists any broken (failed) disks.
The output should give you a <disk id>, e.g. 0c.00.10
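
If you want to pull the disk IDs out of captured output rather than read them by eye, a minimal sketch in Python could look like the following. The function name broken_disk_ids and the regular expression are illustrative assumptions, not part of any NetApp tooling. The sample text is a shortened copy of output that a reader posted in the comments below, and the pattern assumes IDs shaped like 0b.118 or 0c.00.10.

```python
import re

# Assumed ID shape: adapter (e.g. 0b), then one or two dot-separated numbers.
DISK_ID = re.compile(r"\b\d+[a-z]\.\d+(?:\.\d+)?\b")


def broken_disk_ids(output: str) -> list[str]:
    """Return the first disk ID found on each line of `aggr status -f` output."""
    ids = []
    for line in output.splitlines():
        match = DISK_ID.search(line)
        if match:
            ids.append(match.group(0))
    return ids


sample = """\
Broken disks

RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- -------------- --------------
bad label 0b.118 0b 7 6 FC:A - FCAL 15000 272000/557056000 280104/573653840
"""

print(broken_disk_ids(sample))
```

Header and separator lines contain nothing that matches the pattern, so only the data rows contribute an ID.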

If the LED is not lit, which can happen in some cases, you can illuminate it manually.

SSH to the NetApp controller and log on (if you haven't done so already), then switch to advanced mode.
priv set advanced
led_on <disk id identified above>
led_off <disk id identified above>
priv set
Alternatively, you can use blink_on and blink_off instead of led_on and led_off.
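
The order of the commands above matters: enter advanced mode, turn the LED on, turn it off once you have found the disk, then drop back to normal privileges. A small sketch that only builds that sequence as text is shown below. The helper name locate_disk_commands is hypothetical, and it does not talk to the filer.

```python
def locate_disk_commands(disk_id: str, blink: bool = False) -> list[str]:
    """Build the console commands used to light up a disk, in order."""
    # Pick the blinking or the steady LED pair, as described in the post.
    on, off = ("blink_on", "blink_off") if blink else ("led_on", "led_off")
    return [
        "priv set advanced",   # LED commands need advanced mode
        f"{on} {disk_id}",     # light the LED so the disk can be found
        f"{off} {disk_id}",    # turn it off again once the disk is located
        "priv set",            # return to the normal privilege level
    ]


for command in locate_disk_commands("0c.00.10"):
    print(command)
```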

Remove the drive from the shelf and wait 60 seconds before inserting the new one.

When the new drive is in place, run the following command to check whether the disk you have just fitted is owned or not.
disk show -n
If disk auto assign is enabled, the disk will be assigned to the head that had the failed disk. If not, you will have to assign it manually.
disk assign <disk id>
If the controller won't accept the command, the disk might have been auto assigned to the wrong controller/system. You can clear the assignment from the disk using the following command, then try again.
disk assign <disk id> -s unowned -f
The replacement disk will now be assigned as a spare, taking the place of the spare that was used when the original disk failed.
You can check the status of the spares using the following command:
aggr status -s
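
The assignment steps above depend on two things: whether auto assign is on, and whether the disk ended up owned by the wrong controller. A minimal sketch of that decision follows. The function name assignment_commands and its parameters are illustrative assumptions, and it only returns the commands as strings for you to run yourself.

```python
def assignment_commands(disk_id: str, auto_assign: bool,
                        wrong_owner: bool = False) -> list[str]:
    """Return the console commands to run, in order, after fitting a new disk."""
    commands = ["disk show -n"]  # list disks that currently have no owner
    if wrong_owner:
        # Clear the ownership first, then assign the disk to this controller.
        commands.append(f"disk assign {disk_id} -s unowned -f")
        commands.append(f"disk assign {disk_id}")
    elif not auto_assign:
        # Nothing assigned the disk automatically, so do it by hand.
        commands.append(f"disk assign {disk_id}")
    commands.append("aggr status -s")  # confirm the disk now shows as a spare
    return commands


for command in assignment_commands("0c.00.10", auto_assign=False):
    print(command)
```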

To check whether the disk auto assign feature is enabled, use:
options disk.auto_assign
The output will show either on or off.
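
If you capture that output in a script, the on/off value can be read with a few lines of Python. This is a sketch under an assumption: that the filer prints the option name disk.auto_assign followed by on or off on one line. Check that against your own system. The function name auto_assign_enabled is hypothetical.

```python
def auto_assign_enabled(output: str) -> bool:
    """Read the on/off state from captured auto assign option output."""
    for line in output.splitlines():
        fields = line.split()
        # Assumed line shape: "disk.auto_assign   on" (name, then value).
        if len(fields) >= 2 and fields[0] == "disk.auto_assign":
            if fields[1] in ("on", "off"):
                return fields[1] == "on"
    raise ValueError("disk.auto_assign not found in output")


print(auto_assign_enabled("disk.auto_assign             on"))
```

Raising an error when the option line is missing avoids silently treating unexpected output as "off".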

8 comments:

  1. Hi
    Can you tell me, must a replacement disk come from NetApp or can you get one from a local supplier? We have a NetApp FAS2050 and the drives are no longer available. A similar drive was inserted; the LED is green but we cannot see the drive, and the NetApp now reports an error indicating insufficient spare disks.

  2. Hey, your drives should be identical: make, model, size, etc. I always got my drives from NetApp and never had any issues replacing them using the guide above.

  3. Hi
    Thanks for the reply. If we can get the identical drive from somewhere other than NetApp, do you know if that may work?
    The quote we got from our NetApp reseller is really expensive.

    As far as I know, NetApp sends you a zeroed drive, which I assume just means zeros have been written to the drive (formatted).


    Rafiq

  4. That might work, but I do not want to promise anything.

  5. I pulled the drive and it was removed from the list of disks. How do I add it back?

  6. If you pulled the drive out, you should wait around 60 seconds and insert a new one. Then run the commands as described above to assign your new drive, starting with "disk show -n".

  7. Awesome quotes you gave. I have got it. Thanks.

  8. Hi everyone, can you help me with this:

    Right now I have this:

    filer2(takeover)> aggr status -f

    Broken disks

    RAID Disk  Device  HA  SHELF BAY  CHAN  Pool  Type  RPM    Used (MB/blks)    Phys (MB/blks)
    ---------  ------  -------------  ----  ----  ----  -----  --------------    --------------
    bad label  0b.118  0b  7     6    FC:A  -     FCAL  15000  272000/557056000  280104/573653840
    filer2(takeover)> disk show
    DISK          OWNER              POOL   SERIAL NUMBER
    ------------  -------------      -----  -------------
    0b.120        filer1 (84824911)  Pool0  3SJ13W2X00009041X2CZ
    0b.123        filer2 (84824894)  Pool0  3SJ13W2L00009041X27Y
    0b.125        filer2 (84824894)  Pool0  3SJ13W3800009041X2DB
    0b.115        filer2 (84824894)  Pool0  3SJ142G700009040TGU4
    0b.122        filer1 (84824911)  Pool0  3SJ13YKF0000904079MD
    0b.119        filer2 (84824894)  Pool0  3SJ13XQF00009041X0C1
    0b.113        filer2 (84824894)  Pool0  3SJ13YP800009040VRJX
    0b.114        filer1 (84824911)  Pool0  3SJ13XP800009041X0CU
    0b.116        filer1 (84824911)  Pool0  3SJ13W0G00009041X2BX
    0b.124        filer1 (84824911)  Pool0  3SJ13XLH00009041X27E
    0b.121        filer2 (84824894)  Pool0  3SJ13XKB00009041X237
    0b.112        filer1 (84824911)  Pool0  3SJ12PZG00009040VR43
    0b.117        filer2 (84824894)  Pool0  3SJ12KF000009041XF8C
    0b.118        filer2 (84824894)  Pool0  3LM5RHWQ00009921T2V4

    Does the disk 0b.118 have errors? What does that mean?
