Difference between revisions of "Hard Drive Testing"

From FreekiWiki

Revision as of 14:59, 21 June 2012

This document is currently out of date and will be rewritten in the next few weeks - Patrick 6/19/12

New Version

The actual work to be done on hard drive testing ideally needs to happen only twice a day; once for a batch of smaller drives (100GB and less) in the morning and once for a batch of larger drives (larger than 100GB) to run through the afternoon and overnight if necessary.

Setting Up

If there are finished hard drives already on the racks then proceed to Finishing a Batch.

  1. Grab the red tray and head into TARDIS (the locked storage room) then open up the big brown lockbox.
  2. Hopefully the drives in the box are fairly well organized by size and interface. If this is the first batch of the day you'll want to grab smaller drives; if it's the second batch you can go for the larger ones. Load up the tray with an equal number of IDE- and SATA-connected hard drives. Don't forget about the 2.5" (laptop-sized) drives on the smaller top shelf! Be sure to lock up the big brown box again when you're done.
  3. Take the tray of drives over to one of the wiping racks and start connecting them to the boards. Most boards have 4 connections with various mixtures of IDE and SATA cables.
    1. Try to keep all the drives connected to a single board around the same size so we don't have 3 smaller drives finished and waiting around sucking electricity while the 1 larger drive is still finishing.
    2. Look the drives over for identifying information while you're connecting them; be sure you can clearly identify the model and serial number of the drive. You may need this information for sorting the failed drives from the passed drives later on. If you can't find the serial number on a drive then make sure you're attaching it to a board with other drives you can positively identify so you can use process of elimination when identifying the finished drives.
    3. Check the jumpers on IDE drives and make sure they're set to Master. For most drives this is set by a single jumper positioned vertically between the two pins closest to the IDE pins; check the labels on the drives as they will generally indicate if they require a different arrangement (or default to Master with no jumpers at all).
  4. As soon as you've connected all the drives to a single board, turn it on. You can move on to hooking up another board while the first one boots up and does the initial check of the drives.
  5. Once a board has finished booting and Disktest has started (you will see a list of the detected hard drives and a prompt asking to start the test), proceed to Starting Disktest.
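
The advice in step 3 about keeping similar-sized drives together can be sketched in a few lines of Python. This is only an illustration: the function name plan_boards, the (label, capacity) tuples, and the default of 4 slots per board are assumptions, and the sketch ignores the mix of IDE and SATA cables on each board.

```python
def plan_boards(drives, slots_per_board=4):
    """Group drives so each board holds drives of similar capacity.

    drives is a list of (label, capacity_gb) tuples (a hypothetical format).
    Sorting by capacity and cutting the list into board-sized chunks keeps
    small drives from waiting on one much larger drive.
    """
    ordered = sorted(drives, key=lambda drive: drive[1])
    return [
        ordered[start:start + slots_per_board]
        for start in range(0, len(ordered), slots_per_board)
    ]


tray = [("A", 80), ("B", 20), ("C", 40), ("D", 100), ("E", 40), ("F", 80)]
for number, board in enumerate(plan_boards(tray), start=1):
    print("board", number, board)
```

In practice you would still shuffle drives between boards to match the cables that are actually available.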

Starting Disktest

Disktest is our nifty little home-brewed hard drive testing and wiping program.

When Disktest first starts you will be presented with a list of drives that should look something like this:

PASSED sda: IDE 80.0GB <<Seagate SD380830A (5GQ1DB70)>>
FAILED sdb: IDE 80.0GB <<Samsung SV400AH (173G27Q37282S)>>
PASSED sdc: SATA 80.0GB <<Seagate SD380830A (5GQ1HG92)>>
PASSED sdd: SATA 80.0GB <<Western Digital WD800JBB (WMAMF92810)>>

Are these the expected drives and do you want to test them? [yes]:

The information on these lines indicates the following:

  • PASSED/FAILED: The SMART status of the drive after the initial test.
  • sda/sdb/sdc/sdd: The identifier the system has assigned to the drive.
  • IDE/SATA: The drive's connection type.
  • 80.0GB: The capacity of the attached drive.
  • << ... >> : The drive manufacturer, model number, and the device serial number in parentheses.
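
If you ever need to pull these fields out of a saved copy of the screen, a regular expression is enough. The following is a minimal sketch in Python; the pattern is inferred from the sample lines above rather than from Disktest's source, so the real output may not always match it, and the names LINE_RE and parse_drive_line are made up for this example.

```python
import re

# Pattern inferred from the sample lines; not taken from Disktest itself.
LINE_RE = re.compile(
    r"^(?:(?P<smart>PASSED|FAILED)\s+)?"            # SMART status (optional)
    r"(?P<device>sd[a-z]+):\s+"                     # system identifier
    r"(?P<interface>IDE|SATA)\s+"                   # connection type
    r"(?P<capacity>\d+(?:\.\d+)?)(?P<unit>MB|GB|TB)\s+"
    r"<<(?P<model>.+?)\s+\((?P<serial>[^)]*)\)>>"   # make/model and serial
)

TO_GB = {"MB": 0.001, "GB": 1.0, "TB": 1000.0}


def parse_drive_line(line):
    """Return the fields of one drive line as a dict, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["capacity_gb"] = float(fields.pop("capacity")) * TO_GB[fields.pop("unit")]
    return fields


info = parse_drive_line("FAILED sdb: IDE 80.0GB <<Samsung SV400AH (173G27Q37282S)>>")
print(info["smart"], info["device"], info["model"], info["serial"])
```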

Look this screen over carefully before proceeding!

  1. Are any drives marked as FAILED? If so you will want to abort the test and power off the board then replace the failed drive(s) and start the board again.
  2. Are all the attached drives on the list? If one is missing, double check that the power and IDE/SATA cables are firmly connected and try to determine if the disk is actually spinning. If any connections were loose you will need to abort the test and restart the board to redetect the devices. If the connections seem solid and the disk is spinning you may need to try the drive on another board or with another combination of drives; incompatibilities happen.
  3. Are the drives indicating the capacity they're labeled with? Some variation is normal; a drive labeled 80GB reporting as 83.0GB is common, a drive labeled 200GB indicating 3.4MB is a fail.
  4. Is the manufacturer and model information accurate? Drives from some manufacturers will report some information as "Unknown" and this is fine, but a string of total gibberish is a good indicator of failure.
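
The capacity check in item 3 comes down to comparing two numbers with some slack. Here is a rough sketch in Python; the 10% tolerance is an assumption chosen so that the 80GB/83.0GB example passes, not a Free Geek rule, and capacity_plausible is a hypothetical helper name.

```python
def capacity_plausible(labeled_gb, reported_gb, tolerance=0.10):
    """Return True if the reported size is within tolerance of the labeled size.

    tolerance is a fraction of the labeled size; 0.10 is an assumed value.
    """
    if labeled_gb <= 0:
        return False
    return abs(reported_gb - labeled_gb) / labeled_gb <= tolerance


print(capacity_plausible(80, 83.0))     # labeled 80GB, reports 83.0GB
print(capacity_plausible(200, 0.0034))  # labeled 200GB, reports 3.4MB
```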

If anything seems out of order you can abort the test by responding to the "Are these the expected drives..." prompt with an "n", taking you to a final status screen indicating the aborted or failed state of the attached drives. Press enter at the final status screen to power off the board.

If everything is in order you can simply strike Enter at the prompt and the test will begin.

During Testing

While Disktest is running it will output status updates on the drives every few seconds.

sda: IDE 80.0GB <<Seagate SD380830A (5GQ1DB70)>>: 13% of wipe complete - 1:58:02
sdb: IDE 80.0GB <<Samsung SV400AH (173G27Q37282S)>>: 80% of badblocks read (part 1) complete - 1:58:02
sdc: SATA 80.0GB <<Seagate SD380830A (5GQ1HG92)>>: all tests passed:
---
-smart test passed
-initiating smart self-test
-badblocks test started
-100% of badblocks read (part two) complete
-smart test passed
-disk wipe started
-100% of wipe complete
-disk wipe finished
-smart test passed
-2:08:32
sdd: SATA 80.0GB <<Western Digital WD800JBB (WMAMF92810)>> 22% of badblocks write (part 2) complete - 1:58:02

Any variation on the outputs above is normal. Drive sdc in the example above has successfully finished testing and wiping and is giving its final status summary. If you notice that the progress percentages or running times of incomplete drives are not advancing, the system may have frozen and need to be restarted. If you suspect a freeze then make a note of which board it is and what the percentages/times indicated are; check it again in 10 minutes and abort testing if there has been no progress.
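
The freeze check described above is just a comparison of two notes taken 10 minutes apart. As an illustration only, this Python sketch compares two copies of the status screen; the regular expression is inferred from the sample output, and read_progress and looks_frozen are hypothetical names, not part of Disktest.

```python
import re

# Matches in-progress lines such as
# "sda: IDE 80.0GB <<...>>: 13% of wipe complete - 1:58:02"
PROGRESS_RE = re.compile(
    r"^(?P<device>sd[a-z]+):.*>>:?\s*"
    r"(?P<percent>\d+)% of (?P<stage>.+?) complete - "
    r"(?P<elapsed>\d+:\d{2}:\d{2})$"
)


def read_progress(screen_text):
    """Map each in-progress device to its (stage, percent, elapsed) reading."""
    progress = {}
    for line in screen_text.splitlines():
        match = PROGRESS_RE.match(line.strip())
        if match:
            progress[match["device"]] = (
                match["stage"], int(match["percent"]), match["elapsed"]
            )
    return progress


def looks_frozen(earlier_screen, later_screen):
    """True if drives were in progress and nothing at all changed between notes."""
    before = read_progress(earlier_screen)
    after = read_progress(later_screen)
    return bool(before) and before == after
```

Finished drives and their summary lines are ignored, since only drives still in progress can show whether the board is alive.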

Testing can be aborted at any time using the keyboard command Ctrl-C. A final status screen will be presented and any drives that have already finished testing will be indicated as such and can be considered passed.

Finishing a Batch

Once Disktest has finished you will be presented with a results screen like this:

sda: IDE 80.0GB <<Seagate SD380830A (5GQ1DB70)>>
sda passed! Label and store it.
sdb: IDE 80.0GB <<Samsung SV400AH (173G27Q37282S)>>
sdb failed! Recycle it!
sdc: SATA 80.0GB <<Seagate SD380830A (5GQ1HG92)>>
sdc passed! Label and store it.
sdd: SATA 80.0GB <<Western Digital WD800JBB (WMAMF92810)>>
sdd passed! Label and store it.
Hit enter to shut down.
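
Since passed and failed drives are told apart by model and serial number, it can help to turn the results screen into two lists. The sketch below, in Python, assumes the layout shown above (a drive line followed by its verdict line); the patterns are guesses based on that sample and sort_results is a hypothetical helper.

```python
import re

DRIVE_RE = re.compile(
    r"^(?P<device>sd[a-z]+):.*<<(?P<model>.+?)\s+\((?P<serial>[^)]*)\)>>"
)
VERDICT_RE = re.compile(r"^\S+ (?P<verdict>passed|failed)!")


def sort_results(screen_text):
    """Return {"passed": [...], "failed": [...]} of (device, model, serial) tuples.

    Each verdict line is attributed to the drive line directly above it.
    """
    results = {"passed": [], "failed": []}
    current = None
    for line in screen_text.splitlines():
        line = line.strip()
        drive = DRIVE_RE.match(line)
        if drive:
            current = (drive["device"], drive["model"], drive["serial"])
            continue
        verdict = VERDICT_RE.match(line)
        if verdict and current is not None:
            results[verdict["verdict"]].append(current)
            current = None
    return results
```

Pairing by position rather than by the device name on the verdict line keeps the sketch working even if that name is mistyped.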


Old Version

Hard drive testing takes a while, so the actual work done in testing (described herein) pretty much only happens twice a day. The staff or build person doing this should be able to identify an IDE hard drive, figure out jumper diagrams, and remember a four-digit number for more than 30 seconds.

Quick Guide:

The following is a quick guide for those who've already read the detailed instructions before and just want to get going:

  • Note the results for the completed drives
  • Turn off the system and record results on labels and tallysheet
  • Get drives for testing and make sure they are labeled
  • Set the jumpers on the to-be-tested drives to Master or Single
  • Make sure the hard drive testing computer is off
  • Plug the drives in to the removable bays and slide them in all the way
  • Power up the computer, make sure the tests are running correctly
  • Turn off the monitor
  • Put away the tested drives

Detailed Instructions

If the test is finished, it'll tell you what to write on the labels. The first will be HDA and the second will be HDC. If the test went well, it will say Enter to Power Down; hit Enter and turn off the system when it says Power Down. Record the size (in MB) on the gizmo label, and write OK so we know someone didn't just read the drive label and write the size down! If it's a bad drive, put an X through the number.

Get drives for testing, generally from the build lockup ("Tardis"). At the beginning of the day, you'll want to test drives that are nearer the low end of the range so they can finish before the end of the day. At the end of the day, test larger drives. The drives to be tested should be the same size, since you can't swap out a smaller one while the larger one is still running. We also generally try to test two drives of the same brand. Make sure the drive is labeled. There may be a sheet of gizmo labels with numbers already on them near the storage shelf, or you might need to create new ones. Try not to cover size or geometry information when putting it on the drive, and look out for airholes (they usually are marked and say not to block them).

Set the jumpers on the to-be-tested drives to Master or Single. In the case of Western Digital drives, you can just remove the jumpers or turn them sideways (to neutral position). For others, you will need to look for jumpering information on the drive labels. This may read Master, or Single, or Stand Alone, or DEV0, or something similar.

If the hard drive already has a gizmo number, note this when you put it in. The script will ask you about it.

Make sure the hard drive testing computer is off before adding or removing drives.

Plug the drives in to the removable bays and slide them in all the way; sometimes it is not totally clear if the drive sled is all the way in the bay, so give it a little push after you think it's in all the way.

Power up the computer and make sure the tests are running correctly. Each drive bay should now have its power indicator light lit. If it does not, the bay may not be locked, may not be seated all the way back in the drawer, or may not be plugged in to the hard drive's power cable. Turn off the power to the entire system before you try to correct the problem.

It will give you information about the drives it detected; make sure it finds an hda and an hdc; finding drives at other positions probably means you need to recheck the jumpers. The sizes found should be similar to what you expected. Watch it get started to make sure it doesn't have problems right away (if it does, you may want to just fail that one drive and replace it with another one).

Once you and the program agree that everything is set up to your satisfaction, it will start the tests running. You can expect to wait on the order of hours, depending on the size of the drive(s). Each drive is tested with the badblocks program, which will write data to the drives, and then confirm it did so properly by reading it. It will do this to every partition on the disk four times.
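
To show the idea behind a write-then-read test, here is a small Python sketch that runs against an in-memory buffer instead of a real disk. It is not how badblocks is implemented; the four byte patterns are the ones badblocks uses in its write-mode test, which may be what the "four times" above refers to, and write_read_test and the 4096-byte block size are assumptions for this example.

```python
import io

# Byte patterns used by the badblocks write-mode test.
PATTERNS = (0xAA, 0x55, 0xFF, 0x00)


def write_read_test(device, size, block_size=4096):
    """Write each pattern across the device, read it back, and list bad blocks.

    device is any seekable binary file object; size is in bytes.
    """
    bad = set()
    block_count = size // block_size
    for pattern in PATTERNS:
        block = bytes([pattern]) * block_size
        device.seek(0)
        for _ in range(block_count):
            device.write(block)
        device.seek(0)
        for index in range(block_count):
            if device.read(block_size) != block:
                bad.add(index)
    return sorted(bad)


fake_disk = io.BytesIO(bytes(16 * 4096))  # stand-in for a drive
print(write_read_test(fake_disk, 16 * 4096))
```

Like the real test, this overwrites everything on the target, so never point it at a file or device you care about.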

Put away the tested drives and turn off the monitor. The tested good drives will go in the Tardis; ones that tested bad should be smacked with a hammer and hurled, with prejudice and not gently at all, into the barrel set up for hard drives.