Difference between revisions of "Hard Drive Testing/Disktest"

From FreekiWiki
(Add technical details of disktest and fgdb.rb tracking)
 
(→‎TESTING PROCESS: clarify DB part of disktest proceedure)
 
(2 intermediate revisions by the same user not shown)

Latest revision as of 09:53, 25 October 2013

Disktest is used to test, wipe and track data destruction on IDE and SATA drives.

It is automatically configured using settings in lts.conf, deployed via LTSP.

TESTING PROCESS

It tests each drive in parallel as follows:

  • Creates a record for each drive in this run in the database
  • Starts a timer
    • This is done using the specified DISKTEST_TIME_LIMIT_PER_GB, in seconds, multiplied by the drive size
    • It is ignored if the model name matches DISKTEST_TIME_LIMIT_IGNORED_MODELS
    • If the drive is very tiny, it may fall back to DISKTEST_TIME_LIMIT_MINIMUM
    • If testing this drive does not finish within the timeout, it is ABORTED
  • It initiates a short SMART test
    • with flags: '-q', 'silent', '-t', 'short'
  • Checks SMART status
    • with flags: '-q', 'silent', '--all'
      • If the bus is not SCSI (meaning SATA/IDE), it also passes '-d', 'ata'
    • If SMART returns 2, it stops with RETRY state (which is like a failure)
    • If SMART returns >2, it stops with FAILED state
  • Runs badblocks for testing
    • /sbin/badblocks -e 1 -c 1024 -swt 0xffffffff DEV
    • If it exits nonzero, the drive is FAILED
  • Checks SMART status
    • As described above
  • Does its own wipe
    • writes 1's
    • writes 1024 bytes of urandom repeatedly
    • checks that the number of bytes successfully written (from the system call perspective) is twice the drive length
  • Checks SMART status
    • As described above
  • Concludes the drive has been successfully wiped with PASSED state, if we made it here
  • All data is logged into the database at this point
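
Two of the steps above, the timeout calculation and the choice of SMART status flags, can be sketched in Python as follows. This is only an illustration and not the actual disktest code: the function names are hypothetical, and it assumes that DISKTEST_TIME_LIMIT_IGNORED_MODELS is a regular expression and that DISKTEST_TIME_LIMIT_MINIMUM acts as a lower bound on the timeout.

```python
import re


def time_limit_seconds(size_gb, model, per_gb, minimum, ignored_models=None):
    """Return the timeout in seconds, or None if the model is exempt."""
    # Assumption: the ignored-models setting is a regular expression.
    if ignored_models and re.search(ignored_models, model):
        return None
    # Assumption: the minimum is a floor applied to very small drives.
    return max(per_gb * size_gb, minimum)


def smart_status_flags(bus):
    """Return the flags listed above for the SMART status check."""
    flags = ["-q", "silent", "--all"]
    # Non-SCSI buses (SATA/IDE) additionally get '-d', 'ata'.
    if bus.lower() != "scsi":
        flags += ["-d", "ata"]
    return flags
```

With a made-up limit of 60 seconds per GB and a 600 second minimum, a 500 GB drive would get 30000 seconds, while a 2 GB drive would fall back to the 600 second minimum.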

EXPLANATION OF DISKTEST STATES

State - Description
UNTESTED - used while the drive is being tested, or if it failed for an unknown reason
PASSED - used if the testing process completed without error
ABORTED - used as a fail state when the timeout is reached
RETRY - used if SMART returns 2, which indicates that RAID controller issues or odd drive problems are preventing any form of successful SMART testing
STOPPED - used if the user stops the testing program with a Ctrl-C interrupt
FAILED - used if the system determines that part of the test actually failed
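
The way a SMART or badblocks result leads to one of these states can be sketched as below. This is a hedged illustration based on the testing process described earlier, not the real implementation: the names are hypothetical, and it assumes that a SMART return value below 2 lets testing continue, which the page does not state explicitly.

```python
from enum import Enum


class State(Enum):
    UNTESTED = "UNTESTED"
    PASSED = "PASSED"
    ABORTED = "ABORTED"
    RETRY = "RETRY"
    STOPPED = "STOPPED"
    FAILED = "FAILED"


def state_after_smart(return_code):
    """Return the final state for a SMART check, or None to keep testing."""
    if return_code == 2:
        return State.RETRY
    if return_code > 2:
        return State.FAILED
    # Assumption: values below 2 do not stop the test.
    return None


def state_after_badblocks(exit_status):
    """A nonzero badblocks exit status fails the drive."""
    return State.FAILED if exit_status != 0 else None
```

A drive for which every check returns None, and which finishes before the timeout, would end in the PASSED state.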

DATABASE LOGGING

When DISKTEST_LOGTO_FGDB is configured with a server to log to, the whole process is logged in the database from beginning to end, including the finishing state. The end is recorded only if the run finishes in one of the above states, not if it is cut off by a power failure.

The saved data can be queried from the data sec sidebar link, or here: http://data/disktest_runs

Also, if a batch containing a serial number is created on this page: http://data/disktest_batches (which shows batches not yet finalized), then as testing completes (or if it already has), the following status report is updated based on the data above: http://data/disktest_batches/show/4 (where 4 is the id of the relevant disktest_batch that was created)

The batch details can also be managed there by using the edit link, for example which drives have been destroyed and when the batch is finalized (which provides a report to the user).