Check disk integrity with smartmontools

Last modified February 7, 2021

This site has been automatically translated with Google Translate from this original page written in french, there may be some translation errors



Overview

This page shows how to check the integrity of your disks with smartmontools as a preventive measure, so that you do not end up trying to recover data after a disk crash. We will rely on Self-Monitoring, Analysis and Reporting Technology (SMART), which most modern hard drives provide to monitor their health and anticipate failures. There are two types of failures: predictable and unpredictable. A growing number of bad sectors, for example, is typically a sign of a coming disk crash. Unpredictable failures, as their name suggests, occur suddenly without any warning sign. It is estimated that SMART can predict 30% of failures; that is not great, but it is better than nothing.

In addition to SMART, some file systems also include their own integrity checks; this is notably the case for btrfs.


To find out whether SMART is active on your disks, first list all the disks and partitions with the command fdisk -l. The output looks like this (extract):

Disk /dev/sda: 238.5 GiB, 256060514304 bytes, 500118192 sectors
Disk model: LITEON IT LMT-25
Units: 1 × 512 sector = 512 bytes
Sector size (logical / physical): 512 bytes / 512 bytes
I/O size (minimum / optimal): 512 bytes / 512 bytes
Disk label type: gpt
Disk identifier: F3B2EF1F-E606-4204-9D4C-5C12DF35FAC1

Device Start End Sectors Size Type
/dev/sda1 2048 52391935 52389888 25G Linux file system
/dev/sda2 52391936 78116864 25724929 12.3G Linux file system
/dev/sda3 78118912 132304896 54185985 25.9G Linux file system
/dev/sda4 132306944 174565376 42258433 20.2G Linux file system
/dev/sda5 174567424 237266944 62699521 29.9G Linux file system
/dev/sda6 237268992 500118158 262849167 125.3G Linux file system


Disk /dev/sdb: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
Disk model: WDC WD10EZEX-21M
Units: 1 × 512 sector = 512 bytes
Sector size (logical / physical): 512 bytes / 4096 bytes
I/O size (minimum / optimal): 4096 bytes / 4096 bytes
Disk label type: dos
Disk identifier: 0x0f0bbca7

Device Boot Start End Sectors Size Id Type
/dev/sdb1 2048 1953520064 1953518017 931.5G 83 Linux


Disk /dev/sdc: 3.7 TiB, 4000787030016 bytes, 7814037168 sectors
Disk model: TOSHIBA HDWD240
Units: 1 × 512 sector = 512 bytes
Sector size (logical / physical): 512 bytes / 4096 bytes
I/O size (minimum / optimal): 4096 bytes / 4096 bytes
Disk label type: gpt
Disk identifier: 19A0DC6F-BEDA-4D25-A79C-AECEA98F1A76

Device Start End Sectors Size Type
/dev/sdc1 2048 7814037134 7814035087 3.7T Linux file system

Now we check that the TOSHIBA disk supports SMART by typing smartctl -i /dev/sdc. Here is the result:

smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.6-desktop-1.mga7] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model: TOSHIBA HDWD240
Serial Number: Z9J1S12QS5HH
LU WWN Device Id: 5 000039 9b560c5a3
Firmware Version: KQ000A
User Capacity: 4 000 787 030 016 bytes [4.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Form Factor: 3.5 inches
Zoned Device: Device managed zones
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-3 T13/2161-D revision 5
SATA Version is: SATA 3.3, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Sun Feb 7 10:17:37 2021 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

SMART is indeed available and enabled.
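The check above can also be scripted. The following is only a minimal sketch in Python: it assumes the text printed by smartctl -i has already been captured into a string, and the function name smart_support is purely illustrative (it is not part of smartmontools).

```python
def smart_support(info_text):
    """Return (available, enabled) from the text printed by `smartctl -i`."""
    available = enabled = False
    for line in info_text.splitlines():
        # Only the two "SMART support is:" lines are of interest here
        if not line.startswith("SMART support is:"):
            continue
        value = line.split(":", 1)[1].strip()
        if value.startswith("Available"):
            available = True
        elif value.startswith("Enabled"):
            enabled = True
    return available, enabled


# The two relevant lines from the output shown above
sample = """\
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
"""

if __name__ == "__main__":
    print(smart_support(sample))
```

A disk for which the second value is False may still support SMART; in that case it probably only needs to be enabled.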

smartmontools consists of two tools: the smartd daemon, which is better suited to servers and machines that run 24/7, and the smartctl tool, which can be run on demand or from a script on a PC that is only switched on intermittently. On my Mageia system I installed the default package, which is available on any modern distribution.

Both tools let you run self-tests, which are the only way to retrieve information about how the disk is operating. There are two types of tests:

  • The short test is a quick test of about 2 minutes that checks:
    • electrical and electronic parameters
    • mechanical operating parameters
    • sector status and read performance, by reading certain sectors of the disk and verifying the data
  • The long test can last for hours, depending on the size of the disk. It runs the same checks as the short test, except that the sector verification is exhaustive and scans the entire disk.

There are two schools of thought on disk tests. One recommends running them to anticipate failures, even if SMART only detects 30% of failures in practice. The other holds that the tests are aggressive and contribute to wearing out the disks, reducing their lifespan and de facto increasing the risk of failure! For my part, I chose to use SMART on my server.

In the rest of this page, I use as an example my Dell PowerEdge T310 server, which is built around a Dell PERC 6/i RAID controller with:

  • two 300GB SAS disks mounted in RAID 1 (mirroring), on which the system is installed
  • four SATA disks (two of 4TB and two of 3TB) mounted in RAID 5, on which the data is located

A 4TB SATA disk is connected to the USB port for data backups.

The smartd daemon

The smartd daemon monitors the hard drives in the background and sends alerts when errors occur. To configure it, we edit the file /etc/smartd.conf. Here is its content:

DEVICESCAN -o on -S on -s (S/../.././01|L/../../1/03) -m olivier -M exec /usr/bin/mail

This means that all disks undergo a short test every day at 1am and a long test every Monday at 3am; if errors are encountered, an email is sent to the user olivier. Here is each option in detail:

DEVICESCAN applies the directives to all detected disks, but you can also name a specific disk, for example /dev/sda -a for a SATA disk with the special file /dev/sda. The smartctl --scan command helps you find the identifier of each detected disk. Here is the result on my server:

/dev/sda -d scsi # /dev/sda, SCSI device
/dev/sdb -d scsi # /dev/sdb, SCSI device
/dev/sdc -d sat # /dev/sdc [SAT], ATA device
/dev/bus/0 -d megaraid,0 # /dev/bus/0 [megaraid_disk_00], SCSI device
/dev/bus/0 -d megaraid,1 # /dev/bus/0 [megaraid_disk_01], SCSI device
/dev/bus/0 -d megaraid,4 # /dev/bus/0 [megaraid_disk_04], SCSI device
/dev/bus/0 -d megaraid,5 # /dev/bus/0 [megaraid_disk_05], SCSI device
/dev/bus/0 -d megaraid,6 # /dev/bus/0 [megaraid_disk_06], SCSI device
/dev/bus/0 -d megaraid,7 # /dev/bus/0 [megaraid_disk_07], SCSI device
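If you prefer one line per disk in /etc/smartd.conf rather than DEVICESCAN, the scan output can be turned into such lines. This is a rough Python sketch, not a tool shipped with smartmontools: the helper names parse_scan and smartd_lines are hypothetical, and the default directive -a is only a placeholder to be replaced by your own options.

```python
def parse_scan(scan_text):
    """Return a list of (device, type) pairs from `smartctl --scan` output."""
    devices = []
    for line in scan_text.splitlines():
        # Drop the trailing comment, then expect: <device> -d <type>
        fields = line.split("#", 1)[0].split()
        if len(fields) == 3 and fields[1] == "-d":
            devices.append((fields[0], fields[2]))
    return devices


def smartd_lines(devices, directives="-a"):
    """Build one smartd.conf line per device (directives are a placeholder)."""
    return [f"{dev} -d {dtype} {directives}" for dev, dtype in devices]


# Three lines taken from the scan output above
sample_scan = """\
/dev/sda -d scsi # /dev/sda, SCSI device
/dev/sdc -d sat # /dev/sdc [SAT], ATA device
/dev/bus/0 -d megaraid,4 # /dev/bus/0 [megaraid_disk_04], SCSI device
"""

if __name__ == "__main__":
    for conf_line in smartd_lines(parse_scan(sample_scan)):
        print(conf_line)
```

Keeping the device type next to the device name matters for the RAID members, which all share the same /dev/bus/0 path and differ only by their megaraid number.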

The other options are:

-o on applies only to SATA drives and enables SMART Automatic Offline Testing, which instructs the drive to update its SMART operating data every 4 hours.

-S on enables Attribute Autosave, which saves operating data such as error counters and power-on time so that they are not reset to zero every time the drive is powered off and on again. This assumes that your drive has internal memory to store this operating data.

-s runs the tests at scheduled times. It takes a regular expression of the form T/MM/DD/d/HH, where:

  • T is the test type: S for a short test or L for a long test
  • MM is the month, from 01 to 12 on 2 characters; .. matches every month
  • DD is the day of the month, from 01 to 31 on 2 characters; .. matches every day
  • d is the day of the week, from 1 (Monday) to 7 (Sunday); . matches every day of the week
  • HH is the hour of the day, from 00 to 23 on 2 characters

In this case, S/../.././01 corresponds to a short test every day at 1am and L/../../1/03 to a long test every Monday at 3am. In (S/../.././01|L/../../1/03), the | between the two is the regular-expression alternation (or): a test is started whenever either pattern matches, so both schedules apply.
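To get a feel for how such an expression selects dates, here is a small Python sketch. It is only an approximation of what smartd does: it assumes that smartd builds a string of the form T/MM/DD/d/HH for the current hour and requires the whole string to match, and it uses Python's re module in place of the regular-expression library used by smartd. The function name due_tests is hypothetical.

```python
import re
from datetime import datetime

# The schedule used in the smartd.conf line of this page
SCHEDULE = r"(S/../.././01|L/../../1/03)"


def due_tests(when, schedule=SCHEDULE, types="SL"):
    """Return the test types (S, L) whose T/MM/DD/d/HH string matches."""
    due = []
    for test_type in types:
        # isoweekday() gives 1 for Monday up to 7 for Sunday
        stamp = f"{test_type}/{when:%m/%d}/{when.isoweekday()}/{when:%H}"
        if re.fullmatch(schedule, stamp):
            due.append(test_type)
    return due


if __name__ == "__main__":
    # Monday 9 November 2020 at 1am, then at 3am
    print(due_tests(datetime(2020, 11, 9, 1)))
    print(due_tests(datetime(2020, 11, 9, 3)))
```

Trying a few dates this way before restarting smartd is a cheap way to catch a misplaced dot or slash in the schedule.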

For more information on the file syntax, see the smartd.conf man page.

We start smartd by typing systemctl start smartd (or systemctl restart smartd if it is already running). journalctl -f then shows:

Nov 08 19:29:05 mana.kervao.fr smartd[16392]: smartd 7.0 2018-12-30 r4883 [x86_64-linux-5.6.6-server-1.mga7] (local build)
Nov 08 19:29:05 mana.kervao.fr smartd[16392]: Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org
Nov 08 19:29:05 mana.kervao.fr smartd[16392]: Opened configuration file /etc/smartd.conf
Nov 08 19:29:05 mana.kervao.fr smartd[16392]: Drive: DEVICESCAN, implied '-a' Directive on line 33 of file /etc/smartd.conf
Nov 08 19:29:05 mana.kervao.fr smartd[16392]: Configuration file /etc/smartd.conf was parsed, found DEVICESCAN, scanning devices
Nov 08 19:29:05 mana.kervao.fr smartd[16392]: Device: /dev/sdc [SAT], opened
Nov 08 19:29:05 mana.kervao.fr smartd[16392]: Device: /dev/sdc [SAT], ST4000DM004-2CV104, S/N:WFN0VFTR, WWN:5-000c50-0be69c167, FW:0001, 4.00 TB
Nov 08 19:29:05 mana.kervao.fr smartd[16392]: Device: /dev/sdc [SAT], found in smartd database: Seagate Barracuda 3.5
Nov 08 19:29:05 mana.kervao.fr smartd[16392]: Device: /dev/sdc [SAT], is SMART capable. Adding to "monitor" list.
Nov 08 19:29:05 mana.kervao.fr smartd[16392]: Device: /dev/bus/0 [megaraid_disk_00], opened
Nov 08 19:29:05 mana.kervao.fr smartd[16392]: Device: /dev/bus/0 [megaraid_disk_00], [FUJITSU MBA3147RC D306], read id: 0x500000e1175a6dc0, S/N: BJA3PB20MLWR, 146 GB
Nov 08 19:29:05 mana.kervao.fr smartd[16392]: Device: /dev/bus/0 [megaraid_disk_00], is SMART capable. Adding to "monitor" list.
Nov 08 19:29:05 mana.kervao.fr smartd[16392]: Device: /dev/bus/0 [megaraid_disk_01], opened
Nov 08 19:29:05 mana.kervao.fr smartd[16392]: Device: /dev/bus/0 [megaraid_disk_01], [FUJITSU MBA3147RC D306], read id: 0x500000e115830950, S/N: BJA3PA80KNT5, 146 GB
Nov 08 19:29:05 mana.kervao.fr smartd[16392]: Device: /dev/bus/0 [megaraid_disk_01], is SMART capable. Adding to "monitor" list.
Nov 08 19:29:05 mana.kervao.fr smartd[16392]: Device: /dev/bus/0 [megaraid_disk_04], type changed from 'megaraid,4' to 'sat+megaraid,4'
Nov 08 19:29:05 mana.kervao.fr smartd[16392]: Device: /dev/bus/0 [megaraid_disk_04] [SAT], opened
Nov 08 19:29:05 mana.kervao.fr smartd[16392]: Device: /dev/bus/0 [megaraid_disk_04] [SAT], ST2000DM001-1ER164, S/N:Z4Z2CWV3, WWN:5-000c50-07ac2d42f, FW:CC25, 2.00 TB
Nov 08 19:29:05 mana.kervao.fr smartd[16392]: Device: /dev/bus/0 [megaraid_disk_04] [SAT], found in smartd database: Seagate Barracuda 7200.14 (AF)
Nov 08 19:29:05 mana.kervao.fr smartd[16392]: Device: /dev/bus/0 [megaraid_disk_04] [SAT], not capable of SMART Health Status check
Nov 08 19:29:05 mana.kervao.fr smartd[16392]: Device: /dev/bus/0 [megaraid_disk_04] [SAT], is SMART capable. Adding to "monitor" list.
Nov 08 19:29:05 mana.kervao.fr smartd[16392]: Device: /dev/bus/0 [megaraid_disk_05], type changed from 'megaraid,5' to 'sat+megaraid,5'
Nov 08 19:29:05 mana.kervao.fr smartd[16392]: Device: /dev/bus/0 [megaraid_disk_05] [SAT], opened
Nov 08 19:29:05 mana.kervao.fr smartd[16392]: Device: /dev/bus/0 [megaraid_disk_05] [SAT], ST2000DM001-1ER164, S/N:Z4Z2W969, WWN:5-000c50-07b6ff45e, FW:CC26, 2.00 TB
Nov 08 19:29:05 mana.kervao.fr smartd[16392]: Device: /dev/bus/0 [megaraid_disk_05] [SAT], found in smartd database: Seagate Barracuda 7200.14 (AF)
Nov 08 19:29:05 mana.kervao.fr smartd[16392]: Device: /dev/bus/0 [megaraid_disk_05] [SAT], not capable of SMART Health Status check
Nov 08 19:29:06 mana.kervao.fr smartd[16392]: Device: /dev/bus/0 [megaraid_disk_05] [SAT], is SMART capable. Adding to "monitor" list.
Nov 08 19:29:06 mana.kervao.fr smartd[16392]: Device: /dev/bus/0 [megaraid_disk_06], type changed from 'megaraid,6' to 'sat+megaraid,6'
Nov 08 19:29:06 mana.kervao.fr smartd[16392]: Device: /dev/bus/0 [megaraid_disk_06] [SAT], opened
Nov 08 19:29:06 mana.kervao.fr smartd[16392]: Device: /dev/bus/0 [megaraid_disk_06] [SAT], ST3000VN007-2AH16M, S/N:ZGY6LKFM, WWN:5-000c50-0c48b3476, FW:SC60, 3.00 TB
Nov 08 19:29:06 mana.kervao.fr smartd[16392]: Device: /dev/bus/0 [megaraid_disk_06] [SAT], found in smartd database: Seagate IronWolf
Nov 08 19:29:06 mana.kervao.fr smartd[16392]: Device: /dev/bus/0 [megaraid_disk_06] [SAT], not capable of SMART Health Status check
Nov 08 19:29:06 mana.kervao.fr smartd[16392]: Device: /dev/bus/0 [megaraid_disk_06] [SAT], is SMART capable. Adding to "monitor" list.
Nov 08 19:29:06 mana.kervao.fr smartd[16392]: Device: /dev/bus/0 [megaraid_disk_07], type changed from 'megaraid,7' to 'sat+megaraid,7'
Nov 08 19:29:06 mana.kervao.fr smartd[16392]: Device: /dev/bus/0 [megaraid_disk_07] [SAT], opened
Nov 08 19:29:06 mana.kervao.fr smartd[16392]: Device: /dev/bus/0 [megaraid_disk_07] [SAT], ST3000VN007-2AH16M, S/N:ZGY6KHBC, WWN:5-000c50-0c47c70df, FW:SC60, 3.00 TB
Nov 08 19:29:06 mana.kervao.fr smartd[16392]: Device: /dev/bus/0 [megaraid_disk_07] [SAT], found in smartd database: Seagate IronWolf
Nov 08 19:29:06 mana.kervao.fr smartd[16392]: Device: /dev/bus/0 [megaraid_disk_07] [SAT], not capable of SMART Health Status check
Nov 08 19:29:06 mana.kervao.fr smartd[16392]: Device: /dev/bus/0 [megaraid_disk_07] [SAT], is SMART capable. Adding to "monitor" list.
Nov 08 19:29:06 mana.kervao.fr smartd[16392]: Monitoring 5 ATA/SATA, 2 SCSI/SAS and 0 NVMe devices


All the disks of the server are found: smartd will monitor 5 SATA disks (the 4 internal ones that make up the data RAID 5 plus the external backup disk) and the 2 SAS disks that make up the system RAID 1. systemctl status smartd gives:

systemctl status smartd
● smartd.service - Self Monitoring and Reporting Technology (SMART) Daemon
   Loaded: loaded (/usr/lib/systemd/system/smartd.service; enabled; vendor preset: disabled)
   Active: active (running) since Sun 2020-11-08 07:54:41 CET; 11h ago
 Main PID: 1089 (smartd)
    Tasks: 1 (limit: 4915)
   Memory: 4.3M
   CGroup: /system.slice/smartd.service
           └─1089 /usr/sbin/smartd -n

Nov 08 08:24:42 predator.kervao.fr smartd[1089]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 124 to 113
Nov 08 08:54:42 predator.kervao.fr smartd[1089]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 113 to 108
Nov 08 09:24:43 predator.kervao.fr smartd[1089]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 108 to 107
Nov 08 09:54:42 predator.kervao.fr smartd[1089]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 107 to 106
Nov 08 10:54:42 predator.kervao.fr smartd[1089]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 106 to 105
Nov 08 11:24:42 predator.kervao.fr smartd[1089]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 105 to 104
Nov 08 12:54:42 predator.kervao.fr smartd[1089]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 104 to 103
Nov 08 14:54:42 predator.kervao.fr smartd[1089]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 103 to 102
Nov 08 16:24:42 predator.kervao.fr smartd[1089]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 102 to 101
Nov 08 17:24:42 predator.kervao.fr smartd[1089]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 101 to 102

To check that everything works, we modify the line in /etc/smartd.conf like this:

DEVICESCAN -o on -S on -s (S/../.././01|L/../../1/03) -m olivier -M test

After smartd is restarted, we receive one email per disk with the subject SMART error (EmailTest) detected on host: mana, which contains (example for the external disk):

This message was generated by the smartd daemon running on:

   host name: mana
   DNS domain: kervao.fr

The following warning/error was logged by the smartd daemon:

TEST EMAIL from smartd for device: /dev/sdc [SAT]

Device info:
ST4000DM004-2CV104, S/N:WFN0VFTR, WWN:5-000c50-0be69c167, FW:0001, 4.00 TB

For details see host's SYSLOG.



Run a one-time test with smartctl

On PCs that are not permanently switched on, you can instead run the disk health checks on demand. In practice, taking as an example my internal disk identified by the special file /dev/sdc, first type the command smartctl -t short /dev/sdc. Here is the result:

smartctl 7.0 2018-12-30 r4883 [x86_64-linux-5.7.19-desktop-3.mga7] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 2 minutes for test to complete.
Test will complete after Mon Nov 9 18:48:33 2020

Use smartctl -X to abort test.

Two minutes later, we type smartctl -l selftest /dev/sdc. Here is the result:

smartctl 7.0 2018-12-30 r4883 [x86_64-linux-5.7.19-desktop-3.mga7] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 326 -
# 2 Extended offline Completed without error 00% 17 -

All is well: no errors were found.
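Reading the self-test log by eye quickly becomes tedious, so the check can be sketched in Python. This is a minimal example under stated assumptions: the log text has already been captured from smartctl -l selftest, the function name unclean_selftests is hypothetical, and any entry whose status is not "Completed without error" is reported, which includes tests still in progress or aborted. The third sample line is an invented failure added for illustration; it does not come from my disk.

```python
def unclean_selftests(log_text):
    """Return the log entries whose status is not 'Completed without error'."""
    unclean = []
    for line in log_text.splitlines():
        line = line.strip()
        # Entries start with '#'; the header line starts with 'Num'
        if line.startswith("#") and "Completed without error" not in line:
            unclean.append(line)
    return unclean


# Log shown above
clean_log = """\
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 326 -
# 2 Extended offline Completed without error 00% 17 -
"""

# Hypothetical failing entry, for illustration only
bad_log = clean_log + "# 3 Short offline Completed: read failure 90% 300 123456\n"

if __name__ == "__main__":
    print(unclean_selftests(clean_log))
    print(unclean_selftests(bad_log))
```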

To automate all this, I suggest using anacron.

For a disk connected via USB (through an adapter or an enclosure), you have to add a device type after -d; the possible values are sat, usbsunplus, usbcypress and usbjmicron. In practice, type lsusb to find your USB-connected hard disk, for example with a SATA/USB adapter:

Bus 002 Device 010: ID 7825:a2a4 ULT-Best Best USB Device

Now look up your adapter or enclosure in the smartmontools list of supported USB devices. Be careful: not all devices are listed, so you may have to try each of the device types listed above. In my case it was -d sat for both the SATA/USB adapter and the USB enclosure. The command for a short test is therefore (if the special file is /dev/sde):

smartctl -t short -d sat /dev/sde
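For a script launched by anacron, the command line can be assembled from the device and its type. The Python sketch below is only one possible way to do it: the function names selftest_command and start_selftest are hypothetical, and actually starting the test assumes that smartctl is installed and that the script runs as root.

```python
import subprocess


def selftest_command(device, kind="short", device_type=None):
    """Build the smartctl command line that starts a short or long self-test."""
    if kind not in ("short", "long"):
        raise ValueError("kind must be 'short' or 'long'")
    cmd = ["smartctl", "-t", kind]
    if device_type:
        # For example sat for a USB adapter, or megaraid,7 behind a RAID controller
        cmd += ["-d", device_type]
    cmd.append(device)
    return cmd


def start_selftest(device, kind="short", device_type=None):
    """Start the test and return smartctl's output (needs root and smartctl)."""
    result = subprocess.run(
        selftest_command(device, kind, device_type),
        capture_output=True,
        text=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Only print the command here; call start_selftest() to really run it
    print(" ".join(selftest_command("/dev/sde", "short", "sat")))
```

Building the command as a list rather than a single string avoids any quoting problem with device types that contain a comma.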

Now let's move on to the long test. For a disk in the RAID 5 of my server (number 7 on the controller), we type:

smartctl -t long -d megaraid,7 -a /dev/sdb

Here is the result:

smartctl 7.0 2018-12-30 r4883 [x86_64-linux-5.6.6-server-1.mga7] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Seagate IronWolf
Device Model: ST3000VN007-2AH16M
Serial Number: ZGY6KHBC
LU WWN Device Id: 5 000c50 0c47c70df
Firmware Version: SC60
User Capacity: 3 000 592 982 016 bytes [3.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5980 rpm
Form Factor: 3.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-3 T13/2161-D revision 5
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Sat Nov 21 09:36:58 2020 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART Status not supported: ATA return descriptor not supported by controller firmware
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection: (601) seconds.
Offline data collection
capabilities: (0x73) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    No Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time: (1) minutes.
Extended self-test routine
recommended polling time: (525) minutes.
Conveyance self-test routine
recommended polling time: (2) minutes.
SCT capabilities: (0x50bd) SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate 0x000f 079 064 044 Pre-fail Always - 78583125
  3 Spin_Up_Time 0x0003 095 095 000 Pre-fail Always - 0
  4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 8
  5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
  7 Seek_Error_Rate 0x000f 085 060 045 Pre-fail Always - 345234219
  9 Power_On_Hours 0x0032 094 094 000 Old_age Always - 6111 (71 225 0)
 10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
 12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 8
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Command_Timeout 0x0032 100 099 000 Old_age Always - 4295032835
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 082 073 040 Old_age Always - 18 (Min/Max 15/25)
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 0
193 Load_Cycle_Count 0x0032 075 075 000 Old_age Always - 50405
194 Temperature_Celsius 0x0022 018 040 000 Old_age Always - 18 (0 14 0 0 0)
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 193 193 000 Old_age Always - 25
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 1894 (157 172 0)
241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 17183414698
242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 6251948358

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 6111 -
# 2 Short offline Completed without error 00% 6102 -
# 3 Short offline Completed without error 00% 6078 -
# 4 Short offline Completed without error 00% 6054 -
# 5 Short offline Completed without error 00% 6030 -
# 6 Short offline Completed without error 00% 6006 -
# 7 Extended offline Completed without error 00% 5990 -
# 8 Short offline Completed without error 00% 5982 -
# 9 Short offline Completed without error 00% 5958 -
#10 Short offline Completed without error 00% 5934 -
#11 Short offline Completed without error 00% 5910 -
#12 Short offline Completed without error 00% 5886 -
#13 Short offline Completed without error 00% 5862 -
#14 Short offline Completed without error 00% 5838 -

SMART Selective self-test log data structure revision number 1
 SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
    1 0 0 Not_testing
    2 0 0 Not_testing
    3 0 0 Not_testing
    4 0 0 Not_testing
    5 0 0 Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 525 minutes for test to complete.
Test will complete after Sat Nov 21 18:21:59 2020

Use smartctl -X to abort test.

So the test will last 525 minutes (almost 9 hours for a 3TB disk!). To follow its progress, regularly type:

smartctl -a -d megaraid,7 /dev/sdb

Here is the relevant part of the result:

Self-test execution status: (246) Self-test routine in progress...
                    60% of test remaining.
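Rather than reading the whole report each time, the remaining percentage can be extracted from it. Here is a minimal Python sketch, assuming the smartctl -a output has been captured as text; the function name selftest_remaining is hypothetical.

```python
import re


def selftest_remaining(output):
    """Return the percentage of the self-test still to run, or None."""
    match = re.search(r"(\d+)% of test remaining", output)
    return int(match.group(1)) if match else None


# Extract shown above
sample_status = """\
Self-test execution status: (246) Self-test routine in progress...
                    60% of test remaining.
"""

if __name__ == "__main__":
    print(selftest_remaining(sample_status))
```

The function returns None when no test is running, which is the case once the status line reports that the previous self-test routine completed.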

The system log also shows progress indications:

Nov 21 14:23:04 mana.kervao.fr smartd[6132]: Device: /dev/bus/0 [megaraid_disk_07] [SAT], self-test in progress, 20% remaining

Now here is what the output looks like for a disk with lots of errors. It is an old disk that used to be in my RAID 5.

smartctl 7.0 2018-12-30 r4883 [x86_64-linux-5.7.19-desktop-3.mga7] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.14 (AF)
Device Model:     ST2000DM001-1ER164
Serial Number:    Z4Z2CEHK
LU WWN Device Id: 5 000c50 07ac2d868
Firmware Version: CC25
User Capacity:    2 000 398 934 016 bytes [2,00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Nov 22 19:08:14 2020 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART Status not supported: Incomplete response, ATA output registers missing
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.

General SMART Values:
Offline data collection status:  (0x00)    Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:         (   80) seconds.
Offline data collection
capabilities:              (0x73) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    No Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time: (1) minutes.
Extended self-test routine
recommended polling time: (208) minutes.
Conveyance self-test routine
recommended polling time: (2) minutes.
SCT capabilities: (0x1085) SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate 0x000f 100 094 006 Pre-fail Always - 138697940
  3 Spin_Up_Time 0x0003 096 096 000 Pre-fail Always - 0
  4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 68
  5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
  7 Seek_Error_Rate 0x000f 065 060 030 Pre-fail Always - 3660043
  9 Power_On_Hours 0x0032 059 059 000 Old_age Always - 36617
 10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
 12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 69
183 Runtime_Bad_Block 0x0032 100 100 000 Old_age Always - 0
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 088 088 000 Old_age Always - 12
188 Command_Timeout 0x0032 100 100 000 Old_age Always - 0 0 0
189 High_Fly_Writes 0x003a 099 099 000 Old_age Always - 1
190 Airflow_Temperature_Cel 0x0022 080 066 045 Old_age Always - 20 (Min/Max 20/20)
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 9
193 Load_Cycle_Count 0x0032 007 007 000 Old_age Always - 187614
194 Temperature_Celsius 0x0022 020 040 000 Old_age Always - 20 (0 7 0 0 0)
197 Current_Pending_Sector 0x0012 099 099 000 Old_age Always - 272
198 Offline_Uncorrectable 0x0010 099 099 000 Old_age Offline - 272
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 20860h+08m+08.480s
241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 2621623756
242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 7556213911

SMART Error Log Version: 1
ATA Error Count: 12 (device log contains only the most recent five errors)
    CR = Command Register [HEX]
    FR = Features Register [HEX]
    SC = Sector Count Register [HEX]
    SN = Sector Number Register [HEX]
    CL = Cylinder Low Register [HEX]
    CH = Cylinder High Register [HEX]
    DH = Device/Head Register [HEX]
    DC = Device Command Register [HEX]
    ER = Error register [HEX]
    ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49,710 days.

Error 12 occurred at disk power-on lifetime: 34776 hours (1449 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
  -- -- -- -- -- -- -- -- ---------------- --------------------
  60 00 6e ff ff ff 4f 00 34d+16:52:44.064 READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00 34d+16:52:44.006 READ LOG EXT
  60 00 6f ff ff ff 4f 00 34d+16:52:39.564 READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00 34d+16:52:39.506 READ LOG EXT
  60 00 70 ff ff ff 4f 00 34d+16:52:35.063 READ FPDMA QUEUED

Error 11 occurred at disk power-on lifetime: 34776 hours (1449 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
  -- -- -- -- -- -- -- -- ---------------- --------------------
  60 00 6f ff ff ff 4f 00 34d+16:52:39.564 READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00 34d+16:52:39.506 READ LOG EXT
  60 00 70 ff ff ff 4f 00 34d+16:52:35.063 READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00 34d+16:52:35.005 READ LOG EXT
  60 00 71 ff ff ff 4f 00 34d+16:52:30.563 READ FPDMA QUEUED

Error 10 occurred at disk power-on lifetime: 34776 hours (1449 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
  -- -- -- -- -- -- -- -- ---------------- --------------------
  60 00 70 ff ff ff 4f 00 34d+16:52:35.063 READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00 34d+16:52:35.005 READ LOG EXT
  60 00 71 ff ff ff 4f 00 34d+16:52:30.563 READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00 34d+16:52:30.505 READ LOG EXT
  60 00 72 ff ff ff 4f 00 34d+16:52:26.062 READ FPDMA QUEUED

Error 9 occurred at disk power-on lifetime: 34776 hours (1449 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
  -- -- -- -- -- -- -- -- ---------------- --------------------
  60 00 71 ff ff ff 4f 00 34d+16:52:30.563 READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00 34d+16:52:30.505 READ LOG EXT
  60 00 72 ff ff ff 4f 00 34d+16:52:26.062 READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00 34d+16:52:26.005 READ LOG EXT
  60 00 73 ff ff ff 4f 00 34d+16:52:21.562 READ FPDMA QUEUED

Error 8 occurred at disk power-on lifetime: 34776 hours (1449 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
  -- -- -- -- -- -- -- -- ---------------- --------------------
  60 00 72 ff ff ff 4f 00 34d+16:52:26.062 READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00 34d+16:52:26.005 READ LOG EXT
  60 00 73 ff ff ff 4f 00 34d+16:52:21.562 READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00 34d+16:52:21.504 READ LOG EXT
  60 00 74 ff ff ff 4f 00 34d+16:52:17.061 READ FPDMA QUEUED

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
    1 0 0 Not_testing
    2 0 0 Not_testing
    3 0 0 Not_testing
    4 0 0 Not_testing
    5 0 0 Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
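In the report above, the overall result is still PASSED even though several counters are clearly not healthy, so it can be useful to screen the attribute table yourself. The Python sketch below is only an illustration: the choice of attributes to watch (5, 187, 197 and 198) is my assumption and not a rule from smartmontools, and the function name worrying_attributes is hypothetical.

```python
# Attribute IDs watched here (an assumption, adjust to taste):
# 5 Reallocated_Sector_Ct, 187 Reported_Uncorrect,
# 197 Current_Pending_Sector, 198 Offline_Uncorrectable
WATCHED = {5, 187, 197, 198}


def worrying_attributes(smartctl_output, watched=WATCHED):
    """Return {name: raw_value} for watched attributes with a non-zero raw value."""
    found = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Rows: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        if not fields[2].startswith("0x"):
            continue  # not an attribute row (e.g. a line of the error log)
        raw = fields[9]
        if int(fields[0]) in watched and raw.isdigit() and int(raw) > 0:
            found[fields[1]] = int(raw)
    return found


# Four rows taken from the failing disk above
sample_attributes = """\
  5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
187 Reported_Uncorrect 0x0032 088 088 000 Old_age Always - 12
197 Current_Pending_Sector 0x0012 099 099 000 Old_age Always - 272
198 Offline_Uncorrectable 0x0010 099 099 000 Old_age Offline - 272
"""

if __name__ == "__main__":
    print(worrying_attributes(sample_attributes))
```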

 