
Collecting Disk SMART Information for Key Storage Cluster Components

2023-06-21 01:50:19

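What Is Disk SMART?

SMART stands for Self-Monitoring, Analysis and Reporting Technology. It is a technical standard for automatically monitoring the health of a hard disk drive and reporting potential problems. As a standard, SMART sets out requirements that drive manufacturers should follow, chiefly:

1. SMART attributes and parameters are set during drive manufacturing;

2. Users are allowed to enable or disable SMART;

3. While the drive is in use, its state can be assessed from the SMART information it reports, and appropriate action or warnings can follow.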

 

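Which Attributes Does SMART Monitor?

Commonly reported SMART attributes include the following (the exact set, and how each value is computed, varies by manufacturer):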

| SMART ID (hex) | SMART ID (decimal) | Attribute |
|---|---|---|
| 01 | 001 | Raw Read Error Rate |
| 04 | 004 | Start/Stop Count |
| 05 | 005 | Reallocated Sector Count |
| 09 | 009 | Power-On Hours (POH) |
| 0A | 010 | Spin-Up Retry Count (retries needed to start the spindle motor) |
| 0B | 011 | Calibration Retry Count |
| 0C | 012 | Power Cycle Count |
| C2 | 194 | Temperature |
| C7 | 199 | Ultra DMA CRC Error Rate |
| C8 | 200 | Write Error Rate |
| ... | ... | ... |
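Vendor documentation often lists attribute IDs in hexadecimal, while smartctl prints them in decimal. The short sketch below converts between the two forms; the function names are illustrative and not part of any tool.

```python
def smart_id_to_decimal(hex_id: str) -> int:
    """Convert a hexadecimal SMART attribute ID such as 'C2' to its decimal form."""
    return int(hex_id, 16)


def smart_id_to_hex(dec_id: int) -> str:
    """Convert a decimal SMART attribute ID such as 194 to two-digit uppercase hex."""
    return format(dec_id, "02X")


if __name__ == "__main__":
    # IDs taken from the table above.
    for hex_id in ("01", "0A", "C2", "C7"):
        print(hex_id, "->", smart_id_to_decimal(hex_id))
```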

How to Query SMART Information

On Linux, the smartctl tool displays a drive's basic information, including the Device Model, serial number (SN), firmware version, and SMART data:

[~#]smartctl -a /dev/sdf
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-3.10.0-957.el7.x86_64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model: HGST HUS728T8TALE6L0
Serial Number: VYG973GS
LU WWN Device Id: 5 000cca 0c6c43209
Add. Product Id: DELL(tm)
Firmware Version: V8DERT06
User Capacity: 8,001,563,222,016 bytes [8.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Thu Mar 2 20:42:48 2023 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 90) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 943) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 16

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 5395 -
# 2 Short offline Completed without error 00% 15 -
# 3 Short offline Completed without error 00% 15 -
# 4 Vendor (0xdf) Completed without error 00% 12 -

# 5 Short offline Completed without error 00% 11 -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
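For collection across many drives, the text output can be parsed by a script. The following is a minimal sketch, assuming the output layout shown above; `parse_info_section`, `parse_health`, and `collect` are hypothetical helper names, and `collect` assumes smartctl is installed and the script has sufficient privileges to read the device.

```python
import subprocess
from typing import Dict, Optional

HEALTH_PREFIX = "SMART overall-health self-assessment test result:"


def parse_info_section(output: str) -> Dict[str, str]:
    """Collect the 'Key: value' lines of the INFORMATION SECTION into a dict."""
    info: Dict[str, str] = {}
    in_section = False
    for line in output.splitlines():
        if line.startswith("=== START OF INFORMATION SECTION"):
            in_section = True
            continue
        if line.startswith("==="):
            in_section = False  # any other section header ends the block
            continue
        if in_section and ":" in line:
            # Split at the first colon only, so values containing ':' survive.
            key, _, value = line.partition(":")
            info[key.strip()] = value.strip()
    return info


def parse_health(output: str) -> Optional[str]:
    """Return the overall health result (e.g. 'PASSED'), or None if absent."""
    for line in output.splitlines():
        if line.startswith(HEALTH_PREFIX):
            return line[len(HEALTH_PREFIX):].strip()
    return None


def collect(device: str) -> Dict[str, Optional[str]]:
    """Run 'smartctl -a' on one device and return the parsed fields."""
    # check=False: smartctl's exit status is a bit mask and may be non-zero
    # even when the printed output is usable.
    proc = subprocess.run(
        ["smartctl", "-a", device],
        capture_output=True, text=True, check=False,
    )
    result: Dict[str, Optional[str]] = dict(parse_info_section(proc.stdout))
    result["health"] = parse_health(proc.stdout)
    return result
```

Calling `collect("/dev/sdf")` on the host above would return a dictionary with keys such as `Device Model`, `Serial Number`, and `health`. Note that keys repeated in the output, such as `SMART support is`, keep only the last value in this sketch.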

 

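Interpreting smartctl Output

The column headers of the SMART attribute table mean the following:

ATTRIBUTE_NAME: the name of the monitored SMART attribute.

FLAG: the attribute's handling flags.

VALUE: the current value, a normalized figure the drive computes at run time from an internal formula. The formula is defined by the drive vendor, and the value is refreshed continually.

WORST: the worst value, that is, the lowest normalized VALUE the attribute has recorded while the drive has been running. It is updated whenever a new low occurs.

THRESH: the threshold, a reliability limit the vendor specifies for the attribute. As VALUE approaches the threshold, the drive is becoming unreliable and may fail or lose data.

RAW_VALUE: the raw value, in a vendor-specific format.

TYPE: the attribute type, either Pre-fail or Old_age. Pre-fail attributes are critical: a Pre-fail attribute at or below its threshold signals that the drive is about to fail, which is normally reflected in the overall SMART health assessment (PASSED/FAILED). Old_age attributes are non-critical: they reflect normal wear from reads and writes rather than an imminent failure of the drive itself.

UPDATED: how often the attribute is updated. Always means it is updated during normal operation as well as offline; Offline means it is updated only while offline tests run on the drive.

WHEN_FAILED: set to "FAILING_NOW" if VALUE <= THRESH; set to "In_the_past" if VALUE is above THRESH but WORST <= THRESH; otherwise set to "-".

"FAILING_NOW" means important files should be backed up as soon as possible, especially when the attribute is of type Pre-fail.

"In_the_past" means the attribute crossed its threshold at some earlier time but was within limits when the check was run.

"-" means the attribute has never failed.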
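The WHEN_FAILED rule described above can be written as a small function. This is a simplified sketch of that rule only, not smartctl's actual implementation; the function name and the sample numbers are hypothetical.

```python
def when_failed(value: int, worst: int, thresh: int) -> str:
    """Derive the WHEN_FAILED status from normalized VALUE, WORST and THRESH."""
    if value <= thresh:
        return "FAILING_NOW"   # currently at or below the threshold
    if worst <= thresh:
        return "In_the_past"   # crossed the threshold earlier, fine now
    return "-"                 # never at or below the threshold


if __name__ == "__main__":
    # Hypothetical (VALUE, WORST, THRESH) triples, not taken from a real drive.
    for triple in [(100, 100, 16), (80, 12, 16), (10, 10, 16)]:
        print(triple, "->", when_failed(*triple))
```

A monitoring script would typically alert immediately on "FAILING_NOW" for Pre-fail attributes and treat "In_the_past" as a warning worth investigating.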

 
