From: zhenwei pi
Subject: Re: [External] Re: [PATCH] hw/block/nvme: add smart_critical_warning property
Date: Mon, 11 Jan 2021 17:49:18 +0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.6.1

On 1/11/21 5:21 PM, Klaus Jensen wrote:
On Jan 11 10:14, Philippe Mathieu-Daudé wrote:
On 1/11/21 8:50 AM, zhenwei pi wrote:
There is a very low probability of hitting a critical warning on
physical NVMe disk hardware, so it is hard to write and test a
monitoring agent service.

For debugging purposes, add a new 'smart_critical_warning' property
to emulate this situation.

Test with this patch:
1. Append 'smart_critical_warning=16' to the nvme device parameters.
2. Run smartctl in the guest:
  # smartctl -H -l error /dev/nvme0n1

   SMART overall-health self-assessment test result: FAILED!
   - volatile memory backup device has failed

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
  hw/block/nvme.c | 4 ++++
  hw/block/nvme.h | 1 +
  2 files changed, 5 insertions(+)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 27d2c72716..2f0bcac91c 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1215,6 +1215,8 @@ static uint16_t nvme_smart_info(NvmeCtrl *n, uint8_t rae, uint32_t buf_len,
      trans_len = MIN(sizeof(smart) - off, buf_len);
+    smart.critical_warning = n->params.smart_critical_warning;
      smart.data_units_read[0] = cpu_to_le64(DIV_ROUND_UP(stats.units_read,
      smart.data_units_written[0] = 
@@ -2824,6 +2826,8 @@ static Property nvme_props[] = {
      DEFINE_PROP_UINT32("aer_max_queued", NvmeCtrl, params.aer_max_queued, 64),
      DEFINE_PROP_UINT8("mdts", NvmeCtrl, params.mdts, 7),
      DEFINE_PROP_BOOL("use-intel-id", NvmeCtrl, params.use_intel_id, false),
+    DEFINE_PROP_UINT8("smart_critical_warning", NvmeCtrl,
+                      params.smart_critical_warning, 0),
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index e080a2318a..76684f5ac0 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -16,6 +16,7 @@ typedef struct NvmeParams {
      uint32_t aer_max_queued;
      uint8_t  mdts;
      bool     use_intel_id;
+    uint8_t  smart_critical_warning;
  } NvmeParams;
typedef struct NvmeAsyncEvent {

This is an easy way to achieve your goal.

However a better way is to add a QMP command to
change NvmeCtrl->temperature.

See for example tmp105_initfn() in hw/misc/tmp105.c
and qmp_tmp105_set_temperature() in tests/qtest/tmp105-test.c.


+1 for this approach.

Using a QMP command to change NvmeCtrl->temperature only triggers the NVME_SMART_TEMPERATURE warning. That is fine for testing the workflow of upper-layer software, but it is not enough to test every warning case.

Between NVMe versions 1.3 and 1.4, a new bit definition was added (bit 5, Persistent Memory Region has become read-only or unreliable). Before we actually hit this warning on a physical disk, we can use QEMU to test this feature (and possibly other new features in the future).
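
To make the bit layout concrete, here is a minimal sketch in C of decoding the critical warning byte. The macro names and decode_critical_warning() are hypothetical, chosen for illustration only, and are not taken from the QEMU tree. The bit positions are assumed to follow the Critical Warning field of the SMART / Health Information log page in NVMe 1.4. With this layout, the value 16 used in the patch's test is bit 4 alone, and the new bit 5 corresponds to 32.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Critical Warning bits of the SMART / Health Information log page.
 * Names are illustrative (hypothetical), not QEMU's own definitions.
 */
#define CW_SPARE_BELOW_THRESHOLD  (1u << 0)
#define CW_TEMPERATURE            (1u << 1)
#define CW_RELIABILITY_DEGRADED   (1u << 2)
#define CW_READ_ONLY              (1u << 3)
#define CW_VOLATILE_BACKUP_FAILED (1u << 4)
#define CW_PMR_UNRELIABLE         (1u << 5) /* new in NVMe 1.4 */

/* One description per defined bit, indexed by bit position. */
static const char *const cw_names[] = {
    "available spare has fallen below the threshold",
    "temperature is above or below a threshold",
    "NVM subsystem reliability has been degraded",
    "media has been placed in read-only mode",
    "volatile memory backup device has failed",
    "Persistent Memory Region has become read-only or unreliable",
};

/*
 * Print one line per warning bit set in cw and return how many
 * defined bits were set. Reserved bits 6 and 7 are ignored.
 */
int decode_critical_warning(uint8_t cw, FILE *out)
{
    int count = 0;

    assert(out != NULL);
    for (unsigned bit = 0; bit < sizeof(cw_names) / sizeof(cw_names[0]); bit++) {
        if (cw & (1u << bit)) {
            fprintf(out, "- %s\n", cw_names[bit]);
            count++;
        }
    }
    return count;
}
```

Calling decode_critical_warning(16, stdout) reports only the volatile memory backup failure, which matches the smartctl output in the patch description. Several bits can be combined in one property value, for example CW_TEMPERATURE | CW_PMR_UNRELIABLE.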

I don't disagree with the "add a QMP command" solution, but I think QEMU should be able to emulate all of the warnings (not only temperature).

zhenwei pi
