grub-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [External] Re: [PATCH] BugFix: grub menu gets stuck due to unseriali


From: 段亚勇
Subject: Re: [External] Re: [PATCH] BugFix: grub menu gets stuck due to unserialized rdtsc
Date: Wed, 8 Jan 2025 02:05:26 -0500

Hi Daniel, 
Sorry for disturbing you again!
May I know the ETA of my two patches arriving at grub master branch?
Because we recognized that debian 12.10 version is going to be planned in debian community.
We sincerely hope we can use this bugfix in debian 12.10.

------------------------------------------------------------------- 
Best Regards, 
Will Duan  Data-SYS-STE-OS
3F, T2B, Xinjiangwan Square, Douyin, Yangpu District, Shanghai
------------------------------------------------------------------- 
From: "段亚勇"<duanyayong@bytedance.com>
Date: Fri, Dec 20, 2024, 17:02
Subject: Re: [External] Re: [PATCH] BugFix: grub menu gets stuck due to unserialized rdtsc
To: "Daniel Kiper"<dkiper@net-space.pl>
Got it. Thanks for your help Daniel!

------------------------------------------------------------------- 
Best Regards, 
段亚勇  Data-SYS-STE-操作系统 
上海市杨浦区抖音新江湾广场 T2B 3F
------------------------------------------------------------------- 
From: "Daniel Kiper"<dkiper@net-space.pl>
Date: Fri, Dec 20, 2024, 03:49
Subject: [External] Re: [PATCH] BugFix: grub menu gets stuck due to unserialized rdtsc
To: "Duan Yayong"<duanyayong@bytedance.com>
On Mon, Dec 09, 2024 at 02:48:32PM +0800, Duan Yayong wrote:
> This patch is used to fix grub menu gets stuck in server
> AC poweron/poweroff stress test of x86_64, which is reproduced with 1/200
> ratio. The root cause analysis as below:
>
> Q:
> What's the code logic?
> A:
> "grub_tsc_init" function will init tsc by setting grub_tsc_rate,
> which call stack is:
>
>   grub_tsc_init -> grub_tsc_calibrate_from_pmtimer -> grub_divmod64
>
> Among, "grub_divmod64" function needs "tsc_diff" as the second parameter.
> In "grub_pmtimer_wait_count_tsc", we will call "grub_get_tsc" function to
> get time stamp counter value to assign to "start_tsc" variable, and get
> into "while(1)" loop space to get "end_tsc" variable value with same
> function, after 3580 ticks, return "end_tsc - start_tsc".
> Actually, "rdtsc" instruction will be called in "grub_get_tsc",
> but "rdtsc" instruction is not reliable(for the reason see the next
> question), which will cause "tsc_diff" to be a very big number larger
> than (1UL << 32) or a negative number, so that grub_tsc_rate will be zero.
> When "run_menu" function is startup, and calls "grub_tsc_get_time_ms"
> function to get current time to check if timeout time reach, at this time,
> "grub_tsc_get_time_ms" function will return zero due to zero
> "grub_tsc_rate" variable, then grub menu gets stuck...
>
> Q:
> What's the difference between rdtsc and rdtscp instructions
> in x86_64 architecture? Here is more explanations from
> Intel® 64 and IA-32 Architectures Software Developer’s Manual
> Volume 2B: https://cdrdv2.intel.com/v1/dl/getContent/671241.

I would suggest to add document version number next time.

> A:
> In page 4-558 -> RDTSC—Read Time-Stamp Counter:
>
> The RDTSC instruction is not a serializing instruction.
> It does not necessarily wait until all previous instructions
> have been executed before reading the counter.
> Similarly, subsequent instructions may begin execution before
> the read operation is performed. The following items may guide
> software seeking to order executions of RDTSC:
>   - If software requires RDTSC to be executed only after all previous
>     instructions have executed and all previous loads are globally visible,
>     it can execute LFENCE immediately before RDTSC.
>   - If software requires RDTSC to be executed only after all previous
>     instructions have executed and all previous loads and stores are globally
>     visible, it can execute the sequence MFENCE;LFENCE immediately before
>     RDTSC.
>   - If software requires RDTSC to be executed prior to execution of any
>     subsequent instruction (including any memory accesses), it can execute
>     the sequence LFENCE immediately after RDTSC.
>
> A:
> In page 4-560 -> RDTSCP—Read Time-Stamp Counter and Processor ID:
>
> The RDTSCP instruction is not a serializing instruction,
> but it does wait until all previous instructions have executed and all
> previous loads are globally visible. 1 But it does not wait for previous
> stores to be globally visible, and subsequent instructions may begin
> execution before the read operation is performed. The following items
> may guide softwareseeking to order executions of RDTSCP:
>   - If software requires RDTSCP to be executed only after all previous
>     stores are globally visible, it can execute MFENCE immediately before
>     RDTSCP.
>   - If software requires RDTSCP to be executed prior to execution of any
>     subsequent instruction (including any memory accesses), itcan execute
>     LFENCE immediately after RDTSCP.
>
> Q:
> Why there is a cpuid serilizing instruction before rdtsc instruction,
> but "grub_get_tsc" still cannot work as expect?
>
> A:
> >From Intel® 64 and IA-32 Architectures Software Developer's
> Manual Volume 2A: Instruction Set Reference, A-L:
> https://cdrdv2.intel.com/v1/dl/getContent/671199

Ditto...

> In page 3-222 -> CPUID—CPU Identification:
> "CPUID" can be executed at any privilege level to serialize instruction execution.
> Serializing instruction execution guarantees that any modifications to flags,
> registers, and memory for previous instructions are completed before
> the next instruction is fetched and executed.
>
> So we only kept the instruction "rdtsc" and its previous instruction in order
> currently. But it is still out-of-order possibility between rdtsc
> instruction and its subsequent instruction.
>
> Q:
> Why do we do this fix?
>
> A:
> In the one hand, Add "cpuid" instruction after "rdtsc" instruction
> to make sure "rdtsc" instruction to be executed prior to execution
> of any subsequent instruction, about serializing execution that all
> previous instructions have been executed before "rdtsc", there is a
> "cpuid" usage in original code.
> In the other hand, using "cpuid" instruction ranther than "lfence"
> can make sure a forward compatibility for previous HW.
>
> Base this fix, we did 1500 cycles power on/off stress test, and
> did not reproduce this issue again.
>
> Signed-off-by: Duan Yayong <duanyayong@bytedance.com>
> Signed-off-by: Li Yongqiang <liyongqiang@huaqin.com>
> Signed-off-by: Sun Ming <simon.sun@huaqin.com>

Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>

> ---
>  include/grub/i386/tsc.h | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/include/grub/i386/tsc.h b/include/grub/i386/tsc.h
> index 947e62fa1..1814704fc 100644
> --- a/include/grub/i386/tsc.h
> +++ b/include/grub/i386/tsc.h
> @@ -47,6 +47,12 @@ grub_get_tsc (void)
>    /* Read TSC value.  We cannot use "=A", since this would use
>       %rax on x86_64. */
>    asm volatile ("rdtsc":"=a" (lo), "=d" (hi));
> +  /* Add CPUID instruction after rdtsc instruction to ensure
> +   * rdtsc instruction's execution is prior to subsequent
> +   * instruction to avoid out-of-order bewteen rdtsc and
> +   * its subsequent instruction.
> +   */
> +  grub_cpuid (0,a,b,c,d);

Wrong coding style for comment and code. I will fix both for you this time...

Daniel

reply via email to

[Prev in Thread] Current Thread [Next in Thread]