Re: [RFC] Hacky MTRR support

From: Vladimir 'φ-coder/phcoder' Serbinenko
Subject: Re: [RFC] Hacky MTRR support
Date: Mon, 28 Jun 2010 11:14:25 +0200
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv: Gecko/20100515 Icedove/3.0.4

On 06/25/2010 10:58 AM, Colin Watson wrote:
> I recently posted ("Subject: [PATCH] Optimise memset on i386" - sorry, I
> don't seem to have a route to at the moment so I can't
> post an archive link) about optimising GRUB's video initialisation, and
> hinted that it might be possible to do better by implementing MTRRs as
> well in order to allow the system to combine writes to video memory
> rather than taking a cache stall for every single write.  I can report
> that, at least on the hardware I was using, it does make a significant
> difference: filling the screen with solid colour now takes 10
> milliseconds rather than 160!  This ended up shaving about a second off
> the boot time of the project I'm working on.
> On Linux, you can tell whether this is likely to be the case by
> comparing the kernel log from startup with /proc/mtrr.  For example, on
> my Dell Latitude D830 with BIOS version A02, the kernel log says:
>   MTRR variable ranges enabled:
>     0 base 000000000 mask F80000000 write-back
>     1 base 080000000 mask FC0000000 write-back
>     2 base 0BF800000 mask FFF800000 uncachable
>     3 base 0BF700000 mask FFFF00000 uncachable
>     4 disabled
>     5 disabled
>     6 disabled
>     7 disabled
> ... while /proc/mtrr says:
>   reg00: base=0x000000000 (    0MB), size= 2048MB, count=1: write-back
>   reg01: base=0x080000000 ( 2048MB), size= 1024MB, count=1: write-back
>   reg02: base=0x0bf800000 ( 3064MB), size=    8MB, count=1: uncachable
>   reg03: base=0x0bf700000 ( 3063MB), size=    1MB, count=1: uncachable
>   reg04: base=0x0e0000000 ( 3584MB), size=  256MB, count=1: write-combining
> The extra write-combining entry there matches the video memory aperture
> (you can check this by comparing with /proc/iomem and 'lspci -vvnn'),
> and I think it's set up by the X driver.
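
The two listings above describe the same ranges: the kernel log prints each variable MTRR as a base/mask pair, while /proc/mtrr prints base and size. A minimal sketch of the conversion, assuming 36-bit physical addresses (which matches the nine-hex-digit masks shown) and a contiguous mask; the function name is hypothetical and is not taken from the patch:

```c
#include <stdint.h>

/* Assumption: 36-bit physical addresses, as the 9-digit masks above suggest. */
#define PHYS_ADDR_BITS 36

/* Size in bytes of the range covered by a variable-MTRR mask, given in
   address form as the kernel log prints it.  Assumes the mask is a
   contiguous run of high one bits.  Hypothetical helper.  */
uint64_t
mtrr_mask_to_size (uint64_t mask)
{
  uint64_t addr_space = (UINT64_C (1) << PHYS_ADDR_BITS) - 1;

  /* The zero bits of the mask are the "don't care" address bits; their
     count determines the size of the range.  */
  return (~mask & addr_space) + 1;
}
```

With these assumptions, mask F80000000 gives 2048MB and mask FFF800000 gives 8MB, in line with reg00 and reg02.
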
> The following patch is against 1.98, since that's what I was developing
> on.  I'm not posting this as a serious merge proposal right now so I
> haven't included a ChangeLog or anything; this is just an RFC, and might
> be useful for others interested in this kind of thing (Seth Goldberg
> commented on IRC that he had been looking into something similar).
> Flaws in this approach include:
>   * Doesn't work with anything other than the generic Intel MTRR system
>     (although modern AMD chips have this too)
>   * Very simplistic MTRR handling: anything other than a card with a
>     power-of-two memory region aligned on a same-power-of-two boundary
>     will degrade to previous behaviour
>   * Might break older cards where write-combining isn't permitted (I
>     found an Intel paper saying such cards existed); in particular it's
>     possible that some cards might put registers in that region
>   * Entirely unconfigurable
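
The "power-of-two region aligned on a same-power-of-two boundary" condition from the list above can be sketched as a single check. This is only an illustration of the constraint, not code from the patch; the function name is hypothetical, and the 4 KiB lower bound reflects the granularity of variable MTRRs:

```c
#include <stdint.h>

/* Returns 1 if [base, base + size) can be described by one variable
   MTRR: size is a power of two of at least 4 KiB and base is aligned
   to size.  Hypothetical helper for illustration.  */
int
region_fits_single_mtrr (uint64_t base, uint64_t size)
{
  /* Variable MTRRs work in 4 KiB units.  */
  if (size < 0x1000)
    return 0;

  /* A power of two has exactly one bit set.  */
  if ((size & (size - 1)) != 0)
    return 0;

  /* The base must be a multiple of the size.  */
  return (base & (size - 1)) == 0;
}
```

The 256MB aperture at 0x0e0000000 from the /proc/mtrr listing passes this check; a 192MB region, or a 256MB region at 0x0e8000000, would not, and under the simplistic handling described would fall back to the previous behaviour.
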
> It's also probably in the wrong place in the tree; I imagine EFI drivers
> would want to do much the same thing, for example - and my patch has far
> too many magic constants.  I did at least take care to disable any MTRR
> added by GRUB before starting the target kernel; who knows what effects
> that might have, especially on multiprocessor systems (although Linux
> does work around out-of-sync MTRRs across CPUs).
> Still, I'd like to know whether this is of general enough interest to
> merit polishing it up and maybe offering it as some kind of configurable
> option.  I do think that this is a BIOS bug, but it seems to be a not
> uncommon one and it does have a pretty noticeable effect on GRUB's video
> performance.
In GRUB, stability is more important than speed, so blindly enabling
this on all cards would be bad. Perhaps it's better to enable MTRRs only
with native drivers, or at least to have a PCI ID whitelist.
It's worth doing, but we have to be cautious about it.
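
A PCI ID whitelist of the kind suggested above could look roughly like the following. This is a sketch only: the structure, the function name and the IDs are all hypothetical placeholders, not a vetted list and not existing GRUB code.

```c
#include <stddef.h>
#include <stdint.h>

/* One whitelisted video card, identified by PCI vendor and device ID.  */
struct wc_whitelist_entry
{
  uint16_t vendor;
  uint16_t device;
};

/* Placeholder IDs for illustration only; a real list would contain
   cards verified to tolerate write-combining on their framebuffer.  */
static const struct wc_whitelist_entry wc_whitelist[] = {
  { 0xabcd, 0x0001 },
  { 0xabcd, 0x0002 },
};

/* Returns 1 if write-combining may be enabled for this card.  */
int
wc_allowed (uint16_t vendor, uint16_t device)
{
  size_t i;

  for (i = 0; i < sizeof (wc_whitelist) / sizeof (wc_whitelist[0]); i++)
    if (wc_whitelist[i].vendor == vendor
        && wc_whitelist[i].device == device)
      return 1;

  /* Unknown cards keep the previous, uncached behaviour.  */
  return 0;
}
```

Defaulting to "not allowed" keeps the stable path for every card that has not been checked explicitly.
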
x86 is unusual in that uncached and buffered accesses use the same
addresses. On MIPS that's not the case.
I think it's better to add an argument to
> +#define cpuid(num,a,b,c,d) \
> +  asm volatile ("xchgl %%ebx, %1; cpuid; xchgl %%ebx, %1" \
> +                : "=a" (a), "=r" (b), "=c" (c), "=d" (d)  \
> +                : "0" (num))
> +
We already have cpuid functions in tsc.h. I've seen other rough spots,
but as you said it's meant only as a demo.
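
For reference, the bits such a probe would normally test can be written down separately from the cpuid and rdmsr plumbing. A minimal sketch, assuming the caller has already obtained EDX from CPUID leaf 1 and the value of the IA32_MTRRCAP MSR (0xFE); the macro and function names are hypothetical and do not come from the patch or from tsc.h:

```c
#include <stdint.h>

/* CPUID leaf 1, EDX bit 12: MTRRs are supported.  */
#define CPUID1_EDX_MTRR    (UINT32_C (1) << 12)
/* IA32_MTRRCAP bits 7:0: number of variable-range MTRRs.  */
#define MTRRCAP_VCNT_MASK  UINT64_C (0xff)
/* IA32_MTRRCAP bit 10: write-combining memory type is supported.  */
#define MTRRCAP_WC         (UINT64_C (1) << 10)

/* Hypothetical helpers operating on register values read elsewhere.  */
int
cpu_has_mtrr (uint32_t cpuid1_edx)
{
  return (cpuid1_edx & CPUID1_EDX_MTRR) != 0;
}

int
mtrr_has_write_combining (uint64_t mtrrcap)
{
  return (mtrrcap & MTRRCAP_WC) != 0;
}

unsigned int
mtrr_variable_count (uint64_t mtrrcap)
{
  return (unsigned int) (mtrrcap & MTRRCAP_VCNT_MASK);
}
```

Keeping the decoding in plain functions means the existing cpuid helper can supply the input without a second copy of the inline assembly.
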

Vladimir 'φ-coder/phcoder' Serbinenko
