[Qemu-devel] SMBIOS vs. NUMA (was: Build full type 19 tables)


From: Gabriel L. Somlo
Subject: [Qemu-devel] SMBIOS vs. NUMA (was: Build full type 19 tables)
Date: Wed, 12 Mar 2014 17:55:30 -0400
User-agent: Mutt/1.5.21 (2010-09-15)

On Wed, Mar 12, 2014 at 02:24:54PM +0100, Gerd Hoffmann wrote:
> On Mi, 2014-03-12 at 09:05 -0400, Gabriel L. Somlo wrote:
> > On Wed, Mar 12, 2014 at 09:27:18AM +0100, Gerd Hoffmann wrote:
> > > I think we should just use e820_table (see pc.c) here.  Loop over it and
> > > add a type 19 table for each ram region in there.
> > 
> > I'm assuming this should be another post-Seabios-compatibility patch,
> > at the end of the series, and I should still do the (start,size)
> > arithmetic cut'n'pasted from SeaBIOS first, right ?
> 
> You should get identical results with both methods.  It's just that the
> e820 method is more future proof, i.e. if the numa people add support
> for non-contiguous memory some day we don't have to adapt the smbios
> code to handle it.

So I spent some time reverse-engineering the way Type 16..20 (memory)
smbios tables are built in SeaBIOS, and therefore in the QEMU smbios
patch set currently under review... And I came up with the following
picture (caution: ascii art, fixed-width font strongly recommended):

 ----------------------------------------------------------------------------
|                               Type16  0x1000                               |
 ----------------------------------------------------------------------------
 ^             ^               ^           ^                    ^           ^
 |             |               |           |                    |           |
 |         ----+---        ----+----   ----+----       ---------+--------   |
 |        | Type17 |      | Type17  | | Type17  |     | Type17           |  |
 |        | 0..16G |      | 16..32G | | 32..48G | ... | N*16G..(N+1)*16G |  |
 |        | 0x1100 |      | 0x1101  | | 0x1102  |     | 0x110<N>         |  |
 |         --------        ---------   ---------       ------------------   |
 |          ^   ^              ^           ^                    ^           |
 |          |   |              |           |                    |           |
 |       +--+   +--+           |           |                    |           |
 |       |         |           |           |                    |           |
 |   ----+---   ---+----   ----+----   ----+----       ---------+--------   |
 |  | Type20 | | Type20 | | Type20  | | Type20  |     | Type20           |  |
 |  | 0..4G  | | 4..16G | | 16..32G | | 32..48G | ... | N*16G..(N+1)*16G |  |
 |  | 0x1400 | | 0x1401 | | 0x1402  | | 0x1403  |     | 0x140<N+1>       |  |
 |   ----+---   ---+----   ----+----   ----+----       ---------+--------   |
 |       |         |           |           |                    |           |
 |       |         |           +-------+   |   +----------------+           |
 |       |         +----------------+  |   |   |                            |
 |       |                          |  |   |   |                            |
 |       v                          v  v   v   v                            |
 |   --------                      --------------                           |
 |  | Type19 |                    | Type19       |                          |
 |  | 0..4G  |                    | 4G..ram_size |                          |
 |  | 0x1300 |                    | 0x1301       |                          |
 |   ----+---                      ------+-------                           |
 |       |                               |                                  |
 +-------+                               +----------------------------------+

Here are some of the limit values, and some questions and thoughts:

- Type16 max == 2T - 1K;

Should we just assert((ram_size >> 10) < 0x80000000), and officially
limit guests to < 2T ?

- Type17 max == 32G - 1M;

This explains why we create Type17 device tables in increments of 16G,
since that's the largest possible value that's a nice, round power of
two :)

- Type19 & Type20 max == 4T - 1K;

If we limit ourselves to what Type16 can currently represent (2T),
this should be plenty enough to work with...

So, currently, we split available ram into blobs of up to 16G each,
and assign each blob a Type17 node.

We then split available ram into <4G and 4G+, and create up to two
Type19 nodes for these two areas.

Now, re. e820: currently, the expectation is that the (up to) two
Type19 nodes in the above figure correspond to (up to) two entries of
type E820_RAM in the e820 table.


Then, a type20 node is assigned to the sub-4G portion of the first
Type17 "device", and another type20 node is assigned to the over-4G
portion of the same.

From then on, type20 nodes correspond to the rest of the 16G-or-less
type17 devices pretty much on a 1:1 basis.


If the e820 table ends up containing more than just two E820_RAM
entries, and we therefore have more than the two Type19 nodes on the
bottom row, what are the rules for extending the rest of the figure
accordingly (i.e. how do we hook up additional Type17 and Type20 nodes
to go along with the extra Type19 nodes) ?

Thanks much,
--Gabriel


