poke-devel

Re: [SUGGESTION] Pretty-printing custom unit types


From: Jose E. Marchesi
Subject: Re: [SUGGESTION] Pretty-printing custom unit types
Date: Mon, 11 Jul 2022 22:46:38 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux)

> On Mon, Jul 11, 2022 at 03:53:14PM +0200, Jose E. Marchesi wrote:
>> 
>> > On Fri, Jul 08, 2022 at 08:43:06PM +0200, Jose E. Marchesi wrote:
>> >> 
>> >> > Here it prints #32 instead of #U32bits.
>> >> >
>> >> > If how_many was an offset<int, B> it would however print how_many=0x0#B
>> >> >
>> >> 
>> > Hmm, maybe I'm not understanding it correctly, but would this correctly
>> > handle multiple units with the same size being interleaved in a struct?
>> 
>> No, it would not.
>> 
>> > I would have thought that to cover this comprehensively we would have to
>> > add a tag field to the unit type struct, but I'm happy to stand corrected
>> > if you have a more elegant solution. :-)
>> 
>> Hmm, so you are suggesting that we expand both the boxed offset PVM
>> values _and_ the boxed offset type PVM values in order to hold a unit
>> name?
>> 
>
> Yes, kind of. I don't know how unit types are currently represented in the 
> runtime.
>
> unit types:
>
> Units are only allowed to be initialized with a constant integer literal.
>
> Their names are also constant, and I'd argue that nominal typing of
> units is fine given that both names and values are constant; I don't
> see the point of actually keeping track of scoped unit type
> declarations when they are for all intents and purposes equivalent as
> far as I can tell.

Units are lexically scoped.  You can re-define a unit with a different
value in an inner scope.

> This would let us "intern" the (unit name * bit size) tuples,
> deduplicating the allocations.
>
> Offset types are trickier because I'm guessing that the "container"
> type in offset<CONTAINER,UNIT> is parameterizable/scope-dependent
> (which causes the need for the allocations alluded to in your
> comment?), and if that is the case I agree that adding new fields
> would be unfortunate because it would increase memory consumption and
> thrash our cache.

PVM offset values are boxed values with:

  struct pvm_off
  {
    pvm_val base_type;
    pvm_val magnitude;
    pvm_val unit;
  };

PVM type values for offsets are derived from offset values whenever
necessary and reflect:

  struct pvm_type
  {
    enum pvm_type_code code;
    union
    {
      ...
      struct
      {
        pvm_val base_type;
        pvm_val unit;
      } off;
      ...
    } val;
  };

This means we would need to expand the pvm_type for offsets to have a
unit name (as a PVM string):

      struct
      {
        pvm_val base_type;
        pvm_val unit;
        pvm_val unit_name; /* PVM_NULL for no name.  */
      } off;

but then we would need to explicitly tag the PVM offset values with the
type, like we do for arrays and structs:

    struct pvm_off
    {
      pvm_val type;
      pvm_val magnitude;
    };

Note how this may actually result in less storage being used, since many
PVM offsets will now share an explicit type value holding the unit and
base_type.
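
To make the storage argument concrete, here is a rough sketch.  It
assumes that pvm_val is a single 64-bit word and that the offset type
payload is allocated once per distinct type; the names
pvm_off_current, pvm_off_tagged, pvm_off_type, storage_current and
storage_tagged are invented for this illustration and are not part of
the PVM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the real PVM value type; assumed to be one 64-bit word.  */
typedef uint64_t pvm_val;

/* Current layout: every offset carries its own base_type and unit.  */
struct pvm_off_current
{
  pvm_val base_type;
  pvm_val magnitude;
  pvm_val unit;
};

/* Proposed layout: the offset refers to a shared type value.  */
struct pvm_off_tagged
{
  pvm_val type;
  pvm_val magnitude;
};

/* Proposed offset type payload, assumed allocated once per type.  */
struct pvm_off_type
{
  pvm_val base_type;
  pvm_val unit;
  pvm_val unit_name; /* PVM_NULL for no name.  */
};

/* Bytes used by N offsets of the same type with the current layout.  */
static size_t
storage_current (size_t n)
{
  assert (n <= SIZE_MAX / sizeof (struct pvm_off_current));
  return n * sizeof (struct pvm_off_current);
}

/* Bytes used by N offsets sharing one type value in the new layout.  */
static size_t
storage_tagged (size_t n)
{
  assert (n <= (SIZE_MAX - sizeof (struct pvm_off_type))
               / sizeof (struct pvm_off_tagged));
  return sizeof (struct pvm_off_type) + n * sizeof (struct pvm_off_tagged);
}
```

Under these assumptions the tagged layout costs 24 + 16*N bytes against
24*N bytes today, so it is smaller as soon as more than three offsets
share a type.  Boxing overhead and allocator behaviour are ignored here.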

When the compiler generates code that creates offset values, it can then
use the unit name used in the offset literal.

So, when the user writes 23#Foo, the created offset value will have type
offset<int<32>,Foo> (with the unit name.)

Likewise, with type MyType = offset<int<32>,Foo>, etc.
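
A printer could then prefer the name stored in the type and fall back
to the numeric unit otherwise.  The following is only a sketch: struct
off_type_view and format_offset are hypothetical names, plain C strings
and integers stand in for PVM values, and NULL plays the role of
PVM_NULL.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified view of an offset type.  In the PVM these
   fields would be PVM values rather than plain C types.  */
struct off_type_view
{
  uint64_t unit;         /* Unit size, in bits.  */
  const char *unit_name; /* NULL stands in for PVM_NULL.  */
};

/* Write MAGNITUDE#UNIT into BUF, using the unit name when the type has
   one and the numeric unit otherwise.  Returns what snprintf returns.  */
static int
format_offset (char *buf, size_t size, int64_t magnitude,
               const struct off_type_view *type)
{
  assert (buf != NULL && type != NULL);

  if (type->unit_name != NULL)
    return snprintf (buf, size, "%" PRId64 "#%s",
                     magnitude, type->unit_name);

  return snprintf (buf, size, "%" PRId64 "#%" PRIu64,
                   magnitude, type->unit);
}
```

With a type whose unit is 32 and whose name is "Foo", a magnitude of 23
is formatted as 23#Foo; with no name it is formatted as 23#32.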

> Some ideas I think are worth considering in that case:
>
>  1) We could keep a table in the environment mapping from offset type
>     pointer to unit/unit name.  This would let us keep the pointer in
>     the offset type (to keep the actual allocation small) while still
>     letting us access the unit name when pretty-printing, enumerating,
>     or complaining about errors.  The runtime cost would be increased
>     memory usage and increased bookkeeping when allocating and
>     deallocating offsets.
>
>  2) Interning offset types, too, would reduce the size of such a table.
>     I'm not sure how practical this is / how prone to changes offsets
>     are from changing variables etc?
>
>  3) Another idea, that I like more, would be to limit the maximum unit
>     size (currently uint64_t?) in favor of storing a [unit name tag]
>     (an offset into a global unit name string table) in the upper
>     bits.  Again, since unit names are constant strings that would
>     need to be loaded from source code, I think it would be "enough
>     for everybody" with a global limit of e.g. 4096 distinct unit
>     names, limiting our units to 64-12 bits.  Do we have a practical
>     use for units larger than 2^52 bits (512 TiB)?  Then we wouldn't
>     need extra fields, and we'd still be able to access the size
>     without chasing pointers.  We'd need a bit mask on access, and a
>     tiny bit of hash table bookkeeping on allocation, but that seems
>     reasonable?
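
For reference, the packing described in idea 3 could look roughly like
this.  It is a sketch under the numbers given above (a 12-bit tag in
the upper bits, 52 bits left for the unit size); the macro and function
names are invented for the example, and the global unit name table that
the tag would index is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical split of a 64-bit unit word: 12 tag bits on top,
   52 bits of unit size below.  */
#define UNIT_TAG_BITS  12
#define UNIT_SIZE_BITS (64 - UNIT_TAG_BITS)
#define UNIT_SIZE_MASK ((UINT64_C (1) << UNIT_SIZE_BITS) - 1)
#define UNIT_MAX_TAGS  (1u << UNIT_TAG_BITS) /* 4096 distinct names.  */

/* Combine a unit size in bits and a name tag into one word.  */
static uint64_t
unit_pack (uint64_t size_in_bits, unsigned tag)
{
  assert (size_in_bits <= UNIT_SIZE_MASK);
  assert (tag < UNIT_MAX_TAGS);
  return ((uint64_t) tag << UNIT_SIZE_BITS) | size_in_bits;
}

/* Recover the unit size: a single mask, no pointer chasing.  */
static uint64_t
unit_size (uint64_t packed)
{
  return packed & UNIT_SIZE_MASK;
}

/* Recover the index into the (not shown) unit name table.  */
static unsigned
unit_tag (uint64_t packed)
{
  return (unsigned) (packed >> UNIT_SIZE_BITS);
}
```

In this sketch the largest representable unit is 2^52 - 1 bits, and it
is up to the caller to decide which tag value, if any, means "no name".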


