Re: [SUGGESTION] Pretty-printing custom unit types
From: Jose E. Marchesi
Subject: Re: [SUGGESTION] Pretty-printing custom unit types
Date: Mon, 11 Jul 2022 22:46:38 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux)
> On Mon, Jul 11, 2022 at 03:53:14PM +0200, Jose E. Marchesi wrote:
>>
>> > On Fri, Jul 08, 2022 at 08:43:06PM +0200, Jose E. Marchesi wrote:
>> >>
>> >> > Here it prints #32 instead of #U32bits.
>> >> >
>> >> > If how_many was an offset<int, B> it would however print how_many=0x0#B
>> >> >
>> >>
>> > Hmm, maybe I'm not understanding it correctly, but would this
>> > correctly handle multiple units with the same size being
>> > interleaved in a struct?
>>
>> No, it would not.
>>
>> > I would have thought that to cover this comprehensively we would
>> > have to add a tag field to the unit type struct, but I'm happy to
>> > stand corrected if you have a more elegant solution. :-)
>>
>> Hmm, so you are suggesting expanding both the boxed offset PVM values
>> _and_ the boxed offset type PVM values in order to hold a unit name?
>>
>
> Yes, kind of. I don't know how unit types are currently represented in the
> runtime.
>
> unit types:
>
> Units are only allowed to be initialized with a constant integer literal.
>
> Their names are also constant, and I'd argue that nominal typing of
> units is fine given that both names and values are constant; I don't
> see the point of actually keeping track of scoped unit type
> declarations when they are for all intents and purposes equivalent as
> far as I can tell.
Units are lexically scoped. You can re-define a unit with a different
value in an inner scope.
> This would let us "intern" the (unit name * bit size) tuples,
> deduplicating the allocations.
>
> Offset types are trickier because I'm guessing that the "container"
> type in offset<CONTAINER,UNIT> is parameterizable/scope-dependent
> (which causes the need for allocations alluded to in your
> comment?), and if that is the case I agree that adding new fields
> would be unfortunate because it would increase memory consumption and
> trash our cache.
PVM offset values are boxed values with:

  struct pvm_off
  {
    pvm_val base_type;
    pvm_val magnitude;
    pvm_val unit;
  };
PVM type values for offsets are derived from offset values whenever
necessary and reflect:

  struct pvm_type
  {
    enum pvm_type_code code;
    union
    {
      ...
      struct
      {
        pvm_val base_type;
        pvm_val unit;
      } off;
      ...
    } val;
  };
This means we would need to expand the pvm_type for offsets to have a
unit name (as a PVM string):

  struct
  {
    pvm_val base_type;
    pvm_val unit;
    pvm_val unit_name; /* PVM_NULL for no name. */
  } off;
but then we would need to explicitly tag the PVM offset values with the
type, like we do for arrays and structs:

  struct pvm_off
  {
    pvm_val type;
    pvm_val magnitude;
  };
Note how this may actually result in less storage being used, since many
PVM offsets will now share an explicit type value holding the unit and
base_type.
When the compiler generates code that creates offset values, it can then
use the unit name used in the offset literal.
So, when the user writes 23#Foo, the created offset value will have type
offset<int<32>,Foo> (with the unit name).
Likewise, with type MyType = offset<int<32>,Foo>, etc.
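
A minimal sketch of the tagged representation described above, in
plain C rather than actual PVM code.  Everything here is an assumption
made for illustration: `off_type`, `off_val` and `off_print` are
hypothetical names, PVM values are modelled as ordinary C structs
instead of boxed `pvm_val`s, and the MAGNITUDE#UNIT fallback format is
only a guess at what a printer might do for unnamed units.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the offset part of a PVM type value.  */
struct off_type
{
  int base_size;          /* Width of the base integral type, in bits.  */
  uint64_t unit;          /* Unit size, in bits.  */
  const char *unit_name;  /* NULL for no name.  */
};

/* Hypothetical stand-in for a boxed offset value: it only carries a
   pointer to a type shared by all offsets of that type.  */
struct off_val
{
  const struct off_type *type;
  int64_t magnitude;
};

/* Format OFF into BUF as MAGNITUDE#NAME when its type carries a unit
   name, and as MAGNITUDE#UNIT otherwise.  Returns what snprintf
   returns.  */
static int
off_print (char *buf, size_t size, struct off_val off)
{
  assert (off.type != NULL);

  if (off.type->unit_name != NULL)
    return snprintf (buf, size, "%lld#%s",
                     (long long) off.magnitude, off.type->unit_name);

  return snprintf (buf, size, "%lld#%llu",
                   (long long) off.magnitude,
                   (unsigned long long) off.type->unit);
}
```

With two types that differ only in `unit_name` (say "Foo" and "Bar",
both 32 bits), offsets of each type print with their own name, which is
the interleaved case asked about earlier in the thread.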
> Some ideas I think are worth considering in that case:
> 1) We could keep a table in the environment mapping from offset type
>    pointer to unit/unit name.  This would let us keep the pointer in
>    the offset type (to keep the actual allocation small) while still
>    letting us access the unit name when
>    pretty-printing/enumerating/complaining about errors.  The runtime
>    cost would be increased memory usage and increased bookkeeping
>    when allocating/deallocating offsets.
>
> 2) Interning offset types, too, would reduce the size of such a table.
> I'm not sure how practical this is / how prone to changes offsets are
> from changing variables etc?
>
> 3) Another idea, that I like more, would be to limit the maximum unit
>    size (currently uint64_t?) in favor of storing a [unit name tag]
>    (an offset into a global unit name string table) in the upper
>    bits.  Again, since unit names are constant strings that would
>    need to be loaded from source code, I think it would be "enough
>    for everybody" with a global limit of e.g. 4096 distinct unit
>    names, limiting our units to 64-12 = 52 bits.  Do we have a
>    practical use for units larger than 2^52 bits (~512 TiB)?  Then we
>    wouldn't need extra fields, and we'd still be able to access the
>    size without chasing pointers.  We'd need a bit mask on access,
>    and a tiny bit of hash table bookkeeping on allocation, but that
>    seems reasonable?
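
For reference, idea 3) could be sketched roughly as follows.  This is
only an illustration under stated assumptions, not PVM code: all the
names (`unit_pack`, `unit_size`, `unit_tag`, `unit_name_intern`) are
hypothetical, tag 0 is reserved here to mean "no name" (leaving 4095
usable names), and a linear scan stands in for the hash table
mentioned in the quote.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define UNIT_TAG_BITS 12
#define UNIT_SIZE_BITS (64 - UNIT_TAG_BITS)
#define UNIT_SIZE_MASK ((UINT64_C (1) << UNIT_SIZE_BITS) - 1)
#define UNIT_MAX_NAMES (1u << UNIT_TAG_BITS)

/* Global unit name table.  Slot 0 is reserved for "no name".  */
static const char *unit_names[UNIT_MAX_NAMES];
static unsigned unit_names_used = 1;

/* Return the tag for NAME, registering it if it is not known yet.
   NAME must outlive the table; it is not copied.  */
static unsigned
unit_name_intern (const char *name)
{
  for (unsigned i = 1; i < unit_names_used; i++)
    if (strcmp (unit_names[i], name) == 0)
      return i;

  assert (unit_names_used < UNIT_MAX_NAMES);
  unit_names[unit_names_used] = name;
  return unit_names_used++;
}

/* Pack a unit size (in bits) and a name tag into one 64-bit word:
   the tag goes in the upper 12 bits, the size in the lower 52.  */
static uint64_t
unit_pack (uint64_t size_in_bits, unsigned tag)
{
  assert (size_in_bits <= UNIT_SIZE_MASK);
  assert (tag < UNIT_MAX_NAMES);
  return ((uint64_t) tag << UNIT_SIZE_BITS) | size_in_bits;
}

/* Accessors: one mask or one shift, no pointer chasing.  */
static uint64_t
unit_size (uint64_t packed)
{
  return packed & UNIT_SIZE_MASK;
}

static unsigned
unit_tag (uint64_t packed)
{
  return (unsigned) (packed >> UNIT_SIZE_BITS);
}
```

Two units of the same size but with different names then pack to
different words, while an unnamed unit (tag 0) packs to its plain
size.  Note that a single global table like this does not by itself
address lexical scoping: a unit name re-defined with a different value
in an inner scope would simply share the tag and differ in the size
bits.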