[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Evolution & optimization of the module system

From: Ludovic Courtès
Subject: Re: Evolution & optimization of the module system
Date: Mon, 19 Feb 2007 10:24:20 +0100
User-agent: Gnus/5.110006 (No Gnus v0.6) Emacs/21.4 (gnu/linux)


Kevin Ryde <address@hidden> writes:

> address@hidden (Ludovic Courtès) writes:
>> (1) has to do mainly with `module-use!' vs. `module-use-interfaces!' (as
>> was discussed recently).  Namely the fact that duplicate processing is
>> not always performed, depending on whether one uses `module-use!' or
>> some other means to use a module.
> There's nothing broken though is there, it's just a trap for the
> unwary (like me for instance, wrestling with use-srfis)?

Indeed, but it's easy to get trapped.  ;-)

> I've put that off, because a lot looks like internals, and I'd
> wondered if r6rs might offer some of its own "introspection" stuff.

AFAICS, R6RS does not define any introspection or run-time module
facilities akin to `module-define!', `module-ref', etc.  It only
provides the `library' form.  So we may want to document our own
run-time module API since it's actually been used for some time,
including outside of Guile core.

(That said, R6RS module systems introduces a clean phase separation that
we may want to implement, but that's another story.)

>> Although duplicates should be the exception rather than the rule[*],
>> duplicate processing is pretty costly: the current `process-duplicates'
>> is roughly O(N*USES),
> USES is supposed to be smallish of course though.

Actually, `process-duplicates' is O(N*USES) _for each module used_.  So
the overall duplicate processing is really O(N*USES^2).  With the
patched version, the whole process is O(N*USES).  That can make quite a
difference when USES > 1.

>> Likewise,
>> variable lookup (e.g., in `module_variable ()') is O(USES).  I believe
>> that both may have a sensible impact on startup time.
> Something definitely slows down startup over 1.6.  I'd guessed it was
> strings, but I failed miserably at making either gprof or
> functioncheck give some call counts to prove or disprove that.

I think the module system hasn't changed much since 1.6.

>> The patch addresses this by changing the data structures used by
>> modules: instead of a list of used modules, it uses a second "obarray",
>> called the "import obarray", that maps symbols to the modules providing
>> them.
> That would increase the memory used by a module though, would it?  Big
> modules imported into lots of other places would have their binding
> list much copied would they?  (I'm thinking for instance of gtk and
> gnome, and thinking purely selfishly since I import gtk and/or gdk
> into a bunch of my modules.)

Right, it increases the size of individual module objects.  I haven't
made any measurements but I'm not sure whether it should be a concern,
given that the number of modules is not supposed to be too high (i.e.,
at most a few hundreds).  Any idea of an estimate of the memory occupied
by a (balanced) hash table given its number of elements?

OTOH, if we were to use the same data structure for all the phases of a
module (when we support phase separation), the impact would be more

>> This has several implications.  First, duplicate processing occurs the
>> same way for dynamically added bindings than for "statically imported"
>> bindings.  Second, it makes load-time-dependent duplicate policies such
>> as `last' and `first' irrelevant (since they are inherently
>> non-deterministic).
> Last and first still sound pretty sensible.

Yes, but they behave in unpredictable ways in the presence of
dynamically added bindings and dynamic duplicate processing (especially
as they interact with memoization).

>> Third, it makes dynamic addition of bindings relatively costly.
> Depends how often that happens I suppose.  There's no actual
> documented way to do it is there? :-)  Maybe some tomfoolery with
> macros and eval ...

I was referring to `module-define!' and similar (which, although not
documented, is actually used...).

>> From the measurements I've made, the new version is
>> around 40 times faster than the other one.
> Is that in part due to moving the variable lookup to C?

No, I was referring to `module-use-interfaces!' which does not involve
variable lookup (not using `module-variable').  It uses the now
C-implemented `module-import-interface' but the most important point is
the different algorithm.  So it's really a comparison of the O(N*USES^2)
vs. O(N*USES) discussed earlier.

>> So the question is: is the `beautify-user-module!' overhead compensated
>> by the variable lookup and duplicate processing gains?
> I'd be inclined to say probably not, but that maybe there's another
> way.

The measurements I made tend to tell the contrary, but further
measurements are certainly needed.

> Duplicates is trying to find the intersection between one set of
> symbols (from one module) and some reasonably smallish number N of
> other sets (from other modules) is it?  I wonder if there's some way
> to find that more efficiently.

Maybe, I dunno (at first sight, `lset-intersection' doesn't seem to be
particularly smart, though).

> I guess the `warn' in the default duplicates handling makes it
> necessary to find all the duplicates.  Nice enough feature, but
> without it there'd be no need to check any duplicates at all, or only
> check against the few `#:replace' lists.

Duplicate checking is very useful I think.  User-configurable duplicate
handling policies as available are maybe too flexible and too costly.
But I'm not sure we could drop their support in the next major release,
could we?

The patch is trying to be conservative by not removing any feature and
being as API-compatible as possible.  But if we decide that we can drop
duplicate handling policies, then we can probably envision different
solutions (for instance, enforcing a single policy that raises an error
when duplicate bindings are encountered).

> Maybe the worst afflicted programs should be advised to turn it off,
> or even turn it off globally when confident there's no accidental
> clashes.  And for a start I guess we don't need any duplicates work in
> the core, at least up until reaching the `--use-srfi' stage of
> startup, if that isn't already the case.

Depends on what you mean here.  Modules distributed with core Guile are
not special-cased (and shouldn't be, IMO).

>> +      (define %default-import-size
>> +        ;; This should be the size of the pre-module obarray.
>> +        500)
> Which also ends up being a minimum size of course ...

Yes, it should really be 2000 which is an upper bound of the number of
bindings in `the-scm-module'.

>> +/* Copy the given alist, i.e., duplicate all its pairs recursively.  */
>> +static inline SCM
>> +alist_copy (SCM alist)
> If you end up using this you could move scm_srfi1_alist_copy from
> srfi/srfi-1.c into the core (leaving behind a #define and a
> re-export).

Right.  OTOH, I like to have it in-line and non-type-checking.

>> -/*
>> -  TODO: should export this function? --hwn.
>> - */
>> -static SCM
>> -scm_export (SCM module, SCM namelist)
>> +SCM
>> +scm_module_export (SCM module, SCM namelist)
> That could be an "scm_i_" (for now at least) if it's not documented.

That was meant to be documented eventually.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]