Re: Dynamic loading progress

emacs-devel

From:	Eli Zaretskii
Subject:	Re: Dynamic loading progress
Date:	Mon, 16 Feb 2015 17:43:14 +0200
> Date: Sun, 15 Feb 2015 12:20:35 -0800
> From: Daniel Colascione <address@hidden>
> CC: address@hidden, address@hidden
> 
> Here's a broad outline of what I have in mind.

Thanks.  I think the next step is for someone to try rewriting a
couple of modules on the branch in terms of this specification, and
see if something is missing.  Volunteers are welcome.

Some comments on your write-up:

> When Emacs loads a module, it uses dlsym (or the platform equivalent)
> to find this routine and call it. If it returns 0, the module loaded
> successfully; otherwise, we report an error to the caller.

If this is all we need, then there's no need for libltdl, I think.

>   struct emacs_runtime {
>     size_t size;
>     struct emacs_env (*get_environment)(struct emacs_runtime* ert);
>   };
> 
> The `size' member tells modules how long the emacs_runtime structure
> is. (It's better to use size than an explicit version field: this way,
> .size = sizeof(struct emacs_runtime) is always correct.)

This approach requires us to change the size each time we change the
layout, even if the change itself leaves the size intact.  Using a
version field doesn't have this disadvantage.
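
To make the concern concrete, here is a minimal sketch.  The member
names, the swapped layout, and the version numbers are all
hypothetical and not part of the proposal; the point is only that a
size comparison cannot see a layout change that keeps sizeof the same.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*slot_fn) (void);   /* placeholder for any function pointer */

/* Layout a module was compiled against (member names are illustrative).  */
struct runtime_old {
  size_t size;
  slot_fn get_environment;
  slot_fn other_entry;
};

/* Later layout: the two entries swapped places, sizeof is unchanged.  */
struct runtime_new {
  size_t size;
  slot_fn other_entry;
  slot_fn get_environment;
};

/* Size-based compatibility test, as a module might write it.  */
static bool
size_check_ok (size_t runtime_size, size_t compiled_size)
{
  assert (compiled_size >= sizeof (size_t));  /* must at least hold `size' */
  return runtime_size >= compiled_size;
}

/* Version-based test; this sketch requires an exact match.  */
static bool
version_check_ok (int runtime_version, int compiled_version)
{
  return runtime_version == compiled_version;
}
```

A module built against runtime_old passes size_check_ok when handed a
runtime_new, and would then call through the wrong slot; a bumped
version number catches the mismatch.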

> Thread-local environments
> -------------------------
> 
> The `get_environment' member lets us do anything else interesting. As
> in Java, environments are thread-local. We only support one thread for
> the moment, so this constraint is easy to enforce. (Just abort if we
> use an emacs_env off the main thread.)

Is it wise to design for threads at this point?  The only suggestion
on the table for "threads" is a far cry from real threads, and the
current Lisp interpreter is otherwise thread-unsafe.  Won't the thread
infrastructure add a lot of cruft that will go unneeded for the
foreseeable future?

> We'll represent all Lisp values as an opaque pointer typedef
> emacs_value.  Each emacs_value is either a local or a global
> reference.  Local references are valid only on the current thread and
> only while the module function Emacs called is on the stack --- think
> GCPRO.  Global references are valid indefinitely: each one is a GC
> root.

With massive use of calls to Lisp functions from modules (since we
don't provide them with direct access in C to internals of many
objects), how can we ensure GC doesn't strike while the function that
created an emacs_value is still on the callstack?  You say "we don't
lock ourselves into conservative stack-scanning GC", which I interpret
as saying you don't want to rely on stack scanning to avoid a
destructive GC in this case.  But if we don't rely on that, where's
the guarantee that such emacs_value will survive GC?
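
One possible answer, sketched here purely as an assumption modeled on
JNI-style local references (the write-up does not spell it out): each
emacs_value points into a per-call table that Emacs owns, and the GC
treats every used slot as a root.  All names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef intptr_t lisp_word;       /* stand-in for a Lisp object word */

enum { MAX_LOCAL_REFS = 64 };

/* Hypothetical per-call frame: Emacs would set one up before calling
   into a module and release it after the module function returns.  */
struct ref_frame {
  lisp_word slots[MAX_LOCAL_REFS];
  size_t used;
};

typedef lisp_word *emacs_value;   /* opaque to modules: points at a slot */

/* Record OBJ in FRAME and hand the module a reference to the slot.  */
static emacs_value
make_local_ref (struct ref_frame *frame, lisp_word obj)
{
  assert (frame != NULL);
  if (frame->used == MAX_LOCAL_REFS)
    return NULL;                  /* a real implementation would grow or signal */
  frame->slots[frame->used] = obj;
  return &frame->slots[frame->used++];
}

/* What a GC would consult: copy out every live slot as a root.  */
static size_t
collect_roots (const struct ref_frame *frame, lisp_word *out, size_t cap)
{
  size_t n = frame->used < cap ? frame->used : cap;
  for (size_t i = 0; i < n; i++)
    out[i] = frame->slots[i];
  return n;
}

/* Called when the module function returns: local references die here.  */
static void
release_frame (struct ref_frame *frame)
{
  frame->used = 0;
}
```

Under this scheme the guarantee comes from the table, not from
scanning the C stack, at the price of one indirection per value.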

> We'll represent all Lisp values as an opaque pointer typedef
> emacs_value.

This doesn't play well with --with-wide-int, where a value can be
wider than a pointer.  I think we should instead go with intmax_t or
intptr_t, whichever is wider on the host.
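
A minimal sketch of the integer representation, under the assumption
that a wide Lisp word is 64 bits.  Since the C standard requires
intmax_t to represent every value of any signed integer type,
including intptr_t, the sketch simply uses intmax_t; the helper names
are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed representation: an integer wide enough for both cases.  */
typedef intmax_t emacs_value;

/* Sanity checks a build could run once at startup.  */
static void
check_widths (void)
{
  assert (sizeof (emacs_value) >= sizeof (void *));   /* holds a pointer */
  assert (sizeof (emacs_value) >= sizeof (int64_t));  /* holds a wide word */
}

/* Round-trip a pointer through the integer representation.  */
static emacs_value
value_from_pointer (void *p)
{
  return (emacs_value) (intptr_t) p;
}

static void *
value_to_pointer (emacs_value v)
{
  return (void *) (intptr_t) v;
}
```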

> Function registration
> ---------------------
> 
>   typedef emacs_value (*emacs_subr)(
>     emacs_env* env,
>     int nargs,
>     emacs_value args[]);
> 
>     emacs_value (*make_function)(
>       emacs_env* env,
>       int min_arity,
>       int max_arity,
>       emacs_subr function);

What about the doc string?

>     emacs_value (*funcall)(
>       emacs_env* env,
>       emacs_value function,
>       int nargs,
>       emacs_value args[]);

Shouldn't funcall use emacs_subr?

> Modules can register functions in the global namespace by calling a
> Lisp-level function

This is unclear; can you elaborate?  What happens if a function is not
"registered"?  What is its status then?

> When Lisp calls a module-defined function object, Emacs calls the
> emacs_subr callback with which the function was defined.

This is a change in the Lisp interpreter, I think.  Why do we need
this?

> If Lisp signals or throws, `funcall' returns NULL.

I suggest some other value or indication of that.  NULL is a valid
return value, so usurping it for errors might be too harsh.

Or maybe I don't understand how Lisp functions will return values to
the module under your suggestion.  Can you describe that?
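
One shape this could take, as a sketch only: the status becomes the
return value and the Lisp result travels through an out parameter, so
a NULL result no longer has to double as an error.  The enum, the
stubs, and the changed signature are assumptions, not part of the
proposal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void *emacs_value;        /* opaque pointer, as in the proposal */

/* Hypothetical outcome codes.  */
enum funcall_status { FUNCALL_OK, FUNCALL_SIGNAL, FUNCALL_THROW };

/* Hypothetical variant of `funcall' that reports status separately.  */
typedef enum funcall_status (*funcall_fn) (emacs_value function, int nargs,
                                           emacs_value args[],
                                           emacs_value *result);

/* Stub: a call that succeeds and legitimately yields NULL.  */
static enum funcall_status
stub_returns_null (emacs_value function, int nargs, emacs_value args[],
                   emacs_value *result)
{
  (void) function; (void) nargs; (void) args;
  *result = NULL;
  return FUNCALL_OK;
}

/* Stub: a call in which Lisp signals; RESULT is left untouched.  */
static enum funcall_status
stub_signals (emacs_value function, int nargs, emacs_value args[],
              emacs_value *result)
{
  (void) function; (void) nargs; (void) args; (void) result;
  return FUNCALL_SIGNAL;
}

/* Module-side helper: success is decided by the status alone.  */
static bool
call_succeeded (funcall_fn funcall, emacs_value *result)
{
  assert (result != NULL);
  return funcall (NULL, 0, NULL, result) == FUNCALL_OK;
}
```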

> `intern' also does the obvious thing.

Do we need 'unintern' as well?

>     emacs_value (*type_of)(
>       emacs_env* env,
>       emacs_value value);
> 
> Like Lisp type-of: returns a symbol.

What is a "symbol", from the module's C code POV?  You show no
functions to access attributes of symbols, so it must be either one of
the other types, like an integer or a string, or a C primitive data
type, like a char * pointer.

>     int64_t (*fixnum_to_int)(
>       emacs_env* env,
>       emacs_value value);
> 
>     emacs_value (*make_fixnum)(
>       emacs_env* env,
>       int64_t value);
> 
> These functions do the obvious thing.  They signal error on type
> mismatch.  We use int64_t to handle big-integer Emacs variants on
> 32-bit platforms.

The last bit means we will need a utility function to return the valid
range of integers, so that modules can be written to support 32-bit
and 64-bit use cases without raising errors.
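
Such a utility might look like the sketch below.  The function names
are hypothetical, and the fixnum width is passed in as a parameter
because the proposal does not say how a module would learn it; 30 and
62 value bits are used only as illustrative narrow and wide cases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Store the inclusive range of a two's-complement fixnum that has
   VALUE_BITS bits, sign bit included.  */
static void
fixnum_range (int value_bits, int64_t *min, int64_t *max)
{
  assert (value_bits >= 2 && value_bits <= 64);
  *max = (int64_t) ((UINT64_C (1) << (value_bits - 1)) - 1);
  *min = -*max - 1;
}

/* True if N can be passed to make_fixnum without a range error.  */
static bool
fits_fixnum (int value_bits, int64_t n)
{
  int64_t min, max;
  fixnum_range (value_bits, &min, &max);
  return min <= n && n <= max;
}
```

A module would check fits_fixnum before calling make_fixnum and fall
back to some other representation, such as a float or a string, when
the value is out of range.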

>     bool (*copy_string_contents)(
>       emacs_env* env,
>       emacs_value value,
>       char* buffer,
>       size_t* length_inout);
> 
>     emacs_value (*make_string)(
>       emacs_env* env,
>       const char* contents);
> 
> These functions let C code access Lisp strings.  I imagine we'll
> always produce and consume UTF-8.

Strings in Emacs are of limited usability if you cannot encode and
decode them.  So this needs to be part of supported functionality, I
think.

More generally, modules that would like to process buffer or string
text will have to be able to deal with Emacs's internal encoding of
text, which means macros and functions we use in the core.  The
alternative of working only on a UTF-8 encoded replica means we'd need
to encode and decode text across the module boundaries -- that's a lot
of consing.

> `copy_string_contents' copies into a caller-allocated buffer instead
> of returning a char* callers must free() --- this way, modules and the
> Emacs core don't need to share the same C runtime.  We can deal with
> the buffer-length issue in a number of ways: here, we just accept the
> destination buffer size in *length_inout and write the total length of
> the string to *length_inout on normal return.  We just truncate if
> we're given too short a buffer and don't signal an error; this way,
> callers can loop around and allocate a sufficiently large buffer for a
> string's contents.

That's an annoyance, IMO; why not provide a function to return the
required size?  Its implementation is trivial.
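
A sketch of how the two calls would combine.  The string stand-in and
the string_size and string_to_c names are hypothetical, and counting
the terminating NUL in the reported length is an assumption the
proposal does not settle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for a Lisp string; a real module would only hold an
   opaque emacs_value.  */
struct fake_string { const char *utf8; };

/* Hypothetical query: bytes needed, including the terminating NUL.  */
static size_t
string_size (const struct fake_string *s)
{
  return strlen (s->utf8) + 1;
}

/* Copy as described in the proposal: truncate silently and report
   the full size through *LENGTH_INOUT.  */
static bool
copy_string_contents (const struct fake_string *s, char *buffer,
                      size_t *length_inout)
{
  assert (s != NULL && length_inout != NULL);
  size_t needed = string_size (s);
  size_t room = *length_inout;
  if (room > 0)
    {
      size_t n = needed < room ? needed - 1 : room - 1;
      memcpy (buffer, s->utf8, n);
      buffer[n] = '\0';
    }
  *length_inout = needed;
  return true;
}

/* With the size query, one allocation and one copy suffice.
   The caller frees the result.  */
static char *
string_to_c (const struct fake_string *s)
{
  size_t len = string_size (s);
  char *buf = malloc (len);
  if (buf == NULL)
    return NULL;
  copy_string_contents (s, buf, &len);
  return buf;
}
```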

> I think the interface above is enough for complete functionality in a
> module, but for efficiency, we might want to expose additional
> facilities, like access to a unibyte buffer's raw representation.

I can envision a few additional missing bits:

 . direct access to buffer text (using buffer-substring means consing
   a lot of strings)
 . creation of opaque objects that should be returned to Lisp (like
   handles to objects managed by modules)
 . not sure how a module will "provide" its feature
 . some operations that must be efficient because they are typically
   done in inner loops, like regexp matching and accessing syntax of
   characters: doing that via Lisp will probably be horribly slow

> Convenience library
> -------------------

One thing that's inconvenient is the need to drag the environment
pointer through all the calls.  Why exactly is that needed?

> bool
> emacs_find_file(emacs_env* env, const char* filename)
> {
>   emacs_value e_filename = env->make_string(env, filename);
>   if(env->error_check(env)) return false;
>   emacs_value e_find_file = env->intern(env, "find-file");
>   if(env->error_check(env)) return false;
>   return env->funcall(env, e_find_file, 1, &e_filename) != NULL;
> }

This kind of code looks tedious to me, whether it's in Emacs or in
the module.  Just an observation.

Also, the buffer returned by find-file when it returns normally is
lost here, isn't it?

Thanks.