Re: Larger GC thresholds for non-interactive Emacs

On Fri, Jun 24, 2022, 2:54 AM Eli Zaretskii <eliz@gnu.org> wrote:

> From: Lynn Winebarger <owinebar@gmail.com>
>
> > > It would help if there were some taxonomy of features/design
> > > points for
> > > reference. Is the bug database being used for that?
> >
> > I don't think so (IIUC what you mean by that), and I don't really see
> > how bugs could serve in that role.
>
> I agree, but some of the bug reports are classified as something along
> the lines of "feature request" or "wishlist".

If you look at those, you will see that almost all of them request
minor features that have no impact on the design whatsoever. It is
very rare to have a feature request on the bug tracker that changes
the design or introduces new designs. I would even dare to say it
never happened, definitely not on my watch.

> I haven't found any other database tracking anything corresponding
> to features, proposed or otherwise.

We do have etc/TODO. Did you look there?

Yes, but it's not indexed, so there's no good way to reference particular items the way bug reports can be referenced.

> Once something is in place, no matter how bare-bones, the key would be
> maintainers mandating that any substantive introduction or revision of
> features be tied to appropriate entries for the corresponding
> design/spec documentation, in the same way and for the same reason
> coding standards are enforced.

I think you are greatly underestimating the efforts required to
maintain documentation of this kind, let alone the effort for creating
it basically from scratch.

I just have a lot of sympathy for the points you made in that portable dumper thread, regarding the cost of accepting substantial contributions from transitory contributors.

Anyone can create a new feature or improvement and maintain their own variant, but it's a lot easier for them if the project takes over maintenance and breaking it becomes a rejection test for future changes.

Everything you say below is sensible. You and the other (current and future) core maintainers are the ones who have to make the trade-off, after all. I'm going to add some comments below, though, because I think my perspective on what is useful is different. If they are obvious or too idiosyncratic to be useful in the context of attracting new contributors/maintainers, feel free to ignore them.

Programmers usually don't like writing documentation, and most of them
cannot write good documentation even if they wanted to, in large part
because that takes years of practice actually doing it. In a project
like Emacs, it is nigh impossible to force contributors and developers
to do what they are reluctant to do in the first place. Without everyone
updating the documentation as they change code, any documentation
about design and internals will quickly become outdated, and thus not
just useless, but downright harmful. We have trouble keeping even our
user documentation up-to-date.

Is it harmful? I suppose it depends on who's relying on it. When diving into the source of a project with Emacs's complexity with a particular design change in mind, I personally prefer to have some guideposts, knowing that they may be dated, and then ask knowledgeable people when the doc and the code seem to disagree. When that happens on a public forum like this list, the exchange then serves as some reference.

It's true, I have a low bar. It might not seem like it, but my preference is to explore a question as much as I can on my own before bothering the experts. In the best case, the question that stumps me also stumps the expert - then I know I've done adequate study. With Emacs (and often other free software projects), it's just hard to tell what I should be responsible for studying prior to asking questions. Beyond the code itself, of course.

Are you aware of _any_ Free Software project that has any useful
documentation of this kind?

I know some "Open Source" projects (e.g. Chromium) that have useful docs. Projects with academic roots (Larceny Scheme is still my go-to) also tend to have them. I'm assuming that in this context, the OSS projects, at least, are not "Free Software project[s]" in the sense you intended, even if the software produced by those projects happens to be Free.

I don't do development for a living, so when I first looked into the code of those corporate projects, the general culture around demanding uniformity of build systems and supported configurations was quite a culture shock after being accustomed to the approach of Free Software projects. So I'm assuming you intend to refer to projects that have freedom as an organizing principle, and hence exhibit a self-selection bias toward diverse and idiosyncratic preferences and goals among their contributors. Or something along those lines?

Every project I ever knew or was involved
with which tried ended up abandoning that. A case in point is GDB,
which once had a "GDB Internals" manual -- it was always outdated, and
was eventually scrapped because the maintainers decided they could not
and didn't want to invest the effort.

I remember thinking it was a shame. But, my bar for usefulness is fairly low.

XEmacs tried in its time to
have the internals documented, but that was basically a one-man
effort, and even in it whole chapters were never written. Etc. etc.

I think there's some conclusion waiting here.

I think we have different problem scopes in mind. I'd take some kind of indexed wiki as a substantial improvement, particularly if there were links between tags in the code as well as files and directories cross-referencing features and VC logs, with the primary archive of discussion happening there rather than an email list (as with discussion around bugs). I'd just like to be able to determine answers to questions like: Why is preventing the allocation of Lisp objects by mmap necessary? What purpose does ralloc.c serve? Do they still serve a purpose? And I'd like to know whether I've done enough self-study to justify asking the experts.

> That's assuming the maintainers consider such documentation
> important enough for enabling future potential contributors and
> maintainers to hold otherwise useful contributions in limbo.

And you think we don't? Of course we do! We also want to release
Emacs much more frequently, and we want it to have no bugs, and a few
other things. But somehow, each such goal cannot be reached for some
boring practical reasons, none of them related to their importance in
our eyes.

I did not mean to imply you don't consider them important (I can see it might read that way). I meant "important enough [relative to all the competing priorities]". I don't know of any other way of expressing importance except relative to competing priorities subject to resource constraints.

We do in general consider it somewhat more important to develop Emacs
and accept contributions than to document its internals, that's true.
Which is why I said up-thread that without someone who'd volunteer to
do this job (thus dedicating his/her time almost exclusively to it),
this will probably never happen, given the current state of our
resources, which are barely enough to maintain the status quo and move
forward at some reasonable pace.

> > Most of the design changes and
> > redesigns in Emacs were developed without any bug report, simply
> > because those who did the job knew that a particular group of problems
> > needs to be taken care of.
>
> It's not like there isn't any discussion or justification of features
> offered prior to code being integrated into the main branch. It's more
> a challenge of how to weave what's there into a coherent account of
> what's going on in the code.

IME, you are wrong in that assumption. The significant changes in
Emacs design are almost never publicly discussed, not in the way that
would allow someone to glean the design from them.

Then should I understand that Daniel Colascione's approach with the portable dumper preview thread, and the more recent one on a revised GC, are outliers, or that I'm misreading their scope relative to the discussion of design issues they provide? [ That's a straight question to resolve my ignorance, not intended to be snarky. ]

> It would just be easier to automate some sort of design note
> extraction from the git log if references to mails could be associated
> with relevant features. I've never used org, but maybe there's some
> syntax that would be useful? Or maybe some notation from supercite
> for precise pulling of relevant text from list archives?

I wish this were true. It isn't. The discussions and the commit log
messages rarely describe the design, and in many cases barely describe
even the particular implementation. In a project where people with
write access can commit changes without any review, I don't know how
anything else can be expected. We basically rely on each
individual to do the job perfectly and contribute to what you want to
see documented. The results are before our eyes, and they shouldn't
surprise anyone.

True enough.

> > > https://github.com/rocky/elisp-bytecode
> > > That is really useful documentation that would ideally be in the emacs docs
> > > or etc directory.
> >
> > That's not design description, though.
>
> You probably have a more nuanced view than me on this. It's true,
> that document is focused on the specification (the "what") rather than
> the (detailed) "how" and "why" - is that what you mean?

Of course.

> Either way, if you want to understand how the operational semantics
> of emacs lisp work in practice, a document of that sort is
> invaluable. Without that, a document explaining the "why" isn't
> going to be able to be very concrete.

I agree, but the "what" is usually already available in the comments
to the code, though not everywhere and for every significant feature.
The "why" and "how", by contrast, are almost completely missing.

In the particular case of the bytecode spec, at least it gives me the sense of the invariants that are being maintained by the implementation. Trying to reverse engineer what those invariants are from a giant C switch statement is always tricky, because they tend to be expressed with boilerplate code, and then sometimes you have clever use of the fact that cases are non-exclusive and "fall through" without an explicit break, etc. That's where it intersects with "design" for me.

OTOH, the same switch statement could be written as a dispatch table of higher-order functions exploiting proper tail-recursion in a way I would feel comfortable enough with to not feel the need to lean on a spec. But that assumes those higher-order functions clearly express the invariants they impose. It's a personal preference, I know.
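To make that concrete, here is a minimal sketch of the dispatch-table style, in Python rather than C. Everything in it is an assumption for illustration: the opcodes (PUSH, ADD, DUP, HALT), the `requires` wrapper, and the `run` driver are hypothetical and have nothing to do with the actual Emacs bytecode or bytecode.c. Python does not eliminate tail calls, so a plain loop stands in for the tail-recursive chaining between handlers. The point is only that each handler carries its own invariant (here, the stack depth it needs) instead of leaving it implicit in the cases of a switch.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical opcodes for illustration only; not Emacs's real bytecode.
PUSH, ADD, DUP, HALT = range(4)

Stack = List[int]
# A handler takes (stack, operand); it returns True to continue, False to stop.
Handler = Callable[[Stack, int], bool]


def requires(depth: int) -> Callable[[Handler], Handler]:
    """Higher-order function: attach a stack-depth precondition to a handler."""
    def wrap(handler: Handler) -> Handler:
        def checked(stack: Stack, operand: int) -> bool:
            # The invariant is stated once, next to the handler it guards.
            if len(stack) < depth:
                raise ValueError(f"{handler.__name__} needs {depth} stack item(s)")
            return handler(stack, operand)
        return checked
    return wrap


@requires(0)
def op_push(stack: Stack, operand: int) -> bool:
    stack.append(operand)
    return True


@requires(2)
def op_add(stack: Stack, operand: int) -> bool:
    right = stack.pop()
    left = stack.pop()
    stack.append(left + right)
    return True


@requires(1)
def op_dup(stack: Stack, operand: int) -> bool:
    stack.append(stack[-1])
    return True


@requires(0)
def op_halt(stack: Stack, operand: int) -> bool:
    return False


# The dispatch table replaces the switch: exactly one handler per opcode,
# so there is no fall-through between cases to reason about.
DISPATCH: Dict[int, Handler] = {
    PUSH: op_push,
    ADD: op_add,
    DUP: op_dup,
    HALT: op_halt,
}


def run(program: List[Tuple[int, int]]) -> Stack:
    """Execute (opcode, operand) pairs and return the final stack."""
    stack: Stack = []
    for opcode, operand in program:
        if not DISPATCH[opcode](stack, operand):
            break
    return stack


print(run([(PUSH, 2), (DUP, 0), (ADD, 0), (HALT, 0)]))
```

Whether this reads better than a switch is, again, a matter of taste; it only helps if the wrappers really do state the invariants the implementation relies on.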

To summarize: I'm not sure we should continue this discussion, because
I don't see where it is going and what it could possibly change in
practice. I agree with the value of having all of that documented,
I'm just saying it's a large job that needs dedicated individuals, and
I don't see how that could be replaced by any semi-automated means.

I was surprised my off-the-cuff remark about trawling the archives generated any response in the first place, to be honest.

One thing (I think) some of those OSS IDE projects do well is eat their own dogfood in terms of features for projects/documentation/collaborative development models generally. AFAICT, Emacs has support (either in core or in non-core packages) for a lot of different tools/approaches/etc. to these issues, but doesn't seem to take advantage of them for Emacs development itself, beyond the integrated bug reporting and version control systems. There's support for all these project formats and tags - why isn't there a standard choice of those imposed on the Emacs project itself, e.g. with a centralized repo of pre-generated tags and auto-generated extraction of documentation/data structure definitions? Maybe I just haven't read that section of the manual or the right document in "etc" yet - I freely admit my ignorance. As it is, I've just assumed that the core maintainer(s) have weighed the trade-offs and decided imposing that kind of constraint, even among the core maintainers, would cost more than it is worth (in attracting contributors and maintainers).

Imagine an "emacs-devel" mode where the tags (with files themselves treated as tags, i.e. as entries in a directory "document") linked to a database cross-referenced with the feature database, the VC logs, and a wiki-ish interface that could record both documentation and "talk". Sort of like "help" mode, but with a wikipedia-ish feel. The contributors to that piece of code and maintainers covering the related features would be auto-subscribed to the entry, and any discussion would at least get recorded in a way linked to the thing being discussed. I'm not sure contributors would want to be auto-subscribed like that, but then maybe their contribution should be reviewed by someone who would be.

It's just an idea - whether it's useful or practical for the purpose of facilitating the recruitment of new contributors/maintainers, given that some culture change would likely be required, is not something I have the perspective to judge. I do know I would find something like what I've described very useful.

Lynn

From:	Lynn Winebarger
Subject:	Re: Larger GC thresholds for non-interactive Emacs
Date:	Sat, 25 Jun 2022 12:50:41 -0400