Re: [PATCH] add language/wisp to Guile?

From: Philip McGrath
Subject: Re: [PATCH] add language/wisp to Guile?
Date: Sun, 26 Feb 2023 02:45:12 -0500
User-agent: Cyrus-JMAP/3.9.0-alpha0-172-g9a2dae1853-fm-20230213.001-g9a2dae18


On Sat, Feb 18, 2023, at 10:58 AM, Maxime Devos wrote:
> On 18-02-2023 04:50, Philip McGrath wrote:
>> I haven't read the patch or this thread closely,
> I'll assume you have read it non-closely.
>> but R6RS has an answer to any concerns about compatibility with `#lang`. At 
>> the beginning of Chapter 4, "Lexical and Datum Syntax", the report 
>> specifies:
>>>   An implementation must not extend the lexical or datum syntax in any way, 
>>> with one exception: it need not treat the syntax `#!<identifier>`, for any 
>>> <identifier> (see section 4.2.4) that is not `r6rs`, as a syntax violation, 
>>> and it may use specific `#!`-prefixed identifiers as flags indicating that 
>>> subsequent input contains extensions to the standard lexical or datum 
>>> syntax. The syntax `#!r6rs` may be used to signify that the input afterward 
>>> is written with the lexical syntax and datum syntax described by this 
>>> report. `#!r6rs` is otherwise treated as a comment; see section 4.2.3.
> That is for '#!lang', not '#lang'.  R6RS allows the former, but the 
> patch does the latter.  As such, R6RS does not have an answer about 
> incompatibility with `#lang', unless you count ‘it's incompatible’ as an 
> answer.

Let me try to be more concrete.

If you want a portable, RnRS-standardized lexical syntax for `#lang`, use 
`#!<identifier>`, and systems that understand `#lang` will treat it (in 
appropriate contexts) as an alias for `#lang `.

Alternatively, you could embrace that Guile (like every other Scheme system I'm 
aware of) starts by default in a mode with implementation-specific extensions. 
Indeed, R6RS Appendix A specifically recognizes that "the default mode offered 
by a Scheme implementation may be non-conformant, and such a Scheme 
implementation may require special settings or declarations to enter the 
report-conformant mode" [1]. Then you could just write `#lang` and worry about 
the non-portable block comments some other day. This is what I would personally 

>> In Racket, in the initial configuration of the reader when reading a file, 
>> "`#!` is an alias for `#lang` followed by a space when `#!` is followed by 
>> alphanumeric ASCII, `+`, `-`, or `_`."
>> [...]
>> (Guile does not handle `#!r6rs` properly, presumably because of the 
>> legacy `#!`/`!#` block comments. I think this should be a surmountable 
>> obstacle, though, especially since Guile does support standard `#|`/`|#` 
>> block comments.)
> ‘#! ... !#’ comments aren't legacy; they exist to allow putting the 
> shebang in the first line of a script, and to pass additional arguments 
> to the Guile interpreter (see: (guile)The Top of a Script File) (*).  As 
> such, you can't just replace them with #| ... |# (unless you patch the 
> kernel to recognise "#| ..." as a shebang line).
> (*) Maybe they exist for other purposes too.

According to "(guile)Block Comments", the `#!...!#` syntax existed before Guile 
2.0 added support for `#|...|#` comments from SRFI 30 and R6RS.

> Furthermore, according to the kernel, #!r6rs would mean that the script 
> needs to be interpreted by a program named 'r6rs', but 'guile' is named 
> 'guile', not 'r6rs'.  (I assume this is in POSIX somewhere, though I 
> couldn't find it.)
> (This is an incompatibility between R6RS and any system that has shebangs.)

This is not an incompatibility, because the `#!r6rs` lexeme (or 
`#!<identifier>`, more generally) is not the shebang line for the script. R6RS 
Appendix D [2] gives this example of a Scheme script:

#!/usr/bin/env scheme-script
#!r6rs
(import (rnrs base)
        (rnrs io ports)
        (rnrs programs))
(put-bytes (standard-output-port)
           (call-with-port
               (open-file-input-port
                 (cadr (command-line)))
             get-bytes-all))

The appendix says that, "if the first line of a script begins with `#!/` or 
`#!<space>`, implementations should ignore it on all platforms, even if it does 
not conform to the recommended syntax". Admittedly this is not handled as 
consistently as I would prefer: I wish they had just standardized `#!/` and 
`#! ` as special comment syntax, as Racket does, and clarified the interaction with 
`#!<identifier>`. But Matt points out that JavaScript also has very similar 
special treatment for a single initial shebang comment. Lua has a similar 
mechanism: my vague recollection is that many languages do. 
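
To make the distinction concrete, here is a minimal Guile sketch of how a reader could sort the first line of a file into the cases discussed above: an ignorable script header (`#!/` or `#! `), the `#!r6rs` flag, or some other `#!<identifier>` flag such as `#!language/wisp`. The procedure `classify-first-line` and its result symbols are hypothetical, not part of Guile or R6RS, and the sketch neither validates the identifier nor handles Guile's `#!...!#` block comments.

```scheme
;; Hypothetical helper, not part of Guile or R6RS: classify the first
;; line of a source file according to the rules discussed above.
(define (classify-first-line line)
  (cond
   ;; R6RS Appendix D: a first line starting with "#!/" or "#! "
   ;; should be ignored as a script header.
   ((or (string-prefix? "#!/" line)
        (string-prefix? "#! " line))
    'script-header)
   ;; R6RS chapter 4: "#!r6rs" marks the standard lexical syntax.
   ((string=? line "#!r6rs")
    'r6rs-flag)
   ;; Any other "#!<something>" is taken as an implementation-specific
   ;; flag; this sketch does not check that <something> is an identifier.
   ((and (> (string-length line) 2)
         (string-prefix? "#!" line))
    'implementation-flag)
   (else 'other)))
```

For example, `#!/usr/bin/env scheme-script` yields `script-header`, while `#!language/wisp` yields `implementation-flag`: the kernel only ever sees the former.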

>>> (^) it doesn't integrate with the module system -- more concretely,
>>> (use-modules (foo)) wouldn't try loading foo.js -- adding '-x' arguments
>>> would solve that, but we agree that that would be unreasonable in many
>>> situations.  (Alternatively one could place ECMAScript code in a file
>>> with extension '.scm' with a '#lang' / '-*- mode: ecmascript -*-', but
>>> ... no.)

Generally I would use `.scm` (or `.rkt`), and certainly I would do so if there 
isn't some well-established other extension. If you are just using the file, 
you shouldn't necessarily have to care what language it's implemented in.

In particular, I don't think the `#lang` concept should be conflated with 
editor configuration like `-*- mode: ecmascript -*-`. As an example, consider 
these two Racket programs:

#lang datalog
parent(anchises, aeneas).
parent(aeneas, ascanius).
ancestor(A, B) :- parent(A, B).
ancestor(A, B) :- parent(A, C), ancestor(C, B).
ancestor(A, ascanius)?

#lang algol60
begin
    comment Credit to Rosetta Code;
    integer procedure fibonacci(n); value n; integer n;
    begin
        integer i, fn, fn1, fn2;
        fn2 := 1;
        fn1 := 0;
        fn  := 0;
        for i := 1 step 1 until n do begin
            fn  := fn1 + fn2;
            fn2 := fn1;
            fn1 := fn
        end;
        fibonacci := fn
    end;
    integer i;
    for i := 0 step 1 until 20 do printnln(fibonacci(i))
end

While I'm sure there are Emacs modes available for Datalog and Algol 60, and 
some people might want to use them for these programs, I would probably want to 
edit them both in racket-mode: because racket-mode supports the `#lang` 
protocol, it can obtain the syntax highlighting, indentation, and other support 
defined by each language, while also retaining the global features that all 
`#lang`-based languages get "for free", like a tool to rename variables that 
respects the actual model of scope. This is one of the value propositions of 
the `#lang` system.

>> Racket has a mechanism to enable additional source file extensions without 
>> needing explicit command-line arguments by defining `module-suffixes` or 
>> `doc-modules-suffixes` in a metadata module that is consulted when the 
>> collection is "set up". However, this mechanism is not widely used.
> I guess this is an improvement over the runtime 'guile -x extension'.
> However, if I'm understanding 'setup-info.html' correctly, the downside 
> is that you now need a separate file containing compilation settings.
> I have previously proposed a mechanism that makes the '-x' + 
> '--language' a compile-time thing (i.e., embed the source file extension 
> in the compiled .go; see previous e-mails in this thread), without 
> having to make a separate file containing compilation settings.
> How is Racket's method an improvement over my proposal?

My focus in this thread is explaining and advocating for `#lang`. I see the 
whole business with file extensions as basically orthogonal to `#lang`, and my 
opinions about it are much less strong, but I'll try to answer your question. I 
think it would make sense for `.go` files to record the file extension of their 
corresponding source files: Racket's `.zo` files do likewise. I don't object to 
a command-line option *at compile-time* (as you said) to enable additional file 
extensions, and I agree that there isn't a huge difference between that and an 
approach with a separate configuration file, though I do find the 
configuration-file approach somewhat more declarative, which I prefer.

What I was really trying to argue here is that the file extension should not 
determine the meaning of the program it contains: more on that below.

>> Overall, the experience of the Racket community strongly suggests that a 
>> file should say what language it is written in. Furthermore, that language 
>> is a property of the code, not of its runtime environment, so environment 
>> variables, command-line options, and similar extralinguistic mechanisms are a 
>> particularly poor fit for controlling it.
> Agreed on the 'no environment variables' thing, disagreed on the 'no 
> command-line options'.  In the past e-mails in this thread, there was 
> agreement on the ‘embed the source file extension in the compiled .go or 
> something like that; and add -x extension stuff _when compiling_ (not 
> runtime!) the software that uses the extension’.
> Do you have any particular issues with that proposal?  AFAICT, it solves 
> everything and is somewhat more straightforward than Racket.

I don't have particular issues with a compile-time command-line option to 
determine which files to compile. I do object to using command-line options or 
file extensions to determine what language a file is written in. 

>> File extensions are not the worst possible mechanisms, but they have similar 
>> problems: code written in an unsaved editor or a blog post may not have a 
>> file extension.
> With the proposal I wrote, it remains possible to override any 'file 
> extension -> language' mapping.  It's not in any way incompatible with 
> "-*- lang: whatever -*-"-like comments.
> Additionally, Guile can only load files that exist (i.e., 'saved'); Guile 
> is not an editor or blog reader, so these do not appear to be problems for 
> Guile to me.

While it's true that the only files Guile can load are "files that exist", it's 
not true that "Guile can only load files": consider procedures like 
`eval-string`, `compile`, and, ultimately, `read-syntax`.

AFAICT, to the extent that Guile's current implementations of such procedures 
support multiple languages, they rely on out-of-band configuration, like an 
optional `#:language` argument, which is just as extra-linguistic as relying on 
command-line options, environment variables, or file extensions. What I'm 
trying to advocate is that programs should say in-band, as part of their source 
code, what language they are written in.

> If the editor needs to determine the language for syntax highlighting or 
> such, then there exist constructs like ';; -*- mode: scheme -*-' that 
> are valid Scheme, but that's not a Guile matter.

See above for why the `#!language/wisp` option is perfectly valid R6RS Scheme 
and for some of my concerns about overloading editor configuration to determine 
the semantics of programs.

More broadly, everyone who reads a piece of source code, including humans as 
well as editors and the `guile` executable, needs to know what language it's 
written in to hope to understand it.

>> (For more on this theme, see the corresponding point of the Racket 
>> Manifesto.)
>> Actually writing the language into the source code has proven to work well.
> What is the corresponding point?  I'm not finding any search results for 
> 'file extension' or 'file name', and I'm not finding any relevant search 
> results for 'editor'.  Could you give me a page reference and a relevant 
> quote?

I was trying to refer to section 5, "Racket Internalizes Extra-Linguistic 
Mechanisms", which begins on p. 121 (p. 9 of the PDF). Admittedly, the 
connection between the main set of examples they discuss and this conversation 
is non-obvious. Maybe the most relevant quote is the last paragraph of that 
section, on p. 123 (PDF p. 11): "Finally, Racket also internalizes other 
aspects of its context. Dating back to the beginning, Racket programs can 
programmatically link modules and classes. In conventional languages, 
programmers must resort to extra-linguistic tools to abstract over such 
linguistic constructs; only ML-style languages and some scripting languages 
make modules and classes programmable, too." (Internal citations omitted.)

>> To end with an argument from authority, this is from Andy Wingo's "lessons 
>> learned from guile, the ancient & spry":

Sorry, this was meant to be tongue-in-cheek, and it seems that didn't come 
across. "Argument from authority" is often considered a category of logical 
fallacy, and ending with a quote is sometimes considered to be bad style or to 
weaken a piece of persuasive writing.

>    * I previously pointed out some problems with that proposal
>      -- i.e., '#lang whatever' is bogus Scheme / Wisp / ...,

I hope I've explained why something like `#!language/wisp` is perfectly within 
the bounds of R6RS.

Also, given that Guile already starts with non-standard extensions enabled by 
default, I don't see any reason not to also support `#lang language/wisp`. In 
particular, the spelling of `#lang` proceeds directly from the Scheme 
tradition. This is from the R6RS Rationale document, chapter 4, "Lexical 
Syntax", section 3, "Future Extensions" [3]:

>>>> The `#` is the prefix of several different kinds of syntactic entities: 
>>>> vectors, bytevectors, syntactic abbreviations related to syntax 
>>>> construction, nested comments, characters, `#!r6rs`, and 
>>>> implementation-specific extensions to the syntax that start with `#!`. In 
>>>> each case, the character following the `#` specifies what kind of 
>>>> syntactic datum follows. In the case of bytevectors, the syntax 
>>>> anticipates several different kinds of homogeneous vectors, even though 
>>>> R6RS specifies only one. The `u8` after the `#v` identifies the components 
>>>> of the vector as unsigned 8-bit entities or octets. 

>      and
>      'the module system won't find it, because of the unexpected
>      file extensions'.

This is indeed something that needs to be addressed, but it seems like a very 
solvable problem. Using the extension ".scm" for everything would be one 
trivial solution. Something like your proposal to enable file extensions based 
on a compile-time option could likewise be part of a solution.

In general, I'll say that, while using Guile, I've often missed Racket's more 
flexible constructs for importing modules. I especially miss `(require 
"foo/bar.rkt")`, which imports a module at a path relative to the module where 
the `require` form appears: it makes it easy to organize small programs into 
multiple files without having to mess with a load path.

More messages have come since I started writing this reply, so I'll try to 
address them, too.

On Thu, Feb 23, 2023, at 1:04 PM, Maxime Devos wrote:
> On 23-02-2023 09:51, Dr. Arne Babenhauserheide wrote:
>>> Thinking a bit more about it, it should be possible to special-case
>>> Guile's interpretation of "#!" such that "#!r6rs" doesn't require a
>>> closing "!#".  (Technically backwards-incompatible, but I don't think
>>> people are writing #!r6rs ...!# in the wild.)
>> Do you need the closing !# if you restrict yourself to the first line?
> I thought so at first, but doing a little experiment, it appears you 
> don't need to:
> $ guile
> scheme@(guile-user)> #!r6rs
> (display "hi") (newline)
> (output: hi)
> Apparently Guile already has required behaviour.

All the `#!r6rs` examples I've tried since I got Ludo’'s mail have worked, but 
I remember some not working as I'd expected in the past. I'll see if I can come 
up with any problematic examples again.

On Thu, Feb 23, 2023, at 1:42 PM, Maxime Devos wrote:
> Have you seen my messages on how the "#lang" construct is problematic 
> for some languages, and how alternatives like "[comment delimiter] -*- 
> stuff: scheme/ecmascript/... -*- [comment delimiter]" appear to be 
> equally simple (*) and not have any downsides (**).
> (*) The port encoding detection supports "-*- coding: whatever -*-", 
> presumably that functionality could be reused.

IMO, the use of "-*- coding: whatever -*-" to detect encoding is an ugly hack 
and should not be extended further.

I tried to raise some objections above to conflating editor configuration with 
syntax saying what a file's language is.

More broadly, I find "magic comments" highly objectionable. The whole point of 
comments is to be able to communicate freely to human readers without affecting 
the interpreter/compiler/evaluator. Introducing magic comments means you must 
constantly think about whether what you are writing for humans might change the 
meaning of your program. Magic comments *without knowing a priori what is a 
comment* are even worse: now you have to beware of accidental "magic" in ALL 
of the lexical syntax of your program. (Consider that something like 
`(define (-*- mode: c++ -*-) 14)` is perfectly good Scheme.)

(It's not really relevant for the `#lang`-like case, but something I find 
especially ironic about encoding "magic comments" or, say, `<?xml version="1.0" 
encoding="UTF-8"?>`, is that suddenly if you encode the Unicode text in some 
other encoding it becomes a lie.)

On Fri, Feb 24, 2023, at 6:51 PM, Maxime Devos wrote:
> On 25-02-2023 00:48, Maxime Devos wrote:
>>>> (**) For compatibility with Racket, it's not like we couldn't
>>>> implement both "#lang" and "-*- stuff: language -*-".
> TBC, I mean ‘only support #lang' for values of 'lang' that Racket 
> supports’

If I understand what you're proposing here, I don't think it's a viable option.

The fundamental purpose of the `#lang` construct (however you spell it) is to 
provide an open, extensible protocol for defining languages. Thus, "values of 
'lang' that Racket supports" are unbounded, provided that a module has been 
installed where the language specification says to look. From The Racket 
Reference [4]:

>>>> The `#lang` reader form is similar to `#reader`, but more constrained: the 
>>>> `#lang` must be followed by a single space (ASCII 32), and then a 
>>>> non-empty sequence of alphanumeric ASCII, `+`, `-`, `_`, and/or `/` 
>>>> characters terminated by whitespace or an end-of-file. The sequence must 
>>>> not start or end with `/`. A sequence `#lang ‹name›` is equivalent to 
>>>> either `#reader (submod ‹name› reader)` or `#reader ‹name›/lang/reader`, 
>>>> where the former is tried first guarded by a `module-declared?` check (but 
>>>> after filtering by `current-reader-guard`, so both are passed to the value 
>>>> of `current-reader-guard` if the latter is used). Note that the 
>>>> terminating whitespace (if any) is not consumed before the external 
>>>> reading procedure is called.
>>>> Finally, `#!` is an alias for `#lang` followed by a space when `#!` is 
>>>> followed by alphanumeric ASCII, `+`, `-`, or `_`. Use of this alias is 
>>>> discouraged except as needed to construct programs that conform to certain 
>>>> grammars, such as that of R6RS [Sperber07].

(The rationale for the constraints, which Racketeers generally tend to chafe 
against, is that the syntax of `#lang ‹name›` is the one and only thing that 
`#lang` doesn't give us a way to compatibly change. We can quickly get to a 
less constrained syntax by using a chaining "meta-language": see `#lang s-exp` 
and `#lang reader` on that page for two of many examples.)
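
As an illustration of those constraints, the following Guile sketch extracts the language name from a `#lang ‹name›` line or from the `#!‹name›` alias, returning `#f` when the quoted rules are not met. The helper names are hypothetical and this is not Racket's actual reader; it assumes the line holds only the directive, with the terminating whitespace already removed.

```scheme
;; Hypothetical helpers illustrating the name rules quoted above;
;; this is not Racket's reader, only a sketch of its constraints.
(define (lang-name-char? c)
  ;; alphanumeric ASCII, `+`, `-`, `_`, or `/`
  (or (and (char>=? c #\a) (char<=? c #\z))
      (and (char>=? c #\A) (char<=? c #\Z))
      (and (char>=? c #\0) (char<=? c #\9))
      (and (memv c '(#\+ #\- #\_ #\/)) #t)))

(define (valid-lang-name? name)
  ;; Non-empty, only allowed characters, not starting or ending with `/`.
  (let ((n (string-length name)))
    (and (> n 0)
         (not (char=? (string-ref name 0) #\/))
         (not (char=? (string-ref name (- n 1)) #\/))
         (let loop ((i 0))
           (or (= i n)
               (and (lang-name-char? (string-ref name i))
                    (loop (+ i 1))))))))

(define (lang-line-name line)
  ;; Returns the language name as a string, or #f.
  (cond
   ;; `#lang` followed by exactly one space, then the name.
   ((string-prefix? "#lang " line)
    (let ((name (substring line 6)))
      (and (valid-lang-name? name) name)))
   ;; The `#!` alias applies only when `#!` is followed by alphanumeric
   ;; ASCII, `+`, `-`, or `_` (so not `/`).
   ((and (> (string-length line) 2)
         (string-prefix? "#!" line)
         (not (char=? (string-ref line 2) #\/))
         (lang-name-char? (string-ref line 2)))
    (let ((name (substring line 2)))
      (and (valid-lang-name? name) name)))
   (else #f)))
```

Note that `#!/usr/bin/env racket` yields `#f` because `/` may not directly follow `#!` in the alias, which keeps ordinary shebang lines from being read as language names; `#!r6rs` yields `"r6rs"`.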

I expect reading this would raise more questions, because that page gives lots 
of details on Racket's `#lang` protocol. Do I really expect Guile to implement 
all of those details? If not, in what sense is what I'm advocating actually 
compatible with `#lang`?

I am definitely **not** suggesting that Guile implement all the details of 
Racket's `#lang` implementation. What I do strongly advocate is that you design 
Guile's support for `#lang` (or `#!`) to leave open a pathway for compatibility 
in the future.

I think the best way to explain how that would work is to take as an extended 
example Zuo, the tiny Scheme-like language created last year to replace the 
build scripts for Racket and Racket's branch of Chez Scheme. Zuo was initially 
prototyped in Racket as a `#lang` language. Since the goal was to use Zuo to 
build Racket, the primary implementation is an interpreter implemented in a 
single file of C code, avoiding bootstrapping issues. There isn't a working Zuo 
implementation as a Racket `#lang` at the moment. (There's a shim implementation, and 
there's some work in progress, as people have time and interest, to get a real 
implementation working again.) 

Zuo is based on `#lang`, but its protocol [5][6] is quite different from 
Racket's. Nevertheless, as I will explain, they are compatible.

The C code in fact implements not `#lang zuo` or even `#lang zuo/base` but 
`#lang zuo/kernel`: the rest of `#lang zuo` is implemented in Zuo, building up 
to `#lang zuo` through a series of internal languages. A module written in 
`#lang zuo/kernel` is a single expression which produces an immutable 
symbol-keyed hash table, which is Zuo's core representation of a module. When 
Zuo encounters `#lang whatever`, it looks up the symbol `'read-and-eval` in the 
hash table representing the module `whatever`: the result should be a procedure 
that, given a Zuo string (a Scheme bytevector) with the source of the module, 
returns a hash table to be used as the module's representation.
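
A rough model of that protocol, written in Guile for consistency with the rest of this thread, might look like the following. Everything here (`module-registry`, `instantiate-lang-module`, the toy language) is hypothetical rather than Zuo's code: Guile's mutable hash tables stand in for Zuo's immutable ones, and ordinary strings stand in for Zuo strings.

```scheme
;; Hypothetical model of the Zuo `#lang` protocol described above.
(define module-registry (make-hash-table)) ; module name -> module table

(define (register-module! name table)
  (hash-set! module-registry name table))

(define (instantiate-lang-module lang-name source)
  ;; `#lang LANG-NAME`: find the table representing LANG-NAME, take the
  ;; procedure under 'read-and-eval, and apply it to the source text.
  ;; The result is the table representing the new module.
  (let* ((lang-module (hash-ref module-registry lang-name))
         (read-and-eval (and lang-module
                             (hash-ref lang-module 'read-and-eval))))
    (if (procedure? read-and-eval)
        (read-and-eval source)
        (error "not usable as a #lang:" lang-name))))

;; A toy language whose modules export one key, 'text, holding the
;; upcased source.
(define toy-lang (make-hash-table))
(hash-set! toy-lang 'read-and-eval
           (lambda (source)
             (let ((result (make-hash-table)))
               (hash-set! result 'text (string-upcase source))
               result)))
(register-module! 'toy toy-lang)
```

Here `(hash-ref (instantiate-lang-module 'toy "hello") 'text)` evaluates to `"HELLO"`. Because the result of `read-and-eval` is itself a module table, a module that exports its own `'read-and-eval` could in turn serve as a language, which is what lets a series of languages be layered on top of a small kernel.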

An implementation of `#lang zuo/kernel` in Racket would bridge this protocol 
with Racket's `#lang` by synthesizing `reader` submodules implementing the 
procedures the Racket protocol expects by wrapping the procedure mapped to 
`'read-and-eval` in the Zuo-level hash table. The wrappers would propagate 
themselves, so a language implemented in a language implemented in `#lang 
zuo/kernel` would likewise be automatically bridged, and so on ad infinitum. 
Racket's submodules [7] make this work especially elegantly.

In Guile, my experience with the tower of languages is limited, but AIUI many 
of the existing facilities are like `lookup-language`[8] in expecting language 
X to be implemented by a language object bound to X in the module `(language X 
spec)`. I'd suggest that Guile support `#lang language/X` (or `#!language/X`, 
if you prefer to spell it that way) by likewise looking up X in the `(language 
X spec)` module. One day, compatibility could be achieved by adding trivial 
bridge (sub)modules: for an illustration of how trivial this can be, see [9], a 
one-line module that makes SRFI 11 available as `(import (srfi :11))` for R6RS 
by wrapping its historical PLT Scheme location, `(require srfi/11)`.

I would NOT suggest supporting arbitrary things after `#lang`, because one part 
of planning for compatibility is avoiding future namespace collisions. Happily, 
`language/` is not otherwise in use in the Racket world, so I suggest that 
Guile claim it. I don't think this should be overly restrictive: if it seems 
worthwhile to support languages from other modules, you could implement the 
"chaining meta-language" approach I mentioned above: imagine something like 
`#!language/other (@ (some other module) exported-language)`, where the `other` 
export of `(language other spec)` is responsible for reading the next datum and 
using it to obtain the language object to be used for the rest of the module.
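
A minimal sketch of the suggested mapping, assuming Guile's existing `lookup-language` from `(system base language)`: the helper name `lang-name->language` is hypothetical, and it deliberately accepts only names under the `language/` prefix with a single component after it, so something like `#lang racket/base` is left alone rather than colliding with Racket's namespace.

```scheme
(use-modules (system base language))

;; Hypothetical helper: map a `#lang` name such as "language/wisp" to the
;; language object Guile finds by looking up X in (language X spec).
;; Names outside the "language/" prefix yield #f.
(define (lang-name->language name)
  (let* ((prefix "language/")
         (plen (string-length prefix)))
    (and (string-prefix? prefix name)
         (> (string-length name) plen)
         (let ((x (substring name plen)))
           ;; Keep this sketch to a single component after "language/".
           (and (not (string-index x #\/))
                ;; lookup-language is expected to signal an error if
                ;; (language X spec) does not provide a language X.
                (lookup-language (string->symbol x)))))))
```

For instance, `(language-name (lang-name->language "language/scheme"))` should give `scheme`; `language/wisp` would work the same way once a `(language wisp spec)` module is installed.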

(Other kinds of potential namespace collisions are easier to manage: for 
example, we could imagine that `(use-modules (foo bar baz))` might not access 
the same module as `(require foo/bar/baz)`. This is in a way an example of 
where it makes sense to be constrained in the syntax of `#lang` itself and let 
`#lang` unlock endless possibilities.)

I've sort of alluded above to my pipe dream of a grand unified future for 
Racket-and-Guile-on-Chez, Guile-and-Racket-on-the-Guile-VM, and endless other 
possibilities. I wrote about it in more detail on the guix-devel list at [10]. 
(These thoughts were inspired by conversations with Christine Lemmer-Webber, 
though she bears no responsibility for my zany imaginings.)

Finally, I looked into the history of `#!` in R6RS a bit, and I'll leave a few 
pointers here for posterity. Will Clinger's 2015 Scheme Workshop paper [11] 
says in section 3.1 that "Kent Dybvig suggested the `#!r6rs` flag in May 2006", 
Clinger "formally proposed addition of Dybvig’s suggestion" [12], and, "less 
than six weeks later," `#!r6rs` was "in the R6RS editors’ status report". (I am 
not persuaded by all of the arguments about `#!r6rs` in that paper: in 
particular, the analysis doesn't seem to account for R6RS Appendix A [1].) As 
best as I can tell, the suggestion from Kent Dybvig is [13]:

On Wed May 10 15:40:13 EDT 2006, Kent Dybvig wrote:
> We already have (as of last week's meeting) a syntax for dealing with
> implementation-dependent lexical exceptions, which is to allow for
> #!<symbol-like-thing>, e.g.:
>  #!mzscheme
>  #!larceny
>  ...
> Perhaps we can plan on using the same tool for future extensions to the
> syntax:
>  #!r7rs
> We can even require #!r6rs to appear at the top of a library now, or at
> least allow it to be included.
> This is a lot more concise than a MIME content-type line.
> Kent

I haven't tracked down any older writing about `#!<symbol-like-thing>` for 
"implementation-dependent lexical exceptions": it may have been a conference 


