guile-devel

Re: [PATCH] add language/wisp to Guile?


From: Maxime Devos
Subject: Re: [PATCH] add language/wisp to Guile?
Date: Sun, 26 Feb 2023 16:42:48 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.7.2



On 26-02-2023 at 08:45, Philip McGrath wrote:
Hi,

On Sat, Feb 18, 2023, at 10:58 AM, Maxime Devos wrote:
On 18-02-2023 04:50, Philip McGrath wrote:
I haven't read the patch or this thread closely,

I'll assume you have read it non-closely.

but R6RS has an answer to any concerns about compatibility with `#lang`. At the beginning of 
Chapter 4, "Lexical and Datum Syntax" 
(<http://www.r6rs.org/final/html/r6rs/r6rs-Z-H-7.html#node_chap_4>) the report specifies:

   An implementation must not extend the lexical or datum syntax in any way, with one 
exception: it need not treat the syntax `#!<identifier>`, for any <identifier> 
(see section 4.2.4) that is not `r6rs`, as a syntax violation, and it may use specific 
`#!`-prefixed identifiers as flags indicating that subsequent input contains extensions to 
the standard lexical or datum syntax. The syntax `#!r6rs` may be used to signify that the 
input afterward is written with the lexical syntax and datum syntax described by this 
report. `#!r6rs` is otherwise treated as a comment; see section 4.2.3.

That is for '#!lang', not '#lang'.  R6RS allows the former, but the
patch does the latter.  As such, R6RS does not have an answer about
incompatibility with `#lang', unless you count ‘it's incompatible’ as an
answer.


Let me try to be more concrete.

If you want a portable, RnRS-standardized lexical syntax for `#lang`, use 
`#!<identifier>`, and systems that understand `#lang` will treat it (in 
appropriate contexts) as an alias for `#lang `.

RnRS only standardises #!r6rs, not #!<identifier>. Even if RnRS standardised #!<identifier> for values of <identifier> that aren't r6rs, RnRS only holds sway for Scheme, and one of the main points of Guile's language system is to support more than only Scheme.


Alternatively, you could embrace that Guile (like every other Scheme system I'm aware of) 
starts by default in a mode with implementation-specific extensions. Indeed, R6RS 
Appendix A specifically recognizes that "the default mode offered by a Scheme 
implementation may be non-conformant, and such a Scheme implementation may require 
special settings or declarations to enter the report-conformant mode" [1]. Then you 
could just write `#lang` and worry about the non-portable block comments some other day. 
This is what I would personally prefer.

Emphasis on 'non-conformant'. The appendix states that Scheme implementations don't need to be R6RS by default; it doesn't state that non-conformant things are conformant with R6RS.

Remember that this part of the discussion started with:

‘The '#lang whatever' stuff makes Scheme (*) files unportable between
implementations, as '#lang scheme' is not a valid comment’.

The R6RS might permit non-R6RS implementations, but this does not make non-R6RS constructs like '#lang scheme' portable.


In Racket, in the initial configuration of the reader when reading a file, "`#!` is an alias 
for `#lang` followed by a space when `#!` is followed by alphanumeric ASCII, `+`, `-`, or 
`_`." (See 
<https://docs.racket-lang.org/reference/reader.html#%28part._parse-reader%29>.) [...]
(Guile does not handle `#!r6rs` properly, presumably because of the
legacy `#!`/`!#` block comments. I think this should be a surmountable
obstacle, though, especially since Guile does support standard `#|`/`|#`
block comments.)

‘#! ... !#’ comments aren't legacy; they exist to allow putting the
shebang in the first line of a script, and to pass additional arguments
to the Guile interpreter (see: (guile)The Top of a Script File) (*).  As
such, you can't just replace them with #| ... |# (unless you patch the
kernel to recognise "#| ..." as a shebang line).

(*) Maybe they exist for other purposes too.

According to "(guile)Block Comments", the `#!...!#` syntax existed before Guile 
2.0 added support for `#|...|#` comments from SRFI 30 and R6RS.

I agree, and I don't follow what your point is here.


Furthermore, according to the kernel, #!r6rs would mean that the script
needs to be interpreted by a program named 'r6rs', but 'guile' is named
'guile', not 'r6rs'.  (I assume this is in POSIX somewhere, though I
couldn't find it.)

(This is an incompatibility between R6RS and any system that has shebangs.)


This is not an incompatibility, because the `#!r6rs` lexeme (or 
`#!<identifier>`, more generally) is not the shebang line for the script. R6RS 
Appendix D [2] gives this example of a Scheme script:

```
#!/usr/bin/env scheme-script
#!r6rs
(import (rnrs base)
         (rnrs io ports)
         (rnrs programs))
(put-bytes (standard-output-port)
            (call-with-port
                (open-file-input-port
                  (cadr (command-line)))
              get-bytes-all))
```

OK, didn't notice that appendix.  Only covers Scheme, though.

--
The appendix says that, "if the first line of a script begins with `#!/` or `#!<space>`, 
implementations should ignore it on all platforms, even if it does not conform to the recommended 
syntax". Admittedly this is not handled as consistently as I would prefer: I wish they had just 
standardized `#!/` and `#! ` as special comment syntax, as Racket does, and clarified the interaction 
with `#!<identifier>`. But Matt points out that JavaScript also has very similar special 
treatment for a single initial shebang comment. Lua has a similar mechanism: my vague recollection is 
that many languages do.

I do not follow what your point is here -- I only (falsely) claimed that POSIX and R6RS are incompatible w.r.t. shebangs and "#!"; I did not make such claims for other languages -- some other languages don't even have "#!" (e.g. BASIC).
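
To make the two rules quoted above concrete (the R6RS appendix's "ignore a first line starting with `#!/` or `#!<space>`" and Racket's "`#!` followed by alphanumeric ASCII, `+`, `-` or `_` is an alias for `#lang `"), here is a minimal sketch. It is written in Python purely for illustration; the function name and the returned labels are made up for this example and are not part of Guile, Racket or R6RS.

```python
import string

# Characters that, per the Racket reader rule quoted above, turn "#!"
# into an alias for "#lang " when they directly follow it.
_LANG_START = set(string.ascii_letters + string.digits + "+-_")


def classify_hash_bang(first_line: str) -> str:
    """Classify the first line of a source file (hypothetical helper).

    Returns "shebang", "language-flag" or "other".
    """
    if not first_line.startswith("#!"):
        return "other"
    rest = first_line[2:]
    # R6RS appendix rule: "#!/" or "#!<space>" on the first line is ignored.
    if rest.startswith("/") or rest.startswith(" "):
        return "shebang"
    # "#!<identifier>"-style flag such as "#!r6rs".
    if rest and rest[0] in _LANG_START:
        return "language-flag"
    return "other"
```

Under these assumptions, `#!/usr/bin/env scheme-script` and `#!r6rs` from the Appendix D example fall into different classes, so the two lines do not conflict.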


(^) it doesn't integrate with the module system -- more concretely,
(use-modules (foo)) wouldn't try loading foo.js -- adding '-x' arguments
would solve that, but we agree that that would be unreasonable in many
situations.  (Alternatively one could place ECMAScript code in a file
with extension '.scm' with a '#lang' / '-*- mode: ecmascript -*-', but
... no.)

Generally I would use `.scm` (or `.rkt`), and certainly I would do so if there 
isn't some well-established other extension. If you are just using the file, 
you shouldn't necessarily have to care what language it's implemented in 
internally.

Maybe you would, but Guile shouldn't require people to change the extension of source files to something invalid, as I pointed out with the ECMAScript example. .scm means Scheme, not ECMAScript.

As such, support for non-.scm file extensions is required.

In particular, I don't think the `#lang` concept should be conflated with 
editor configuration like `'-*- mode: ecmascript -*-`.
> [...]

Then don't do that, and use non-editor configuration like
'-*- programming-language: ecmascript -*-' instead. While Emacs is the main user of '-*- ... -*-' lines, there is nothing stopping us from adding a few variables like e.g. 'programming-language' (*) that Emacs doesn't assign a meaning to.

(*) I don't actually know if Emacs assigns a meaning to this variable or not. Some other word might perhaps be needed.

For convenience, I would recommend supporting '-*- mode: ... -*-' too, such that non-Scheme source files can sometimes be loaded without making any Guile-specific changes to the source files. If whoever writes or reads the source file wants to use another Emacs mode, or if the mode is ambiguous because it covers multiple languages, there is nothing stopping them from setting both 'mode: ...' and 'programming-language: ...':

% -*- language: datalog; mode: racket -*-
[...]

 As an example, consider these two Racket programs:

```
#!datalog
parent(anchises, aeneas).
parent(aeneas, ascanius).
ancestor(A, B) :- parent(A, B).
ancestor(A, B) :- parent(A, C), ancestor(C, B).
ancestor(A, ascanius)?
```

```
#lang algol60
begin
     comment Credit to Rosetta Code;
     integer procedure fibonacci(n); value n; integer n;
     begin
         integer i, fn, fn1, fn2;
         fn2 := 1;
         fn1 := 0;
         fn  := 0;
         for i := 1 step 1 until n do begin
             fn  := fn1 + fn2;
             fn2 := fn1;
             fn1 := fn
         end;
         fibonacci := fn
     end;
integer i;
     for i := 0 step 1 until 20 do printnln(fibonacci(i))
end
```

While I'm sure there are Emacs modes available for Datalog and Algol 60, and some people 
might want to use them for these programs, I would probably want to edit them both in 
racket-mode: because racket-mode supports the `#lang` protocol, it can obtain the syntax 
highlighting, indentation, and other support defined by each language, while also 
retaining the global features that all `#lang`-based languages get "for free", 
like a tool to rename variables that respects the actual model of scope. This is one of 
the value propositions of the `#lang` system.

As pointed out by my previous example, this is solved by '-*- ... -*-' too.



Racket has a mechanism to enable additional source file extensions without needing 
explicit command-line arguments by defining `module-suffixes` or `doc-modules-suffixes` 
in a metadata module that is consulted when the collection is "set up": 
https://docs.racket-lang.org/raco/setup-info.html However, this mechanism is not widely 
used.

I guess this is an improvement over the runtime 'guile -x extension'.
However, if I'm understanding 'setup-info.html' correctly, the downside
is that you now need a separate file containing compilation settings.

I have previously proposed a mechanism that makes the '-x' +
'--language' a compile-time thing (i.e., embed the source file extension
in the compiled .go; see previous e-mails in this thread), without
having to make a separate file containing compilation settings.

How is Racket's method an improvement over my proposal?


My focus in this thread is explaining and advocating for `#lang`. I see the 
whole business with file extensions as basically orthogonal to `#lang`, and my 
opinions about it are much less strong, but I'll try to answer your question. I 
think it would make sense for `.go` files to record the file extension of their 
corresponding source files: Racket's `.zo` files do likewise. I don't object to 
a command-line option *at compile-time* (as you said) to enable additional file 
extensions, and I agree that there isn't a huge difference between that and an 
approach with a separate configuration file, though I do find the 
configuration-file approach somewhat more declarative, which I prefer.

'--language whatever' appears pretty declarative to me, as in it declares that the language is 'whatever'.

What I was really trying to argue here is that the file extension should not 
determine the meaning of the program it contains: more on that below.

That's what the '--language whatever' compilation argument is for: it overrides the 'guess by file extension' fallback.

Overall, the experience of the Racket community strongly suggests that a file 
should say what language it is written in. Furthermore, that language is a 
property of the code, not of its runtime environment, so environment variables, 
command-line options, and similar extralinguistic mechanism are a particularly 
poor fit for controlling it.

Agreed on the 'no environment variables' thing, disagreed on the 'no
command-line options'.  In the past e-mails in this thread, there was
agreement on the ‘embed the source file extension in the compiled .go or
something like that; and add -x extension stuff _when compiling_ (not
runtime!) the software that uses the extension’.

Do you have any particular issues with that proposal?  AFAICT, it solves
everything and is somewhat more straightforward than Racket's approach.


I don't have particular issues with a compile-time command-line option to 
determine which files to compile. I do object to using command-line options or 
file extensions to determine what language a file is written in.

File extensions are not the worst possible mechanisms, but they have similar 
problems: code written in an unsaved editor or a blog post may not have a file 
extension.

With the proposal I wrote, it remains possible to override any 'file
extension -> language' mapping.  It's not in any way incompatible with
"-*- lang: whatever -*-"-like comments.

Additionally, Guile can only load files that exist (i.e., 'saved'); Guile
is not an editor or blog reader, so these do not appear to be problems
for Guile to me.


While it's true that the only files Guile can load are "files that exist", it's not true 
that "Guile can only load files": consider procedures like `eval-string`, `compile`, and, 
ultimately, `read-syntax`.
* read-syntax is for reading S-expressions -- it is only for Scheme,
  other languages are out-of-scope for that procedure.  As such,
  read-syntax appears irrelevant here to me.

* For 'compile' and 'eval-string', I'd like to point out that they
  have "#:from" and "#:lang" arguments to set the language, as
  you appear to know going by your responses below.  As such, even if
  Guile had an integrated editor, that editor can pass the language to
  Guile's compiler.

  I mean, if the editor is good, it has syntax highlighting, and to do
  syntax highlighting it needs to know the language, so it knows the
  language anyway (e.g. maybe it has separate "Write new Scheme" and
  "Write new ECMAScript" buttons, or maybe it has a 'mode: scheme' and
  'mode: ecmascript' like Emacs and being an editor, it then knows how
  to convert that editor configuration into #:from/#:lang).

* What I meant with 'Guile can only load files that exist',
  is that the files it loads are only those that exist.
  I did not mean that no loadable non-file things exist.

  The point here, is that if you wrote a blog post that defines the
  (foo) module and you enter (use-modules (foo)) in a Guile REPL, it
  isn't going to surf to your blog to download the (foo) module.  As
  Guile doesn't even know about your blog post, it has no use for any
  file extension or language declaration that your blog post about (foo)
  might or might not have.


AFAICT, to the extent that Guile's current implementations of such procedures 
support multiple languages, they rely on out-of-band configuration, like an 
optional `#:language` argument, which is just as extra-linguistic as relying on 
command-line options, environment variables, or file extensions.

First, I never proposed relying on environment variables. I oppose using environment variables for these things. Why are you mentioning environment variables, when this has never been proposed?

Second, the implicit argument here appears to be 'extra-linguistic is bad, so we shouldn't do these extra-linguistic things'. But what's the problem with being 'extra-linguistic'? Some things, like environment variables, are plain bad here (no disagreement there); file extensions are bad to rely on, but acceptable and convenient as a fallback.

Third, I am not proposing to rely on command line options and file extensions -- I only propose _using_ them, not _relying_ on them -- if someone wants to implement an in-band (intra-linguistic?) override like '-*- ... -*-'/#lang for file-extension based detection, they can do that -- my '-*- ... -*-' is just a proposed improvement over '#lang'.

Fourth, TBC, I'd like to point out that '-*- ... -*-' is equally 'intra/extra-linguistic' as '#!lang' (see my response to 'magic comments' later), though I do know that's not the point you appear to be making right here.

What I'm trying to advocate is that programs should say in-band, as part of 
their source code, what language they are written in.

That's done by '-*- ... -*-' too, and I haven't noticed any argumentation for ‘programs should say in-band what language they are written in’.

Also, there is a gap between the following five statements, which you appear to sometimes be conflating:

  (A) Programs should say in-band what language they are written in.
  (B) ‘Guile should use in-band information to determine what language a
      program is written in.’
  (C) ‘Guile should use out-of-band information to determine what
      language a program is written in.’
  (D) ‘Guile should exclusively use out-of-band information to determine
      what language a program is written in.’
  (E) ‘Guile should exclusively use in-band information to determine
      what language a program is written in.’

I disagree with (A), because often it's perfectly clear from context (out-of-band) what language it is.

Take for example Guile itself. Being Guile, of course everything under 'module/' is Scheme code. Adding '#!r6rs' or '-*- language: scheme -*-' lines to every .scm isn't incorrect, but is rather silly. Likewise, I have written a Scheme library called 'Scheme-GNUnet'. From the name alone, it is clear that it's Scheme. More generally, usually it's pretty clear (for a human) which language it is by just looking at the code, and if not, probably the README mentions which language the software uses.

I don't dispute (B), but neither do I find it particularly important given that adding a '--language=whatever' argument is trivial.

I would like to point out that (A) does not imply (B) -- it is possible to consider it good practice to mention the language in-band, without any language implementations actually using this information.

More to the point, to me (A) appears irrelevant to this thread. Sure, perhaps it's a good practice, but Guile is not a programmer; Guile is a language implementation. (A) is only relevant insofar as Guile would make use of this in-band information.

> What I'm trying to advocate is that programs should say in-band, as
> part of their source code, what language they are written in.

This would be advocating for (A). But as mentioned above, (A) is irrelevant by itself, and it doesn't imply (B).

It is also false -- you weren't advocating for (A), but for (B) -- (A) is just a means to (B) in your argumentation structure. Quoting one of your first messages:

To end with an argument from authority, this is from Andy Wingo's "lessons learned from guile, 
the ancient & spry" 
(<https://wingolog.org/archives/2020/02/07/lessons-learned-from-guile-the-ancient-spry>):

On the change side, we need parallel installability for entire languages. 
Racket did a great job facilitating this with #lang and we should just adopt 
that.

You are also advocating for '(E)/not (C)':

I do object to using command-line options or file extensions to determine what language a file is written in.

You also appear to be thinking that I'm advocating for '(D)' -- while I agree with (D) (using a non-universal language construct (*) like '#lang' to determine the language something is written in, is rather circular), I'm not arguing for it.

(*) Again, #lang is rather Racket-specific, whereas comments are mostly universal.

If the editor needs to determine the language for syntax highlighting or
such, then there exist constructs like ';; -*- mode: scheme -*-' that
are valid Scheme, but that's not a Guile matter.


See above for why the `#!language/wisp` option is perfectly valid R6RS Scheme

Wisp isn't R6RS. Wisp code needs to be valid Wisp, not valid R6RS Scheme. There also exist languages beyond Wisp and Scheme.

and for some of my concerns about overloading editor configuration to determine 
the semantics of programs.

See above replies.

More broadly, everyone who reads a piece of source code, including humans as 
well as editors and the `guile` executable, needs to know what language it's 
written in to hope to understand it.

For programmers, this is covered by:

  * looking at the code -- even without any explicit in-band information
    like ';; -*- ... -*-' comments or "#lang", or out-of-band
    information like file extensions, a README or a Makefile with
    compilation settings, it usually is pretty clear what language it
    is in.

  * usually source code is in files, which usually have file extensions.
    Usually there's a good file extension -> language map, e.g. .scm
    files only contain Scheme, .js files only contain ECMAScript, ...

For editors, this is covered by:

  * Editor configuration like '-*- mode: scheme -*-'.
  * Language-specific declarations like #lang, #!r6rs,
    '-*- programming-language: scheme -*-'
  * File extensions.
  * If the editor guessed wrong, likely the syntax highlighting is
    wrong etc., so the programmer gives a hint to the editor
    (e.g. by adding a -*- mode: scheme -*- line, or #!r6rs, ...)

For the Guile executable, this is covered by:

  * --language=.../#:from/#:lang arguments.
  * -*- ... -*- / #!r6rs lines (but not #lang except when needed for
    compatibility with Racket, otherwise Guile would create
    incompatibilities.)
  * File extensions.
  * Default to Scheme.
  * If guessing wrong, there will almost surely be some parsing error,
    in which case the programmer will intervene by modifying a single
    line in the Makefile or such to add a "--language=" option, or, if
    they insist on spending much more time than needed, by adding
    "-*- programming-language: whatever -*-" comments to every single
    source file.
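
The lookup order in the list above can be sketched as follows. This is only an illustration in Python, not Guile code: the function name, its parameters and the extension table are hypothetical, and the sketch assumes that an explicit argument beats an in-file declaration, which beats the file extension, which beats the Scheme default.

```python
import os

# Hypothetical extension table; a real one would presumably be configurable.
EXTENSION_TO_LANGUAGE = {".scm": "scheme", ".js": "ecmascript"}


def resolve_language(path, explicit=None, declared=None):
    """Pick a language name for the file at `path`.

    explicit: value of a --language=... / #:from / #:lang argument, if any.
    declared: value found in a '-*- ... -*-' or '#!r6rs' line, if any.
    """
    if explicit is not None:       # 1. explicit argument overrides everything
        return explicit
    if declared is not None:       # 2. in-file declaration
        return declared
    extension = os.path.splitext(path)[1]
    # 3. file extension as a fallback, 4. default to Scheme
    return EXTENSION_TO_LANGUAGE.get(extension, "scheme")
```

With this ordering the file extension is used as a guess but never relied upon: `resolve_language("foo.js", explicit="scheme")` yields "scheme" even though the extension suggests ECMAScript.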

(For more on this theme, see the corresponding point of the Racket Manifesto: 
<https://cs.brown.edu/~sk/Publications/Papers/Published/fffkbmt-racket-manifesto/paper.pdf>)
 Actually writing the language into the source code has proven to work well.

What is the corresponding point?  I'm not finding any search results for
'file extension' or 'file name', and I'm not finding any relevant search
results for 'editor'.  Could you give me a page reference and a relevant
quote?


I was trying to refer to section 5, "Racket Internalizes Extra-Linguistic Mechanisms", 
which begins on p. 121 (p. 9 of the PDF). Admittedly, the connection between the main set of 
examples they discuss and this conversation is non-obvious. Maybe the most relevant quote is the 
last paragraph of that section, on p. 123 (PDF p. 11): "Finally, Racket also internalizes 
other aspects of its context. Dating back to the beginning, Racket programs can programmatically 
link modules and classes. In conventional languages, programmers must resort to extra-linguistic 
tools to abstract over such linguistic constructs; only ML-style languages and some scripting 
languages make modules and classes programmable, too." (Internal citations omitted.)

This e-mail thread is about determining the language, not classes and modules. Trying to decode this vague paragraph, the relevant bit here appears to be ‘must resort to _extra-linguistic_ tools to abstract over such _linguistic constructs_’.

As such, I assume that 'extra-linguistic' refers to file extensions (and other things, but it's the file extensions that are relevant here).
Using this guess to unvaguify the phrasing, I get:

‘Programmers must resort to using file extensions to indicate which language a program is written in.’

However, that this is a bad thing appears to be the point that you were making in the first place, for which you gave the PDF as a source, so this doesn't explain anything.

To end with an argument from authority, this is from Andy Wingo's "lessons learned from guile, 
the ancient & spry" 
(<https://wingolog.org/archives/2020/02/07/lessons-learned-from-guile-the-ancient-spry>):


Sorry, this was meant to be tongue-in-cheek, and it seems that didn't come across. 
"Argument from authority" is often considered a category of logical fallacy, 
and ending with a quote is sometimes considered to be bad style or to weaken a piece of 
persuasive writing.

    * I previously pointed out some problems with that proposal
      -- i.e., '#lang whatever' is bogus Scheme / Wisp / ...,

I hope I've explained why something like `#!language/wisp` is perfectly within 
the bounds of R6RS.

No, because Wisp is not R6RS -- R6RS is only relevant insofar as the Wisp standard delegates to R6RS. (TBC I'm not claiming that #!language/wisp is invalid Wisp, I'm only claiming that your argumentation has holes here.)

Also, you forgot the '...' in 'Scheme / Wisp / ...' -- while R6RS is somewhat relevant to Wisp, there exist languages over which the R6RS has no sway, e.g. BASIC.

Also, given that Guile already starts with non-standard extensions enabled by 
default, I don't see any reason not to also support `#lang language/wisp`.

Here is a reason for not adding non-standard extensions, from a previous reply of mine:

The '#lang whatever' stuff makes Scheme (*) files unportable between 
implementations, as '#lang scheme' is not a valid comment -- there exist 
Schemes beyond Guile and Racket.  If it were changed to recognising
'-*- mode: scheme -*-' or '-*- language: scheme -*-' or such, it would be 
better IMO, but insufficient, because (^).

(*) Same argument applies for some, but not all, other non-Scheme languages too.

That Guile might have made some mistakes with non-standard enabled-by-default language extensions in the past, does not mean that it should make more mistakes in the present.

In particular, the spelling of `#lang` proceeds directly from the Scheme tradition. This is from 
the R6RS Rationale document, chapter 4, "Lexical Syntax", section 3, "Future 
Extensions" [3]: [...]

Again, the Scheme tradition holds no sway over non-Scheme languages (except for situations like Wisp, perhaps), e.g. Pascal and BASIC. Guile does not limit itself to Scheme languages, e.g. it has some support for elisp, brainfuck and python (see: python-on-guile).

      and
      'the module system won't find it, because of the unexpected
      file extensions'.


This is indeed something that needs to be addressed, but it seems like a very solvable 
problem. Using the extension ".scm" for everything would be one trivial 
solution. Something like your proposal to enable file extensions based on a compile-time 
option could likewise be part of a solution.

The problem with the 'use .scm for everything' solution is that you would need to use .scm for everything, even non-Scheme files, and even when the source code comes from a project that uses a non-Guile implementation and as such uses very different extensions, e.g. '.js'.

In general, I'll say that, while using Guile, I've often missed Racket's more flexible 
constructs for importing modules. I especially miss `(require "foo/bar.rkt")`, 
which imports a module at a path relative to the module where the `require` form appears: 
it makes it easy to organize small programs into multiple files without having to mess 
with a load path.

I fail to see the relevance of this comment. Also, 'include' already does something pretty close to this; presumably 'use-modules' could be modified to accept a #:relative-source-file-name argument:

(define-module (baz)) ; /project/baz.scm
;; -> /project/foo/bar.rkt
(use-modules ((foo bar) #:relative-source-file-name "foo/bar.rkt"))
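
The path computation implied by the "->" comment above is simple. As a sketch only (in Python for illustration; the helper name is hypothetical and '#:relative-source-file-name' itself is just a suggestion, not an existing Guile option), the relative name is resolved against the directory of the importing file:

```python
import posixpath


def resolve_relative_source(importing_file, relative_name):
    """Resolve `relative_name` against the directory of `importing_file`."""
    base_directory = posixpath.dirname(importing_file)
    return posixpath.normpath(posixpath.join(base_directory, relative_name))
```

For the example above, `resolve_relative_source("/project/baz.scm", "foo/bar.rkt")` gives "/project/foo/bar.rkt", without consulting any load path.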

On Thu, Feb 23, 2023, at 1:42 PM, Maxime Devos wrote:
Have you seen my messages on how the "#lang" construct is problematic
for some languages, and how alternatives like "[comment delimiter] -*-
stuff: scheme/ecmascript/... -*- [comment delimiter]" appear to be
equally simple (*) and not have any downsides (**)?

(*) The port encoding detection supports "-*- coding: whatever -*-",
presumably that functionality could be reused.


IMO, the use of  "-*- coding: whatever -*-" to detect encoding is an ugly hack 
and should not be extended further.

I tried to raise some objections above to conflating editor configuration with 
syntax saying what a file's language is.

More broadly, I find "magic comments" highly objectionable. The whole point of comments 
is to be able to communicate freely to human readers without affecting the 
interpreter/compiler/evaluator. Introducing magic comments means you must constantly think about 
whether what you are writing for humans might change the meaning of your program. Magic comments 
*without knowing a priori what is a comment* are even worse: now, you have to beware of accidental 
"magic" in ALL of the lexical syntax of your program. (Consider that something like 
`(define (-*- mode: c++ -*-) 14)` is perfectly good Scheme.)

I object to the second claim -- while I can't account for aliens given the lack of them, I find it pointless to restrict the purpose of comments to human animals.

The third and penultimate claims are false. If implemented correctly in Guile, only the first language declaration counts, not 'ALL of the lexical syntax of your program'.

You previously claimed that programs should contain in-band information on which language something is written in. If this is followed, your example would actually look like:

;; -*- programming-language: scheme -*-
;; ^ or mode: c++, or #!r6rs, or an out-of-band --language=..., ...
(define (-*- mode: c++ -*-) 14)

As the relevant '-*- ...: scheme -*-' precedes the irrelevant '-*- mode: c++ -*-', it's the relevant one that is picked up by Guile, not the irrelevant one.
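
A minimal sketch of this 'first declaration wins' rule, in Python for illustration only. The function name and the default list of variable names are assumptions ('programming-language' is merely the tentative name used in this e-mail), and this is not how Guile's existing 'coding:' detection is implemented.

```python
import re

# Non-greedy, and "." does not match newlines, so this finds the first
# "-*- ... -*-" block that opens and closes on a single line.
_DECLARATION = re.compile(r"-\*-(.*?)-\*-")


def first_declared_language(source, keys=("programming-language", "mode")):
    """Return the language named by the first '-*- ... -*-' block, or None."""
    match = _DECLARATION.search(source)
    if match is None:
        return None
    variables = {}
    # The block holds "name: value" pairs separated by ";".
    for part in match.group(1).split(";"):
        name, separator, value = part.partition(":")
        if separator:
            variables[name.strip()] = value.strip()
    # Earlier entries in `keys` take priority over later ones.
    for key in keys:
        if key in variables:
            return variables[key]
    return None
```

Applied to the three-line example above, the sketch stops at the declaration on the first line and never looks at the '-*- mode: c++ -*-' inside the 'define' form.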

As such, as long as the programmer uses the '--language=' compilation option in the Makefile, or puts a 'real' language declaration in the beginning of the source file (as a 'magic comment', or #!r6rs, or #lang as far as required for compatibility with Racket), things will work out.

Even if the programmer doesn't do any of that, it's still unproblematic, because of error messages at compilation / interpretation time -- different languages tend to have incompatible syntax, if you pass a Scheme program to a C++ parser you'll just get a stream of syntax errors.

Surely, the programmer will pass the code to the compiler or interpreter at some point, right? Otherwise, the programming was pointless. Likewise, test suites (ought to) exist, which would catch these problems even if they weren't written to catch these problems. (If they don't exist, then the programmer has much worse problems than a super implausible '(define (-*- mode: c++ -*-) 14)' situation.)


(It's not really relevant for the `#lang`-like case, but something I find especially ironic about encoding 
"magic comments" or, say, `<?xml version="1.0" encoding="UTF-8"?>`, is that 
suddenly if you encode the Unicode text in some other encoding it becomes a lie.)

That sounds like exactly the same situation as with #lang to me (and, as such, relevant). If you take a Scheme file

  #lang scheme
  ; ^ equivalent of <?xml version="1.0" encoding="UTF-8"?>
  [...] ; <- Scheme code

and then convert it to Wisp, but forget to adjust the "#lang":

  #lang scheme
  ; ^ equivalent of <?xml version="1.0" encoding="something-else"?>
  [...] ; <- Wisp code

then you'll get a bunch of syntax errors.


On Fri, Feb 24, 2023, at 6:51 PM, Maxime Devos wrote:
On 25-02-2023 00:48, Maxime Devos wrote:
(**) For compatibility with Racket, it's not like we couldn't
implement both "#lang" and "-*- stuff: language -*-".

TBC, I mean ‘only support '#lang' for values of 'lang' that Racket
supports’.

If I understand what you're proposing here, I don't think it's a viable option.

The fundamental purpose of the `#lang` construct (however you spell it) is to provide an 
open, extensible protocol for defining languages. Thus, "values of 'lang' that 
Racket supports" are unbounded, provided that a module has been installed where the 
language specification says to look. From The Racket Reference [4]:

The problem, as I wrote several times previously in different words, is that this 'open, extensible protocol' is not a standard protocol shared between languages. No language that precedes Racket acknowledges this protocol in the specification of its syntax, and, like I said before, if the language doesn't have "#" comments, then #lang is also contrary to the syntax of the language.

Like I wrote about R6RS: Racket only holds sway over Racket; it has no authority on the syntax of, say, BASIC and Pascal.

Also, being unbounded is not a problem, because unbounded != infinite. At any point in time, Racket itself only supports a finite number of values of 'lang', and at any point in time there are only a finite number of external modules that implement a certain 'lang'.

As such, at any version of Guile, Guile could have a finite list of 'lang' values for which it recognises the Racket-specific #lang extension, which is incompatible with non-Racket, non-Guile implementations.
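
To make the 'finite list' idea concrete, here is a minimal sketch, written in Python purely for brevity. It is not Guile's reader and not Racket's algorithm. The function name `declared_lang` and the contents of `KNOWN_LANGS` are hypothetical, and the assumption is that only a first line of the form `#lang <name>` counts.

```python
# Hypothetical allowlist: illustrative names only, not a list that Guile ships.
KNOWN_LANGS = frozenset({"racket", "racket/base", "typed/racket", "scribble/base"})


def declared_lang(first_line):
    """Return the name after '#lang ' if it is on the allowlist, else None.

    Anything else (no '#lang ' prefix, or a name that is not listed) is
    left to the normal reader of whatever language the file is in.
    """
    prefix = "#lang "
    if not first_line.startswith(prefix):
        return None
    name = first_line[len(prefix):].strip()
    return name if name in KNOWN_LANGS else None


if __name__ == "__main__":
    for line in ("#lang racket/base", "#lang wisp", "(define x 1)"):
        print(repr(line), "->", declared_lang(line))
```

With a list like this, a name that is absent from the list is simply not treated as a language declaration, so the set of recognised 'lang' values stays finite and explicit for each release.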

[...]
I am definitely **not** suggesting that Guile implement all the details of 
Racket's `#lang` implementation. What I do strongly advocate is that you design 
Guile's support for `#lang` (or `#!`) to leave open a pathway for compatibility 
in the future. [...]

The problem with this advocacy is that I agree with you here (except for 'you design' (*)), so why are you repeating this again? I wrote something along the lines of ‘For __compatibility__ with Racket, __#lang should be recognised for values of 'lang' that are recognised by Racket__, but not for other languages’ (emphasis added).

(*) Sure, someone could implement this compatibility, whatever, but we don't need this compatibility for Wisp. For Wisp, the more general and less problematic 'embed source file name in .go, + --language/file extension guessing' suffices. It's also rather pushy -- _you_ are demanding that _I_ paper over a source of incompatibility _introduced by others_ (Racket) (and furthermore _I_ consider that source of incompatibility _bad_), in the ML of a _volunteer project_, in a discussion that's ultimately about Wisp, not Racket, where _I_ (**) already have voluntarily designed a solution for Wisp?

(**) And others maybe, I don't recall how much can be attributed to whom.
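
For illustration only, the 'explicit language option, otherwise guess from the file extension' part of (*) could look roughly like the sketch below. It is in Python for brevity and does not reflect how Guile implements anything. The table `EXTENSION_TO_LANGUAGE`, the function `guess_language` and its parameters are all hypothetical, and the extensions listed are assumptions.

```python
import os

# Hypothetical extension table; the mapping is an assumption for illustration.
EXTENSION_TO_LANGUAGE = {
    ".scm": "scheme",
    ".w": "wisp",
    ".el": "elisp",
    ".js": "ecmascript",
}


def guess_language(path, explicit=None, default="scheme"):
    """Pick a language for a source file.

    An explicitly requested language (standing in for a --language option)
    always wins; otherwise the file extension is looked up, and unknown or
    missing extensions fall back to the default.
    """
    if explicit is not None:
        return explicit
    extension = os.path.splitext(path)[1].lower()
    return EXTENSION_TO_LANGUAGE.get(extension, default)


if __name__ == "__main__":
    print(guess_language("hello.w"))
    print(guess_language("hello.w", explicit="scheme"))
    print(guess_language("README"))
```

Nothing in this approach requires a marker inside the file, which is why it does not interfere with the syntax of languages that have no "#" comments.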

> [...]
(Other kinds of potential namespace collisions are easier to manage: for 
example, we could imagine that `(use-modules (foo bar baz))` might not access 
the same module as `(require foo/bar/baz)`. [...]

This is interesting but seems completely orthogonal; this e-mail thread is about detecting which language something is in, and finding source files with non-.scm modules, not about making the module system non-global.

> [...]
I've sort of alluded above to my pipe dream of a grand unified future for 
Racket-and-Guile-on-Chez, Guile-and-Racket-on-the-Guile-VM, and endless other 
possibilities. I wrote about it in more detail on the guix-devel list at [10]. 
(These thoughts were inspired by conversations with Christine Lemmer-Webber, 
though she bears no responsibility for my zany imaginings.)

OK, but what has this to do with this e-mail thread? This e-mail thread is about supporting additional languages, not about emulating Racket on top of Guile somehow (or perhaps you count Racket's dialect of Scheme as a language of its own to be implemented in Guile?).

Finally, I looked into the history of `#!` in R6RS a bit, and I'll leave a few pointers here for posterity. Will 
Clinger's 2015 Scheme Workshop paper [11] says in section 3.1 that "Kent Dybvig suggested the `#!r6rs` flag in May 
2006", Clinger "formally proposed addition of Dybvig’s suggestion" [12], and, "less than six weeks 
later," `#!r6rs` was "in the R6RS editors’ status report". (I am not persuaded by all of the arguments 
about `#!r6rs` in that paper: in particular, the analysis doesn't seem to account for R6RS Appendix A [1].) As best as 
I can tell, the suggestion from Kent Dybvig is [13]:

Again, how is RnRS relevant to _non-Scheme_ languages?

Apart from the 'shebangs actually are R6RS' point, I am disappointed by this discussion -- you keep repeating irrelevant points or points that were already addressed. (Again, R6RS and Racket are simply _irrelevant_ to non-Scheme languages that did not originate from Racket, and you are not giving arguments for them actually being relevant somehow.)

As this line of discussion has proven to just be a pointless time sink, I will not read or respond to further replies by you in this line of discussion.

Greetings,
Maxime

Attachment: OpenPGP_0x49E3EE22191725EE.asc
Description: OpenPGP public key

Attachment: OpenPGP_signature
Description: OpenPGP digital signature

