From: G. Branden Robinson
Subject: [bug #65108] [troff] support construction of general file name request arguments
Date: Thu, 18 Jul 2024 17:54:14 -0400 (EDT)
Update of bug #65108 (group groff):
Status: None => Need Info
Assigned to: None => barx
_______________________________________________________
Follow-up Comment #3:
Well, let's rough out a syntax that would work both for existing uses of `so`
as _soelim_(1) understands it and for formatter syntax, which interprets the
`so` request under slightly different rules (since it brings to bear the full
power of the _troff_ lexical analyzer).
1. An argument of type `file` (as described in _groff_(7)) to a request
consumes the rest of the line.
2. Unescaped spaces can therefore populate the argument.
3. A leading double quote is recognized and removed; a file name can thus
start with spaces.
4. Any other/remaining double quotes are not treated specially.
5. Only the following escape sequences are recognized.
5a. `\ ` (backslash-space) represents a space. It is not necessary in
_troff_, but is recognized to avoid disrupting existing _soelim_(1) usage.
5b. `\"` ends the file name argument and starts a comment.
5c. `\\` represents a (single) literal backslash. It is handled however the
system's standard C library wants to handle it.
5d. `\[u00XX]` where each X is an uppercase hexadecimal digit encodes a
character. Only codes in the range 00-1F and 80-FF are accepted in this
syntax; those in the range 20-7F are ignored with a diagnostic advising the
user to deobfuscate their inputs.
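To make rules 1 through 5d concrete, here is a rough sketch in Python. It is an illustration only, not how the _troff_ lexer or _soelim_ is (or would be) implemented. The function name `parse_file_argument` is hypothetical. Backslash is hard-coded as the escape character. Three behaviors are assumptions the rules above do not settle: separator spaces before the argument are dropped, a code in the range 20-7F contributes nothing to the file name, and an unrecognized escape sequence is kept verbatim with a diagnostic.

```python
import re

# Body of \[u00XX]: "u00" followed by exactly two uppercase hex digits.
_U00XX = re.compile(r"\[u00([0-9A-F]{2})\]")


def parse_file_argument(rest):
    """Return (file_name, diagnostics) for the text after a request name.

    `rest` is the remainder of the input line, without its newline.
    """
    diagnostics = []
    text = rest.lstrip(" ")      # assumption: separator spaces are dropped
    if text.startswith('"'):     # rule 3: remove one leading double quote
        text = text[1:]
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\":
            out.append(ch)       # rules 2 and 4: spaces, later quotes are ordinary
            i += 1
            continue
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if nxt == " ":           # rule 5a: backslash-space is a space
            out.append(" ")
            i += 2
        elif nxt == '"':         # rule 5b: the rest of the line is a comment
            break
        elif nxt == "\\":        # rule 5c: a single literal backslash
            out.append("\\")
            i += 2
        else:
            m = _U00XX.match(text, i + 1)
            if m:                # rule 5d
                code = int(m.group(1), 16)
                if 0x20 <= code <= 0x7F:
                    # assumption: "ignored" means nothing is added to the name
                    diagnostics.append(
                        "\\[u00%s] is in the range 20-7F; "
                        "write the character directly" % m.group(1))
                else:
                    out.append(chr(code))
                i = m.end()
            else:
                # assumption: unrecognized escapes are kept as written
                diagnostics.append("unrecognized escape sequence")
                out.append(ch)
                i += 1
    return "".join(out), diagnostics


if __name__ == "__main__":
    for arg in ['foo bar file.troff', 'foo\\ bar\\ file.troff',
                '"foo bar file.troff', 'foo.troff\\" comment']:
        print(repr(parse_file_argument(arg)))
```

Under this sketch the first three arguments all yield the same file name, and the fourth yields `foo.troff` with the comment discarded.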
How are these handled today?
Specimen:
$ cat EXPERIMENTS/extending-so-syntax.troff
.so foo bar file.troff
.so foo\ bar\ file.troff
.so "foo bar file.troff
.so foo.troff\" comment
.so foo\u[0020]bar\u[0020]file.troff
_groff_ _soelim_:
$ soelim EXPERIMENTS/extending-so-syntax.troff
.lf 1 ./EXPERIMENTS/extending-so-syntax.troff
soelim:./EXPERIMENTS/extending-so-syntax.troff:1: error: can't open 'foo': No
such file or directory
.so foo bar file.troff
soelim:./EXPERIMENTS/extending-so-syntax.troff:2: error: can't open 'foo bar
file.troff': No such file or directory
.so foo\ bar\ file.troff
soelim:./EXPERIMENTS/extending-so-syntax.troff:3: error: can't open '"foo': No
such file or directory
.so "foo bar file.troff
.so foo.troff\" comment
.so foo\u[0020]bar\u[0020]file.troff
DWB 3.3 _soelim_:
...never mind, DWB 3.3 _troff_ *has* no _soelim_. Wow! Learned something new
today.
Heirloom Doctools _soelim_:
$ ./bin/soelim ./extending-so-syntax.troff
foo: No such file or directory
.so foo
bar file.troff
foo\: No such file or directory
.so foo\
bar\ file.troff
"foo: No such file or directory
.so "foo
bar file.troff
foo.troff\": No such file or directory
.so foo.troff\"
comment
foo\u[0020]bar\u[0020]file.troff: No such file or directory
.so foo\u[0020]bar\u[0020]file.troff
Uh, that's a little hard to interpret.
$ printf '.so foo bar file.troff\n' | ./bin/soelim
foo: No such file or directory
.so foo
bar file.troff
Interesting that it transforms the input in this way, by adding a newline
where it decided to stop lexing the file name. I'm tempted to call that a
bug.
0000000 . s o f o o \n b a r f i l e
0000020 . t r o f f \n
0000026
The other cases:
$ printf '.so foo\\ bar\\ file.troff\n' | ./bin/soelim
foo\: No such file or directory
.so foo\
bar\ file.troff
$ printf '.so "foo bar file.troff\n' | ./bin/soelim
"foo: No such file or directory
.so "foo
bar file.troff
$ printf '.so "foo.troff\\"comment\n' | ./bin/soelim
"foo.troff\"comment: No such file or directory
.so "foo.troff\"comment
$ printf '.so foo\u[0020]bar\u[0020]file.troff\n' | ./bin/soelim
$ printf '.so foo\\u[0020]bar\\u[0020]file.troff\n' | ./bin/soelim
foo\u[0020]bar\u[0020]file.troff: No such file or directory
.so foo\u[0020]bar\u[0020]file.troff
There seem to be no further surprises here.
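The Heirloom outputs above are all consistent with a simple model: the file name ends at the first space (or at the newline), that terminating character is consumed, and when the file cannot be opened the program writes `.so`, the name, and a newline, then carries on copying the remainder of the input. The Python sketch below is only that model, inferred from the transcripts; it is not derived from Heirloom's source, and the function name `model_unopenable_so` is made up for illustration.

```python
def model_unopenable_so(line):
    """Model the observed result for a `.so` line naming a missing file.

    Returns (diagnostic, output_text).  This reproduces the transcripts
    above; it is not taken from any soelim implementation.
    """
    prefix = ".so "
    if not line.startswith(prefix):
        raise ValueError("expected a line starting with '.so '")
    body = line[len(prefix):]
    # The name runs up to the first space or newline, whichever comes first.
    end = len(body)
    for idx, ch in enumerate(body):
        if ch in " \n":
            end = idx
            break
    name = body[:end]
    remainder = body[end + 1:]   # the terminating character is consumed
    diagnostic = name + ": No such file or directory"
    # ".so NAME" plus a newline is emitted, then the rest is copied through.
    return diagnostic, prefix + name + "\n" + remainder


if __name__ == "__main__":
    diag, out = model_unopenable_so(".so foo bar file.troff\n")
    print(diag)
    print(out, end="")
```

If this model is right, the extra newline is a side effect of re-emitting the request after a failed open, not a deliberate rewrite of the input.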
Unix V7 did not have _soelim_, either.
Let me check Solaris 10.
$ printf '.so foo\\ bar\\ file.troff\n' | soelim
foo\: No such file or directory
.so foo\
bar\ file.troff
$ printf '.so "foo bar file.troff\n' |soelim
"foo: No such file or directory
.so "foo
bar file.troff
$ printf '.so "foo.troff\\"comment\n' |soelim
"foo.troff\"comment: No such file or directory
.so "foo.troff\"comment
$ printf '.so foo\u[0020]bar\u[0020]file.troff\n' |soelim
foo\u[0020]bar\u[0020]file.troff: No such file or directory
.so foo\u[0020]bar\u[0020]file.troff
These look identical to Heirloom to me. I guess we now know where Heirloom
got its inspiration, and perhaps even its code, for _soelim_.
Since backslash-space is apparently a GNU extension in the first place, we
might consider dropping it. It wasn't portable, and even the rest of the
_groff_ ecosystem struggled to handle files with spaces in their names.
I further venture that this exact same syntax could be applied to the
`sy`/`pso` problem in bug #62787 and to user-constructed diagnostic messages
in bug #64071.
I highly value the prospect of having a parallel syntax for these 3 issues if
we can get it.
For _soelim_(1) itself I would further add that this program will continue to
recognize only backslash as an escape character, but GNU _troff_ will
recognize the configured escape character.
Thoughts?
_______________________________________________________
Reply to this item at:
<https://savannah.gnu.org/bugs/?65108>