Re: any plans for command substitution that preserves trailing newlines?
From: Christoph Anton Mitterer
Subject: Re: any plans for command substitution that preserves trailing newlines?
Date: Wed, 26 Jan 2022 00:31:39 +0100
User-agent: Evolution 3.42.2-1
Hey.
Coming back to that topic, mostly for the record (in case anyone else
ever stumbles over this).
The following was pointed out [0] on another mailing list:
Using . or / as the sentinel value should generally be fine (even without
setting LC_ALL=C), as POSIX requires:
- “The encoded values associated with <period>, <slash>, <newline>, and
<carriage-return> shall be invariant across all locales supported by
the implementation.”
=> which, AFAIU, means that these have the same binary
representation in every locale/encoding.
- “Likewise, the byte values used to encode <period>, <slash>,
<newline>, and <carriage-return> shall not occur as part of any
other character in any locale.”
=> which, AFAIU, means that an invalidly encoded (incomplete)
character and the sentinel can never combine into a valid character
(which would make it impossible to strip the sentinel off), because
no partial byte sequence can be completed into a valid character by
these bytes in any locale/encoding.
(see section 6.1, Portable Character Set [1])
So if that holds, simply appending . or / as a sentinel within the
command substitution and removing it afterwards (without any need for
locale changes) should *always* work, regardless of the
locale/encoding.
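For reference, a minimal sketch of that idiom in POSIX sh. `produce` is a hypothetical stand-in for whatever command is being captured, and passing its exit status through via `rc` is an optional extra rather than part of the sentinel technique itself:

```sh
# Hypothetical stand-in for the real command; its output ends in two
# newlines, which a plain $(produce) would strip.
produce() {
    printf 'line one\nline two\n\n'
}

# Append the "." sentinel inside the substitution so the newlines are no
# longer trailing, and pass the command's exit status through.
out=$(produce; rc=$?; printf .; exit "$rc")
status=$?

# Remove the sentinel: ${out%.} deletes the shortest suffix matching ".",
# i.e. exactly the one character appended above.
out=${out%.}

# Show the captured bytes, including both trailing newlines.
printf '%s' "$out" | od -c
```

An assignment with no command name takes the exit status of its last command substitution, which is why `status=$?` directly after it sees the status of `produce`.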
Can anyone confirm this?
@Koichi, with respect to your replies back then (especially your
comments about ISO/IEC 2022):
On Tue, 2021-06-01 at 11:55 +0900, Koichi Murase wrote:
> It seems the solution is also given there; set temporary LC_ALL=C
> (though it is pointed out that this doesn't work with yash).
I found several more shells that seem not to support changing LC_ALL
at runtime (at least, the change has no effect on the shell itself): [2], [3]
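For comparison, a rough sketch of the temporary LC_ALL=C variant with an "x" sentinel, again using a hypothetical `produce` command. It saves and restores LC_ALL by hand; whether assigning LC_ALL at runtime changes the locale of the running shell at all is shell-specific, so in yash and the shells from [2] and [3] the switch may simply have no effect:

```sh
# Hypothetical stand-in for the real command.
produce() {
    printf 'data\n\n'
}

# Remember whether LC_ALL was set, and to what, so it can be restored.
if [ "${LC_ALL+set}" = set ]; then
    had_lc_all=1 old_lc_all=$LC_ALL
else
    had_lc_all=0
fi

# Capture the output with an "x" sentinel, keeping the exit status.
out=$(produce; rc=$?; printf x; exit "$rc")
status=$?

# Strip the sentinel under the C locale, where every byte is one
# character, so the final "x" is always matched on its own.
# NOTE: some shells ignore runtime changes of LC_ALL (see above).
LC_ALL=C
out=${out%x}

# Restore LC_ALL, or unset it again if it was not set before.
if [ "$had_lc_all" -eq 1 ]; then
    LC_ALL=$old_lc_all
else
    unset LC_ALL
fi
```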
> There is no problem in UTF-8 where "x" will never appear as a valid
> trailing byte in multibyte characters.
But AFAIU, command substitution is defined to capture all of stdout
(i.e. including invalidly encoded bytes), except for NUL and trailing
newlines.
So UTF-8 itself is not the problem, but there is no guarantee that the
command generates only valid UTF-8.
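A small illustration of that point, assuming any POSIX sh with printf and od (the byte 0xC3 is just an arbitrary example of an incomplete UTF-8 sequence): the substitution keeps the invalid byte, and the "." sentinel should still be stripped cleanly:

```sh
# The command's output is deliberately not valid UTF-8: it ends in a
# lone 0xC3 lead byte (octal 303) that no continuation byte follows.
out=$(printf 'abc\303'; printf .)
out=${out%.}

# Dump the captured bytes; the stray 0xC3 byte is kept as-is.
printf '%s' "$out" | od -An -tx1
```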
> but "." isn't
> affected (as far as the answering person tried in Debian, FreeBSD, and
> Solaris), but this is not really a robust statement.
It became more robust now, with what Thorsten Glaser pointed out.
However, I have no idea how these POSIX requirements relate to what you
wrote back then:
> In theory,
> ISO/IEC 2022 encoding allows to change the meaning of C0 (\x00-\x1F),
> GL (\x21-\x7E), C1 (\x80-\x9F), and GR (\xA0-\xAF) by locking shift
> escape sequences. In particular, all the bit combinations (i.e. bytes)
> in GL which contain ASCII "." and "x" can be used for trailing bytes
> of 94^n character sets (such as LC_CTYPE=ja_JP.ISO-2022-JP). The only
> two bit-combinations that are unaffected by the ISO/IEC 2022 shifts
> are SP (space \x20) and DEL (^? or \x7F). But actually, the encodings
> that are fully ISO/IEC 2022 have hardly used as user locales because
> most utilities have problems in dealing with such context-dependent
> encoding schemes.
Would such "shifting" simply not be allowed in a POSIX-compliant
shell/locale/encoding?
Cheers,
Chris.
[0] https://lists.zytor.com/archives/klibc/2022-January/004659.html
[1] https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap06.html
[2] https://lists.zytor.com/archives/klibc/2022-January/004657.html
[3] https://lore.kernel.org/dash/e312d45e17b49c418c3a62a56da758977067b563.camel@scientia.org/T/#u