Re: ${b+s ''}

bug-bash

From:	Robert Elz
Subject:	Re: ${b+s ''}
Date:	Thu, 21 Feb 2019 01:39:58 +0700

    Date:        Wed, 20 Feb 2019 10:08:44 -0500
    From:        Chet Ramey <chet.ramey@case.edu>
    Message-ID:  <ca6e6aac-4c33-70c0-2244-8ec75bfbab4c@case.edu>

  | Maybe. The standard doesn't actually say that anywhere as such.

Not in the words I used, but XCU 2.3 does say:

        When it is not processing an io_here, the shell shall break
        its input into tokens by applying the first applicable rule
        below to the next character in its input. The token shall be
        from the current position in the input until a token is delimited
        according to one of the rules below; the characters forming the
        token are exactly those in the input, including any quoting
        characters.

The quoting chars are removed as part of quote removal; until then,
the abstract model requires that they be retained (as nothing else
deletes them from the token).   They can of course be dropped (along
with any other chars) if they're not used, e.g. in

        unset b; echo ${b+s ''}

since b is unset there, the "s ''" part isn't used for anything, and is
all simply dropped as part of the expansion.   But ignoring that kind of
use (or non-use) the quotes are (in the abstract model) retained, exactly
where they were, until quote removal drops them.
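
A quick way to see the unset case is to count the words the expansion
produces.   This is only a sketch; count_args is a helper name made up
for the illustration, it is not part of any shell.

```shell
# count_args is a hypothetical helper: it reports how many words it received.
count_args() { echo "$#"; }

unset b
# b is unset, so the "s ''" alternative is never used and the whole
# expansion yields no words at all.
n_unset=$(count_args ${b+s ''})
echo "words when b is unset: $n_unset"
```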

Almost no-one actually implements it like that (perhaps no-one) as
following the model, exactly as written, means rescanning things over
and over again, for no real gain - once you've scanned a command
substitution, or parameter expansion, once (which is necessary to
correctly find the end of the token) there's no point ever actually
doing all that again when the word is being expanded, nothing is going
to have changed.   Similarly, we need to know which quote chars were
in the input token as written (those are the ones that are removed by
quote removal) and distinguish those from ones that resulted from an
expansion (which don't get removed) - aside from saying that that needs
to happen, the model gives no hint as to how.   All this means that the
internal form of a token generally looks nothing at all like the model
suggests that it should, which is all fine - provided that the end
result works out properly.
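
The difference between quote chars that were written in the input and
ones that result from an expansion can be seen with something like the
following (a minimal sketch, the variable names are just for
illustration):

```shell
# Quote characters written in the command itself are removed by quote removal.
set -- 'a'
from_input=$1

# Quote characters that arrive via an expansion are ordinary data,
# and quote removal leaves them alone.
q="'a'"
set -- $q
from_expansion=$1

printf '%s\n' "$from_input" "$from_expansion"
```

The first line printed is a, the second is 'a' with its quotes intact.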

  | But since I wasn't talking about lexical analysis, but word expansion,

That should be unrelated, word expansion doesn't ($@ aside) consider
quote characters at all.   They're just data.  They later affect
pathname expansion (by hiding * ? and [ or ]) and prevent field splitting
inside a quoted part, but word expansion itself should not concern itself
with quoting (again, other than $@).   (Expanding $* depends upon whether
field splitting is going to happen - one of the preventers of which is
quotes, but that's just one, so quotes there are kind of a side issue.
A simple way of doing it is to simply shove IFS[0] between the $1 $2 ...
values, and then let field splitting (if it happens) remove it again -
in which case quotes and $* expansions are completely unrelated).
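
The visible effect of that can be sketched as below.   It shows only what
a script can observe, not how any particular shell implements $*
internally; the variable names are invented for the example.

```shell
# Quoting hides a pattern character from pathname expansion.
set -- '*'
literal=$1

set -- one two three
old_ifs=$IFS
IFS=:

# Quoted: no field splitting happens, so the IFS[0] separators stay.
joined="$*"

# Unquoted: field splitting on ':' takes the separators out again.
set -- $*
resplit=$#

IFS=$old_ifs
printf '%s\n' "$literal" "$joined" "$resplit"
```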

  | Obviously we don't discard these null strings during lexical analysis.

Yes, I knew some of the examples were kind of wonky, but I kept them
to make the point that null strings aren't just noise (ever), they are
real data, and deserve to be retained until the quotes are finally
removed.   Then, if the "nothing" that is inside is an object of its
own, it remains visible (the quotes are removed, the word, containing
nothing, remains); if it abuts something else, so that it is impossible
to tell if it is there or not, it effectively vanishes.   But that
should only be happening as part of quote removal.
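
For the simple cases, with no parameter expansion involved, that looks
like this (count_args is again just a hypothetical helper that reports
its argument count):

```shell
# count_args is a hypothetical helper: it reports how many words it received.
count_args() { echo "$#"; }

alone=$(count_args '')        # a null string on its own is still a word
abutting=$(count_args a''b)   # abutting other text, it vanishes into "ab"
two=$(count_args '' '')       # two separate null words

echo "$alone $abutting $two"
```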

  | We have to remember them through word expansion so that if a word
  | containing an empty quoted string expands to nothing, [...]

Exactly, which is exactly what (becomes) the second word in the expansion
of
        ${b+s ''}

is (when b is set, and assuming a normal, or normalish, IFS).

  | It's less clear that we have to remember
  | them as part of a non-null word through word splitting.

It is a consequence of the quotes not being removed until quote
removal happens.   Until then, the
        ${b+s ''}
expands to
        s ''
which is field split into two words, "s" and "''" -- the double
quotes are just for exposition in this e-mail, or perhaps represent
the strings that the shell should (nominally) have internally.

The first of those is just an 's' character (as a string), the second word
(after quote removal) becomes the empty word, which needs to remain, because
quote removal never deletes (or creates) words.
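
Anyone wanting to check their own shell can count the words produced
when b is set.   This is a sketch only, with count_args a made-up helper.
Following the model described above the count would be 2; a shell that
drops the null string reports 1, so no particular output is promised
here.

```shell
# count_args is a hypothetical helper: it reports how many words it received.
count_args() { echo "$#"; }

b=set
# Under the abstract model: "s" and "''" after field splitting, and the
# second becomes an empty word after quote removal. Shells differ.
n_set=$(count_args ${b+s ''})
echo "words when b is set: $n_set"
```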

  | But compatibility is important, so I'll take a look at what bash is doing
  | here.

Thanks.   That is all anyone can ask.

I'd also note, just to save a later e-mail (perhaps) that it is
astoundingly difficult to imagine that anyone would write

        ${b+s ''}

(or anything like it) if they did not intend that '' to mean something.

That is, it is extraordinarily unlikely that changing the way bash
works here is going to affect anything -- no-one is going to be relying
upon that null string vanishing - if it wasn't wanted, they would
just write
        ${b+s}
which is what bash generates from the above now.   Or if they're
perverse, perhaps
        ${b+ s }
which field splitting (with a normal IFS) turns into the same thing...
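
That the last two forms give the same result can be checked with
something like this (a sketch, assuming the default IFS; the variable
names are only illustrative):

```shell
b=set

set -- ${b+s}
plain="$#:$1"

# The surrounding spaces are IFS white space and are dropped by field splitting.
set -- ${b+ s }
padded="$#:$1"

printf '%s\n' "$plain" "$padded"
```

Both lines printed are 1:s, that is, one word, which is "s".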

kre

Current Thread

${b+s ''}, sunnycemetery, 2019/02/16
- Re: ${b+s ''}, Chet Ramey, 2019/02/17
  - Re: ${b+s ''}, Greg Wooledge, 2019/02/19
    - Re: ${b+s ''}, sunnycemetery, 2019/02/19
  - Re: ${b+s ''}, Robert Elz, 2019/02/19
- Re: ${b+s ''}, Robert Elz, 2019/02/19
  - Re: ${b+s ''}, Chet Ramey, 2019/02/20
  - Re: ${b+s ''}, Robert Elz <=
