Re: "$@" expansion when it is consists of only null strings

From: Robert Elz
Subject: Re: "$@" expansion when it is consists of only null strings
Date: Fri, 01 Mar 2019 02:32:59 +0700

    Date:        Mon, 25 Feb 2019 17:38:07 -0500
    From:        Grisha Levit <address@hidden>
    Message-ID:  <address@hidden>

First, apologies from me for missing this message from you.   I don't
know if my spam filters caught it (for some unknown reason) or whether
it was delivered and I simply discarded it without noticing that it was
real mail and not an example of spam that made it past my spam filters.
I have cleaned out my spam folder between when you sent this and now,
so I can't simply go look there to find out (if it had been filtered,
it would have gone there, but if it was my eyeballs failing, then it
would have just been removed ...)

I had no idea your reply existed until I saw Chet's reply to you, and
went hunting (fortunately I keep many backups of all mail received,
including spam, so I could go hunting, using the referenced message-id
in his message, to find where your message appeared, and recover it.)

  | Yup I was referencing the devel version that fixed the ${b+s ''} issue.

Oh.  I didn't know there had been a fix for that.   Great.

  | I think I'm missing something but how can that be the case regarding
  | the quoting?

It all relates to the way the original Bourne shell (1978/9 vintage,
on a pdp-11 ... so very little code or data space) parsed quoting.

The rule was basically very simple, each (unquoted) " (and same for ')
in the input byte stream simply toggled the "quoted" flag - and this was
done very early in the input data processing.

Every other character (except \ of course, which simply quoted the following
char) was combined with the "quoted" flag to form the character that existed
internally (the "quoted" flag was 0x80, and "combined" meant (ch|quoted)).
(The few chars with special meaning inside "" were handled differently,
but I don't remember exactly how right now.)

The effect of this is that a quoted '*' for example would not compare equal
to an unquoted '*' (one is 0x2A, the other 0xAA), so quoting the magic
chars for glob simply worked (glob looks only for 2A, an unquoted '*').
When comparing normal chars for equality the quote flag was simply stripped
(if ((ch & 0x7f) == *p) ... or similar).

Similarly, IFS can only contain unquoted characters (quote removal
happens -- which at the time meant &= 0x7F, clearing the 0x80 flag --
before the value is assigned), so field splitting never worked on a
quoted char, only on an unquoted one.
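
The flag arithmetic described above can be sketched in shell arithmetic.
This only models the bit manipulation; it is not the original Bourne shell
source, and the names QUOTED, mark and strip are made up for illustration.

```shell
# Hypothetical model of the "quoted" flag, not real Bourne shell code.
QUOTED=$(( 0x80 ))

# mark CODE: combine a 7-bit character code with the quoted flag (ch|quoted)
mark()  { echo $(( $1 | $QUOTED )); }

# strip CODE: clear the quoted flag again (ch & 0x7f)
strip() { echo $(( $1 & 0x7F )); }

star=$(( 0x2A ))          # an unquoted '*'
qstar=$(mark "$star")     # the same character after quoting

# prints: unquoted=0x2A quoted=0xAA stripped=0x2A
printf 'unquoted=0x%X quoted=0x%X stripped=0x%X\n' \
    "$star" "$qstar" "$(strip "$qstar")"
```

A glob matcher that looks only for 0x2A never sees the quoted form, while
an equality test that strips the flag first treats both as the same char.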

This allowed all kinds of simplifications to the code, so it could be
implemented in very little space.    Unfortunately it also meant that
we ended up with the weird spec for "${var+w"or"d}" where the "or" are
not quoted, but the 'w' and 'd' are - it is also what leads to
"${var+w'or'd}" (when var is set) expanding to "w'or'd"  as the '
characters are still quoted by the "" and are not quoting chars themselves.

When Korn first implemented ksh (so I am told, I have never seen a ksh this
old) he fixed that, and made the quoting context for word in those 4
expansions, as well as the new substring extraction expansions he invented,
be unrelated to the quoting context surrounding the expansion.   But, so
I have been told, he was convinced that was incompatible, and so changed
the 4 original operators - + = and ?, back, so they were processed the
same way that the original Bourne shell did them, but left the substring
operators (% %% # and ##) the new way.

Until relatively recently, that was the supposed state of the world, and
what POSIX demanded.

But in the interim, several shells have been convinced by their users
(or never understood in the first place perhaps for some of them) that
it is (was) supposed to work this way, and made the implementation be
more like what users expect, rather than what the original Bourne shell did.

Eventually POSIX, which is supposed to be telling script writers what
they should expect to work in the wild, had to relent, and make this be
an unspecified case, as ...

  | For example "${x+" a   b "}" expands to a single field in
  | bash/dash/yash/zsh/netbsd sh (though not in ksh..)

is simply reality - the FreeBSD shell is in the ksh camp (they have a
very POSIX conforming implementation, modulo the occasional bug of course).

The effect is that if you want to write portable code, you simply cannot
put quotes inside a quoted variable expansion using any of the older 4
operators (- + = ?) ... but with anything newer it works fine.
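
One way to stay out of the unspecified case, as a sketch: keep the quotes
out of the word by going through a variable, or leave the outer expansion
unquoted and quote only the word. The helper show and the variable names
are made up for this example.

```shell
# show ARGS...: print the number of fields received and the first one
show() { printf '%s:<%s>\n' "$#" "$1"; }

x=set
word=' a   b '

# Unspecified, so avoided here:  show "${x+" a   b "}"

# No quotes inside the quoted expansion; the word comes from a variable.
show "${x+$word}"        # prints 1:< a   b >

# Outer expansion unquoted, only the word quoted.
show ${x+"$word"}        # prints 1:< a   b >

# With a newer operator, quotes inside a quoted expansion are fine:
# the inner quotes make the pattern literal instead of a glob.
name='report.*'
suffix='.*'
show "${name%"$suffix"}" # prints 1:<report>
```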

And incidentally, after more research, I can no longer justify:

  | > The second because there's no real agreement whether it should produce
  | > 0 or 1 (different shells do different things for that one, and there's
  | > no particularly good argument for one or the other, so posix, I believe,
  | > makes that one unspecified as well.)

There's nothing I can find which makes that happen, it appears that the
standard still expects nothing (no fields) to result from


when X is set, and there are no positional parameters, despite almost
no shells (not even FreeBSD) implementing it that way.   So that is another
thing that will change - it will end up being unspecified as well (it
is a fairly useless idiom to use, ${X+"$@"} is much more sane, and if you
actually want a null string result, rather than nothing, simply append
(or prepend) one, as

  | Looks like when there are one or more positional parameters (and x
  | unset) all shells listed above expand "address@hidden" to the proper number
  | of fields,

Yes, when there are positional params, it is easy.   It is whether or
not the field should be deleted when there are none that is the harder
case - shells can recognise "$@" and cause nothing to result if there
are no params, but when the $@ is buried somewhere else it is much harder.
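
A small sketch of the easy cases only, sticking to forms that the shells
discussed here are expected to agree on. The helper count is made up for
this example, and X is simply any variable known to be set.

```shell
# count ARGS...: print how many fields the shell passed in
count() { echo "$#"; }

X=set

set --              # no positional parameters
count "$@"          # prints 0: "$@" alone vanishes entirely
count ${X+"$@"}     # prints 0: the "$@" is the whole (unquoted) word

set -- a 'b c'      # two positional parameters
count "$@"          # prints 2
count ${X+"$@"}     # prints 2
```

The hard case, a $@ buried inside a larger quoted word, is deliberately
left out because its result is exactly what differs between shells.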

  | > but that's because they don't
  | > implement $_ (and good on them for that, stupid thing it is)
  | Sorry, didn't mean to confuse the issue by using _, should've used a
  | more portable example.

Not a problem.   For future similar cases, you can use $0 instead of $_ -
$0 is always set, so while writing ${0+whatever} in a real script would be
lunacy, for running sh tests of something where the parameter is known
to be set it works just fine, and avoids needing to use extra code to make
sure the variable used is set. ($? and $$ would also work, but are more likely
to confuse the reader, wondering just what magic you're trying to achieve!)
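
For instance, a quick test along those lines might look like this (the
variable names probe and surely_unset are made up for the example):

```shell
# $0 is always set, so the "+" alternative is always chosen.
probe=${0+alt}
printf '<%s>\n' "$probe"                # prints <alt>

# Contrast with a name that has just been unset.
unset -v surely_unset
printf '<%s>\n' "${surely_unset+alt}"   # prints <>
```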

In his reply, address@hidden said:
  | There is an interpretation that somewhat decoupled quotes outside the
  | expansion with quotes inside it, but I can't remember the specifics right
  | now. It might be 888, but I seem to remember another.

It's not 888 (that's just $@), and if it was, it would be in the text now,
and it isn't.  But it has been done in something newer (decouple, as in
"made unspecified") so now we're all free to implement sanity.   One day
perhaps even ksh will change again, and we might then be able to standardise
sanity - but that will be decades away (most likely I won't live long enough
to see that happen, I don't have that many decades left!)

address@hidden also said:
  | Because ksh uses the open, open, close, close interpretation.

Actually just the opposite, that's what sane implementations do (new
quoting context after any operator in a ${var<op>word} expansion, not
just any except the original 4), ksh (ksh93 anyway, mksh doesn't) is
doing the old open close open close interpretation.  So does the FreeBSD
shell.

