From: Koichi Murase
Subject: Re: bash "extglob" needs to upgrade at least like zsh "kshglob"
Date: Sun, 20 Nov 2022 21:50:54 +0900

On Fri, Nov 18, 2022 at 2:11, Chet Ramey <chet.ramey@case.edu> wrote:
> "If a pattern ends with an unescaped <backslash>, it is unspecified whether
> the pattern does not match anything or the pattern is treated as invalid."
>
> Bash uses the former interpretation. If "the pattern is treated as invalid"
> means trying to literally match the open bracket and going on from there,
> your interpretation is valid as well. The standard doesn't use that
> language in other places it specifies to treat the bracket as an ordinary
> character to be matched literally, however.

There still seem to be remaining issues. It is fine with me if Bash
chooses the former, ``the pattern does not match anything'', for a
backslash followed by NUL, but the following cases (see the attached
[reduced3.sh]) with a backslash followed by a slash should still be
fixed:

  #1: pat=a[b\/c] str=a[b/c] no/yes
  #2: pat=a[b\/c] str=ab     no/no
  #3: pat=a[b\/c] str=ac     yes/no
  [...]

Here the fourth column <xxx/yyy> shows the result of the current devel
407d9afc with FNM_PATHNAME (xxx) and the result I expect (yyy). "yes"
means the pattern matches the string, and "no" means the pattern does
not match.

* I expect "yes" for #1 because the bracket expression contains a
  slash before its closing right bracket `]' and thus the beginning
  `[' should be matched literally. However, the actual behavior is
  "no".

* I expect "no" for both #2 and #3 because the beginning bracket `['
  should be matched literally. Even if an escaped slash were allowed
  in the bracket expression so that [b\/c] forms a complete bracket
  expression, the results of #2 and #3 being "no" and "yes",
  respectively, are inconsistent with each other.

This difference arises because the slash after the backslash is only
checked after a matching character is found
(lib/glob/sm_loop.c:703). The same check should also be applied
before a matching character is found (lib/glob/sm_loop.c:573). I
attach a patch for this [r0037.brackmatch6.remaining-slash.patch].
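
The rule expected for #1..#3 can be sketched as a small POSIX shell function. This is only an illustration of the expectation described above, not the code in lib/glob/sm_loop.c or in the attached patch. The name `bracket_is_literal` is hypothetical, the argument is assumed to be the part of the pattern starting at `[', and character classes such as [:alpha:], [.x.] and [=x=] are deliberately not parsed.

```shell
#!/bin/sh
# Hypothetical helper (not part of Bash): prints "yes" when the `[' that
# starts $1 would be matched literally under FNM_PATHNAME according to the
# expectation above, i.e. when a slash (escaped or not) appears before the
# closing `]', or when there is no closing `]' at all. Prints "no" otherwise.
# Simplification: [:class:], [.x.] and [=x=] are not recognized.
bracket_is_literal() {
  rest=${1#?}                                   # drop the opening `['
  case $rest in '!'*|'^'*) rest=${rest#?} ;; esac   # skip a negation marker
  case $rest in ']'*) rest=${rest#?} ;; esac        # a leading `]' is a member
  while [ -n "$rest" ]; do
    c=${rest%"${rest#?}"}                       # first remaining character
    rest=${rest#?}
    case $c in
      '\') c=${rest%"${rest#?}"}; rest=${rest#?} ;; # examine the escaped char
      ']') echo no; return ;;                   # closed before any slash
    esac
    [ "$c" = / ] && { echo yes; return; }
  done
  echo yes                                      # never closed
}

bracket_is_literal '[b\/c]'   # the bracket part of pat=a[b\/c] from #1..#3
bracket_is_literal '[bc]'     # an ordinary bracket expression
```

Under this model the bracket in `a[b\/c]` is literal, so the pattern can only match a string that itself contains `[', which is why "yes" is expected for #1 and "no" for #2 and #3.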

----------------------------------------------------------------------

There is another related inconsistency. I just modified my new extglob
engine to follow Bash's choice described above, but then its behavior
became different from that of the actual implementation in the current
devel.

> "If a pattern ends with an unescaped <backslash>, it is unspecified whether
> the pattern does not match anything or the pattern is treated as invalid."
>
> Bash uses the former interpretation.

The corresponding sentence in the POSIX standard describes unescaped
backslashes in the general context of the pattern rather than
specifically inside a bracket expression, so I applied this to the new
extglob engine. However, ``the former interpretation'' that Bash
adopts turned out to be applied only to unescaped backslashes *inside
a bracket expression*. This is the remaining part of the output of
the attached [example3.sh] with the current devel 407d9afc:

  [...]
  #4: pat=a\ str=a\ yes/???

So a pattern terminated with an unescaped backslash actually matches a
string, with the backslash treated as a literally-matching backslash.

a. Is this difference between the outside and the inside of bracket
   expressions intentional? I.e., the former interpretation "the
   pattern does not match anything" seems to apply only to the inside
   of bracket expressions.

b. If this is the behavior for unescaped backslashes outside bracket
   expressions, and it is intentionally different from the behavior
   inside bracket expressions, would it be possible to change the
   treatment of unescaped backslashes inside a bracket expression to
   be the same as that outside, so that the bracket `[' matches
   literally (as expected in cases #28..#31 of my previous reply [1])?
   The attached [r0037.brackmatch7.unescaped-backslash-option-b.patch]
   is the corresponding patch.

   [1] https://lists.gnu.org/archive/html/bug-bash/2022-11/msg00070.html

c. If the behavior of the unescaped backslash outside bracket
   expressions should also be modified to follow the former
   interpretation "the pattern does not match anything", another patch
   is [r0037.brackmatch7.unescaped-backslash-option-c.patch]. However,
   the current behavior outside bracket expressions seems to be
   explicitly required by the tests at tests/glob2.sub:32 and
   tests/glob2.sub:41.

I prefer option b, which keeps the behavior required by
tests/glob2.sub and is also consistent between the inside and the
outside of bracket expressions. It is also consistent with the
behavior when the string ends inside a bracket expression.

--
Koichi
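
The trailing-backslash cases such as #4 can be probed with a short script like the one below. This is a sketch, not the attached [example3.sh]. The helper name `probe` is hypothetical, and because `case` matching does not use FNM_PATHNAME, it is only suitable for the unescaped-backslash question, not for #1..#3. The answers depend on the shell build that runs the script, so no expected results are listed.

```shell
#!/bin/sh
# Hypothetical helper: prints "yes" if the string $2 matches the pattern $1
# and "no" otherwise. The unquoted expansion of $1 in the `case' label is
# what makes the shell treat it as a pattern rather than a literal string.
probe() {
  case $2 in
    $1) echo yes ;;
    *)  echo no ;;
  esac
}

# Unescaped backslash at the end of the pattern, outside a bracket (as in #4).
probe 'a\' 'a\'

# Unescaped backslash at the end of the pattern, inside an unterminated bracket.
probe 'a[b\' 'a[b\'
```

Running it with the shell under test (for example a build of the devel branch) shows whether the two positions of the trailing backslash are treated alike, which is the distinction that options b and c resolve in opposite directions.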
r0037.brackmatch6.remaining-slash.patch.txt
Description: Text document
example3.sh
Description: Text Data
r0037.brackmatch7.unescaped-backslash-option-b.patch.txt
Description: Text document
r0037.brackmatch7.unescaped-backslash-option-c.patch.txt
Description: Text document