Re: grep-2.10 testing
From: Jim Meyering
Subject: Re: grep-2.10 testing
Date: Sun, 20 Nov 2011 18:07:55 +0100
Bruno Haible wrote:
>> grep '\<$e_acute' in > out 2>err || fail=1
>
> Single-quotes, not double-quotes, around a reference to a shell variable??
Thanks for catching that.
Now, to see if the intended fix actually solves the problem...
No.
Stepping through that test manually
(which is what I should have done in the first place),
I see this:
openbsd$ e_acute=$(printf '\303\251')
openbsd$ echo "$e_acute" > in || framework_failure_
openbsd$ LC_ALL=en_US.UTF-8
-bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
openbsd$ export LC_ALL
So the real problem lies elsewhere.
You could argue that the require_en_utf8_locale_
function, run just prior, should have detected that the
desired locale is not available.
However, it runs a little helper program like this:
openbsd$ tests/get-mb-cur-max en_US.UTF-8
4
which, since it prints a number in [3-6], does suggest that
the locale exists.
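
For readers without the grep source at hand, here is a minimal sketch of what such a helper check might look like. It is not the actual tests/get-mb-cur-max code; the function name mb_cur_max_for is hypothetical, and the sketch assumes the helper does little more than call setlocale and report MB_CUR_MAX.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>

/* Hypothetical stand-in for the helper; not the real tests/get-mb-cur-max.
   Try to switch every locale category to LOCALE_NAME.  Return MB_CUR_MAX
   for that locale, or 0 if setlocale rejects the name.  */
int
mb_cur_max_for (const char *locale_name)
{
  assert (locale_name != NULL);
  if (setlocale (LC_ALL, locale_name) == NULL)
    return 0;
  return (int) MB_CUR_MAX;
}
```

Printing mb_cur_max_for ("en_US.UTF-8") from a small main would give the kind of number shown above. A check of this shape only tells you that the C library accepted the locale name and what the maximum multibyte length is; it says nothing about how isalnum and friends classify individual bytes.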
Debugging it, the first symptom I found is that
grep's DFA transition table is different on *BSD systems.
I first noticed it in dfa.c's dfaexec:
# With s=0 and *p=195 (aka \303)
(gdb-openbsd) p trans[s][*p]
$1 = 3
On other systems, that value is 0.
Why the difference?
To answer that, you have to peer into build_state_zero->build_state->dfastate,
which initializes like this:
  /* Return non-zero if C is a `word-constituent' byte; zero otherwise. */
  #define IS_WORD_CONSTITUENT(C) (isalnum(C) || (C) == '_')
  ...
  if (! initialized)
    {
      initialized = 1;
      for (i = 0; i < NOTCHAR; ++i)
        if (IS_WORD_CONSTITUENT(i))
          setbit(i, letters);
      setbit(eolbyte, newline);
    }
The *BSD locale tables (i.e., what isalnum, isalpha, etc. use) are different.
They classify \303 (among others) as alphabetic, while other systems do not.
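
To see the effect in isolation, the initialization above can be sketched as a standalone fragment. This is a simplified illustration, not dfa.c itself: the byteset type and the byteset_add, byteset_has, and build_letters names are hypothetical, and NOTCHAR is assumed to be one past the largest byte value.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <string.h>

#define NOTCHAR (1 << CHAR_BIT)   /* assumption: one past the largest byte */
#define IS_WORD_CONSTITUENT(C) (isalnum (C) || (C) == '_')

/* Hypothetical stand-in for dfa.c's character-class type:
   one bit per possible byte value.  */
typedef unsigned char byteset[NOTCHAR / CHAR_BIT];

void
byteset_add (unsigned char *s, int b)
{
  assert (0 <= b && b < NOTCHAR);
  s[b / CHAR_BIT] |= (unsigned char) (1u << (b % CHAR_BIT));
}

int
byteset_has (const unsigned char *s, int b)
{
  assert (0 <= b && b < NOTCHAR);
  return (s[b / CHAR_BIT] >> (b % CHAR_BIT)) & 1;
}

/* Fill LETTERS with every byte that the current locale's isalnum
   accepts, plus '_', mirroring the loop quoted above.  */
void
build_letters (unsigned char *letters)
{
  memset (letters, 0, NOTCHAR / CHAR_BIT);
  for (int i = 0; i < NOTCHAR; ++i)
    if (IS_WORD_CONSTITUENT (i))
      byteset_add (letters, i);
}
```

After a successful setlocale call, byteset_has (letters, 195) should be nonzero wherever isalnum accepts byte 195, as reported here for OpenBSD, and zero where it does not. That one bit is enough to change which state the DFA enters on the first byte of a UTF-8 e-acute.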
For the record, here are the <byte,boolean> isalpha pairs on OpenBSD 4.9.
Note how there are many '1's after 127. There are none with glibc.
0 0 32 0 64 0 96 0 128 0 160 0 192 1 224 1
1 0 33 0 65 1 97 1 129 0 161 0 193 1 225 1
2 0 34 0 66 1 98 1 130 0 162 0 194 1 226 1
3 0 35 0 67 1 99 1 131 0 163 0 195 1 227 1
4 0 36 0 68 1 100 1 132 0 164 0 196 1 228 1
5 0 37 0 69 1 101 1 133 0 165 0 197 1 229 1
6 0 38 0 70 1 102 1 134 0 166 0 198 1 230 1
7 0 39 0 71 1 103 1 135 0 167 0 199 1 231 1
8 0 40 0 72 1 104 1 136 0 168 0 200 1 232 1
9 0 41 0 73 1 105 1 137 0 169 0 201 1 233 1
10 0 42 0 74 1 106 1 138 0 170 1 202 1 234 1
11 0 43 0 75 1 107 1 139 0 171 0 203 1 235 1
12 0 44 0 76 1 108 1 140 0 172 0 204 1 236 1
13 0 45 0 77 1 109 1 141 0 173 0 205 1 237 1
14 0 46 0 78 1 110 1 142 0 174 0 206 1 238 1
15 0 47 0 79 1 111 1 143 0 175 0 207 1 239 1
16 0 48 0 80 1 112 1 144 0 176 0 208 1 240 1
17 0 49 0 81 1 113 1 145 0 177 0 209 1 241 1
18 0 50 0 82 1 114 1 146 0 178 0 210 1 242 1
19 0 51 0 83 1 115 1 147 0 179 0 211 1 243 1
20 0 52 0 84 1 116 1 148 0 180 0 212 1 244 1
21 0 53 0 85 1 117 1 149 0 181 1 213 1 245 1
22 0 54 0 86 1 118 1 150 0 182 0 214 1 246 1
23 0 55 0 87 1 119 1 151 0 183 0 215 0 247 0
24 0 56 0 88 1 120 1 152 0 184 0 216 1 248 1
25 0 57 0 89 1 121 1 153 0 185 0 217 1 249 1
26 0 58 0 90 1 122 1 154 0 186 1 218 1 250 1
27 0 59 0 91 0 123 0 155 0 187 0 219 1 251 1
28 0 60 0 92 0 124 0 156 0 188 0 220 1 252 1
29 0 61 0 93 0 125 0 157 0 189 0 221 1 253 1
30 0 62 0 94 0 126 0 158 0 190 0 222 1 254 1
31 0 63 0 95 0 127 0 159 0 191 0 223 1 255 1
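
A table like the one above can be produced with a few lines of C. The following is only a sketch of one way to do it, not necessarily the program used here; the names count_high_alpha and print_isalpha_table are made up for illustration. Both functions query whatever locale is currently in effect, so the caller is assumed to have called setlocale first (otherwise the "C" locale applies).

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Count the bytes in [128, 255] that isalpha accepts
   in the current locale.  */
int
count_high_alpha (void)
{
  int n = 0;
  for (int b = 128; b < 256; ++b)
    if (isalpha (b))
      ++n;
  return n;
}

/* Print <byte, boolean> pairs to OUT as 32 rows of 8 pairs,
   so that row R holds bytes R, R+32, ..., R+224.  */
void
print_isalpha_table (FILE *out)
{
  assert (out != NULL);
  for (int row = 0; row < 32; ++row)
    {
      for (int col = 0; col < 8; ++col)
        {
          int b = row + 32 * col;
          fprintf (out, "%s%d %d", col ? " " : "", b, isalpha (b) != 0);
        }
      fputc ('\n', out);
    }
}
```

Comparing count_high_alpha () across systems after the same setlocale call gives a quick single-number check of the difference described above: zero would match the glibc observation, and a large count would match the OpenBSD table.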