bug-grep

Re: grep-2.10 testing


From: Jim Meyering
Subject: Re: grep-2.10 testing
Date: Sun, 20 Nov 2011 18:07:55 +0100

Bruno Haible wrote:
>> grep '\<$e_acute' in > out 2>err || fail=1
>
> Single-quotes, not double-quotes, around a reference to a shell variable??

Thanks for catching that.
Now, to see if the intended fix actually solves the problem...

No.

Stepping through that test manually
(which is what I should have done in the first place),
I see this:

    openbsd$ e_acute=$(printf '\303\251')
    openbsd$ echo "$e_acute" > in || framework_failure_
    openbsd$ LC_ALL=en_US.UTF-8
    -bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
    openbsd$ export LC_ALL

So the real problem lies elsewhere.
You could argue that the require_en_utf8_locale_
function, run just prior, should have detected that the
desired locale is not available.

However, it runs a little helper program like this:

    openbsd$ tests/get-mb-cur-max en_US.UTF-8
    4

which, since it prints a number in [3-6], does suggest that
the locale exists.
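
For readers without the grep sources at hand, here is roughly what a
helper of that kind does. This is only a sketch, not the actual
tests/get-mb-cur-max source; the name mb_cur_max_for is invented here.
It asks the C library to switch to the named locale and reports
MB_CUR_MAX. A value greater than 1 only tells you that the C library
accepted the name as a multibyte locale. It says nothing about how
isalnum and friends classify individual bytes.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from grep): return MB_CUR_MAX for
   LOCALE_NAME, or -1 if setlocale rejects the name.  The caller's
   locale is restored afterwards when it could be saved.  */
int
mb_cur_max_for (const char *locale_name)
{
  int result = -1;
  const char *cur = setlocale (LC_ALL, NULL);
  char *saved = NULL;

  /* Copy the current locale name: the string returned by setlocale
     may be overwritten by the next call.  */
  if (cur)
    {
      saved = malloc (strlen (cur) + 1);
      if (saved)
        strcpy (saved, cur);
    }

  if (setlocale (LC_ALL, locale_name))
    result = (int) MB_CUR_MAX;

  if (saved)
    {
      setlocale (LC_ALL, saved);
      free (saved);
    }
  return result;
}
```

Printing mb_cur_max_for ("en_US.UTF-8") from a main function would
correspond to the helper invocation shown above.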

Debugging it, the first symptom I found is that
grep's DFA transition table is different on *BSD systems.
I first noticed it in dfa.c's dfaexec:

    # With s=0 and *p=195 (aka \303)
    (gdb-openbsd) p trans[s][*p]
    $1 = 3

On other systems, that value is 0.

Why the difference?
To answer that, you have to peer into build_state_zero->build_state->dfastate,
which initializes like this:

/* Return non-zero if C is a `word-constituent' byte; zero otherwise.  */
#define IS_WORD_CONSTITUENT(C) (isalnum(C) || (C) == '_')
...

  if (! initialized)
    {
      initialized = 1;
      for (i = 0; i < NOTCHAR; ++i)
        if (IS_WORD_CONSTITUENT(i))
          setbit(i, letters);
      setbit(eolbyte, newline);
    }

The *BSD locale tables (i.e., what isalnum, isalpha, etc. use) are different.
They classify \303 (among others) as alphabetic, while other systems do not.
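
To make the effect concrete, here is a standalone sketch modeled on the
initialization loop quoted above. It is not dfa.c itself: the charclass
layout and the names set_bit, test_bit, fill_letters and
count_high_letters are assumptions made for illustration. It fills a
256-bit set using the current locale's isalnum, then counts how many
bytes above 127 ended up in it.

```c
#include <ctype.h>
#include <limits.h>
#include <stddef.h>

#define NOTCHAR (1 << CHAR_BIT)
#define INTBITS (CHAR_BIT * sizeof (unsigned int))
#define CHARCLASS_INTS ((NOTCHAR + INTBITS - 1) / INTBITS)

/* Assumed representation: one bit per byte value.  */
typedef unsigned int charclass[CHARCLASS_INTS];

#define IS_WORD_CONSTITUENT(C) (isalnum (C) || (C) == '_')

void
set_bit (unsigned int b, charclass c)
{
  c[b / INTBITS] |= 1U << (b % INTBITS);
}

int
test_bit (unsigned int b, const charclass c)
{
  return (c[b / INTBITS] >> (b % INTBITS)) & 1U;
}

/* Same idea as the loop in dfastate: record every byte that the
   current locale treats as a word constituent.  */
void
fill_letters (charclass letters)
{
  for (size_t k = 0; k < CHARCLASS_INTS; k++)
    letters[k] = 0;
  for (int i = 0; i < NOTCHAR; ++i)
    if (IS_WORD_CONSTITUENT (i))
      set_bit (i, letters);
}

/* Number of bytes in 128..255 that are marked in LETTERS.  */
int
count_high_letters (const charclass letters)
{
  int n = 0;
  for (int i = 128; i < NOTCHAR; ++i)
    n += test_bit (i, letters);
  return n;
}
```

In the "C" locale the count is 0. After setlocale (LC_ALL, "") in a
UTF-8 locale, a nonzero count would indicate the kind of table
difference described here.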

For the record, here are the <byte,boolean> isalpha pairs on OpenBSD 4.9.
Note how there are many '1's after 127.  There are none with glibc.

  0 0     32 0     64 0     96 0    128 0    160 0    192 1    224 1
  1 0     33 0     65 1     97 1    129 0    161 0    193 1    225 1
  2 0     34 0     66 1     98 1    130 0    162 0    194 1    226 1
  3 0     35 0     67 1     99 1    131 0    163 0    195 1    227 1
  4 0     36 0     68 1    100 1    132 0    164 0    196 1    228 1
  5 0     37 0     69 1    101 1    133 0    165 0    197 1    229 1
  6 0     38 0     70 1    102 1    134 0    166 0    198 1    230 1
  7 0     39 0     71 1    103 1    135 0    167 0    199 1    231 1
  8 0     40 0     72 1    104 1    136 0    168 0    200 1    232 1
  9 0     41 0     73 1    105 1    137 0    169 0    201 1    233 1
 10 0     42 0     74 1    106 1    138 0    170 1    202 1    234 1
 11 0     43 0     75 1    107 1    139 0    171 0    203 1    235 1
 12 0     44 0     76 1    108 1    140 0    172 0    204 1    236 1
 13 0     45 0     77 1    109 1    141 0    173 0    205 1    237 1
 14 0     46 0     78 1    110 1    142 0    174 0    206 1    238 1
 15 0     47 0     79 1    111 1    143 0    175 0    207 1    239 1
 16 0     48 0     80 1    112 1    144 0    176 0    208 1    240 1
 17 0     49 0     81 1    113 1    145 0    177 0    209 1    241 1
 18 0     50 0     82 1    114 1    146 0    178 0    210 1    242 1
 19 0     51 0     83 1    115 1    147 0    179 0    211 1    243 1
 20 0     52 0     84 1    116 1    148 0    180 0    212 1    244 1
 21 0     53 0     85 1    117 1    149 0    181 1    213 1    245 1
 22 0     54 0     86 1    118 1    150 0    182 0    214 1    246 1
 23 0     55 0     87 1    119 1    151 0    183 0    215 0    247 0
 24 0     56 0     88 1    120 1    152 0    184 0    216 1    248 1
 25 0     57 0     89 1    121 1    153 0    185 0    217 1    249 1
 26 0     58 0     90 1    122 1    154 0    186 1    218 1    250 1
 27 0     59 0     91 0    123 0    155 0    187 0    219 1    251 1
 28 0     60 0     92 0    124 0    156 0    188 0    220 1    252 1
 29 0     61 0     93 0    125 0    157 0    189 0    221 1    253 1
 30 0     62 0     94 0    126 0    158 0    190 0    222 1    254 1
 31 0     63 0     95 0    127 0    159 0    191 0    223 1    255 1
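
A table in this layout can be regenerated with a few lines of C. The
following is a sketch, not the program actually used for the listing
above; alpha_flag and print_isalpha_table are names made up here. The
caller is assumed to have already selected the locale of interest, for
example with setlocale (LC_ALL, ""). Without that, the program runs in
the "C" locale and every entry above 127 is 0.

```c
#include <ctype.h>
#include <stdio.h>

/* 1 if isalpha accepts BYTE (0..255) in the current locale, else 0.  */
int
alpha_flag (int byte)
{
  return isalpha (byte) != 0;
}

/* Print 32 rows of 8 <byte,flag> pairs.  Column COL of row ROW holds
   byte COL * 32 + ROW, so each column covers 32 consecutive bytes.  */
void
print_isalpha_table (FILE *out)
{
  for (int row = 0; row < 32; row++)
    {
      for (int col = 0; col < 8; col++)
        {
          int byte = col * 32 + row;
          fprintf (out, "%s%3d %d", col ? "    " : "",
                   byte, alpha_flag (byte));
        }
      fputc ('\n', out);
    }
}
```

Calling print_isalpha_table (stdout) on two systems under the same
locale name and diffing the output is a quick way to compare their
ctype tables.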


