[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [coreutils] tr: case mapping anomaly
From: |
Jim Meyering |
Subject: |
Re: [coreutils] tr: case mapping anomaly |
Date: |
Wed, 29 Sep 2010 08:45:14 +0200 |
Pádraig Brady wrote:
> On 25/09/10 07:53, Jim Meyering wrote:
>> Eric Blake wrote:
>>> On 09/24/2010 04:47 PM, Pádraig Brady wrote:
>>>> I was just looking at a bug reported to fedora there where this abort()s
>>>>
>>>> $ LC_ALL=en_US tr '[:upper:] ' '[:lower:]'
>>
>> Ouch! Thanks for reporting it here.
>> How many more bugs lurk in tr...
>> Consolation: this one is a failure to diagnose invalid inputs.
>
> I found a few more issues:
Nice catches. Thanks for all the work.
Please mention the above abort-inducing case in the log along with the
other three. While you're at it, please use LC_ALL= there, not LANG=,
since the latter is a glibc-ism.
> This valid translation spec aborted:
> LANG=en_US tr '[:upper:]- ' '[:lower:]_'
The above does not abort w/glibc when LC_ALL=C happens to be set.
> This misaligned conversion spec was allowed:
> LANG=C tr 'A-Y[:lower:]' 'a-z[:upper:]'
> This misaligned spec was allowed by extending the class:
> LANG=C tr '[:upper:] ' '[:lower:]'
>
...
> + tr: fix various issues with case conversion classes. In some locales
> + valid conversion specifications caused tr to abort, while some invalid
> + specifications were undiagnosed.
> + [bugs introduced in coreutils 6.9.90 and 6.9.92]
Two separate bugs! Thanks for tracking them down.
I'm surprised they are so recent.
I suppose that means I introduced both. Let's see...
These two seem like the only candidates:
6efd10462d8103208f4575f0b5edddf841c7d87c
af5d0c363a52e787a4311a11f035209eecdc4115
Whichever they are, please list them in the commit log.
> ** New features
>
> cp now accepts the --attributes-only option to not copy file data,
> diff --git a/src/tr.c b/src/tr.c
> index a5b6810..3adc85f 100644
> --- a/src/tr.c
> +++ b/src/tr.c
> @@ -1177,6 +1177,77 @@ card_of_complement (struct Spec_list *s)
> return cardinality;
> }
>
> +/* Discard the lengths associated with a case conversion,
> + as using the actual number of upper or lower case characters
> + is problematic when they don't match in some locales.
> + Also ensure the case conversion classes in string2 are
> + aligned correctly with those in string1.
> + Note POSIX says the behavior of `tr "[:upper:]" "[:upper:]"'
> + is undefined. Therefore we allow it (unlike Solaris)
> + and treat it as a no-op. */
> +
> +static void
> +validate_case_classes (struct Spec_list *s1, struct Spec_list *s2)
> +{
> + size_t upper_chars = 0;
> + size_t lower_chars = 0;
Please name these n_upper and n_lower, so each is obviously a counter.
Otherwise, seeing the name out of context might evoke an array.
> + int i;
Please make this "unsigned int", since it never needs negative values.
> + int c1=0, c2=0;
These days, I prefer one per line, and with spaces:
int c1 = 0;
int c2 = 0;
Hmm.. I see you're following existing style that did "int c1, c2;",
though in existing code, those didn't have initializers.
And the "int i;" was essentially moved. And there are others.
I blame the author ;-) (me, 18 years ago)
> + count old_s1_len = s1->length;
> + count old_s2_len = s2->length;
> + struct List_element *s1_tail = s1->tail;
> + struct List_element *s2_tail = s2->tail;
> + bool s1_new_element = true;
> + bool s2_new_element = true;
...
> +
> +# Before coreutils 8.6 the disparat number of upper and lower
s/disparat/disparate/
> +# characters in some locales, triggered abort()s and invalid behavior
> +if test "$(LC_ALL=en_US.ISO-8859-1 locale charmap 2>/dev/null)" = ISO-8859-1;
> +then
> + (
No big deal, but why run this in a subshell?
It doesn't seem necessary here.
> + export LC_ALL=en_US.ISO-8859-1
> + tr '[:upper:] ' '[:lower:]' < /dev/null 2>out && _fail=1
> + echo 'tr: when translating with string1 longer than string2,
> +the latter string must not end with a character class' > exp
> + compare out exp || _fail=1
> +
> + # Ensure when there are a different number of elements
> + # in each string, we validate the case mapping correctly
> + tr 'ab[:lower:]' '0-1[:upper:]' < /dev/null || _fail=1
It might be nice to ensure that e.g., abc.xyz maps to ABC.XYZ.
> + # Ensure we extend string2 appropriately
> + tr '[:upper:]- ' '[:lower:]_' < /dev/null || _fail=1
Similarly, that ABC-XYZ maps to abc_xyz
Thanks again!
> + # Ensure the size of the case classes are accounted
> + # for as a unit.
> + echo 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' |
> + tr '[:upper:]A-B' '[:lower:]0' >out || _fail=1
> + echo '00cdefghijklmnopqrstuvwxyz' > exp
> + compare out exp || _fail=1
...
- [coreutils] tr: case mapping anomaly, Pádraig Brady, 2010/09/24
- Re: [coreutils] tr: case mapping anomaly, Eric Blake, 2010/09/24
- Re: [coreutils] tr: case mapping anomaly, Pádraig Brady, 2010/09/25
- Re: [coreutils] tr: case mapping anomaly, Jim Meyering, 2010/09/25
- Re: [coreutils] tr: case mapping anomaly, Eric Blake, 2010/09/29
- Re: [coreutils] tr: case mapping anomaly, Pádraig Brady, 2010/09/29
- Re: [coreutils] tr: case mapping anomaly, Eric Blake, 2010/09/29
- Re: [coreutils] tr: case mapping anomaly, Eric Blake, 2010/09/29