Re: [PATCH] ensure that the regexp [b-a] is diagnosed as invalid

From: Jim Meyering
Subject: Re: [PATCH] ensure that the regexp [b-a] is diagnosed as invalid
Date: Wed, 03 Feb 2010 17:49:13 +0100

Eric Blake wrote:
> Jim Meyering <jim <at> meyering.net> writes:
>> It adds a test to gl_REGEX that ensures that re_compile_pattern
>> diagnoses [b-a] as invalid when using RE_SYNTAX_POSIX_EGREP.
> Where does POSIX state that this is invalid?

Thanks for looking.

I too verified (before embarking) that POSIX does not declare it invalid,
merely unspecified. However, since gnulib's regex has rejected such
ranges for a long time and sed, awk, perl, etc. act that way, I think
it's the way to go.

Note also that glibc's code appears to try to implement the same
behavior (though conditional upon RE_NO_EMPTY_RANGES, which nearly
everyone uses), but somehow that code does not function properly:

      start_collseq = lookup_collation_sequence_value (start_elem);
      end_collseq = lookup_collation_sequence_value (end_elem);
      /* Check start/end collation sequence values.  */
      if (BE (start_collseq == UINT_MAX || end_collseq == UINT_MAX, 0))
        return REG_ECOLLATE;
      if (BE ((syntax & RE_NO_EMPTY_RANGES) && start_collseq > end_collseq, 0))
        return REG_ERANGE;
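
For readers without the glibc sources at hand, the logic that excerpt is meant to implement can be isolated as a minimal sketch. The names check_range and enum range_status are hypothetical stand-ins invented for this illustration, not glibc identifiers, and the BE() branch-prediction macro is dropped because it does not affect the result.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-ins for REG_NOERROR, REG_ECOLLATE and REG_ERANGE.
   They are illustrative only and are not the values from <regex.h>.  */
enum range_status { RANGE_OK, RANGE_ECOLLATE, RANGE_ERANGE };

/* Mirror the two checks in the excerpt above.  UINT_MAX marks a range
   endpoint for which no collation sequence value could be found.
   NO_EMPTY_RANGES plays the role of (syntax & RE_NO_EMPTY_RANGES).  */
enum range_status
check_range (unsigned int start_collseq, unsigned int end_collseq,
             bool no_empty_ranges)
{
  /* An endpoint without a collation value is a collation error.  */
  if (start_collseq == UINT_MAX || end_collseq == UINT_MAX)
    return RANGE_ECOLLATE;

  /* A reversed range is an error only when the syntax bit is set.  */
  if (no_empty_ranges && start_collseq > end_collseq)
    return RANGE_ERANGE;

  return RANGE_OK;
}
```

Assuming a locale in which the collation value of a single-byte character is simply its character code, check_range ('b', 'a', true) yields RANGE_ERANGE, while the same call with the flag cleared yields RANGE_OK. That is why the result depends on whether the syntax in use includes RE_NO_EMPTY_RANGES.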

I've just filed this glibc bug:


> So far, I can only see that it is
> undefined, but have not found any hard requirements that it be a failure.
> http://www.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html
> 9.3.5 RE Bracket Expression, step 7: "The starting range point and the ending
> range point shall be a collating element or collating symbol.... If the
> represented set of collating elements is empty, it is unspecified whether the
> expression matches nothing, or is treated as invalid."
> That said, forcing a hard failure is probably the best QoI implementation of
> undefined behavior.
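
Since POSIX leaves the choice open, one way to see what a given C library does is to probe it through the standard regcomp interface. The following is only a sketch: probe_range and enum range_probe are hypothetical names introduced here, and it uses regcomp with REG_EXTENDED rather than the re_compile_pattern/RE_SYNTAX_POSIX_EGREP combination that the gnulib test exercises, so the two may not agree on every system.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical classification of how the local regcomp treats PATTERN.  */
enum range_probe
{
  RANGE_REJECTED,        /* regcomp failed with REG_ERANGE */
  RANGE_MATCHES_NOTHING, /* compiled, but matches neither "a" nor "b" */
  RANGE_OTHER            /* any other outcome */
};

/* Compile PATTERN as a POSIX extended regular expression and report
   which of the outcomes described in 9.3.5 (if either) was chosen.  */
enum range_probe
probe_range (const char *pattern)
{
  regex_t re;
  int rc = regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB);

  if (rc == REG_ERANGE)
    return RANGE_REJECTED;
  if (rc != 0)
    return RANGE_OTHER;

  /* The pattern compiled: see whether it matches either endpoint.  */
  int hit = (regexec (&re, "a", 0, NULL, 0) == 0
             || regexec (&re, "b", 0, NULL, 0) == 0);
  regfree (&re);

  return hit ? RANGE_OTHER : RANGE_MATCHES_NOTHING;
}
```

Calling probe_range ("[b-a]") should return either RANGE_REJECTED or RANGE_MATCHES_NOTHING on an implementation that follows the wording quoted above. An ordinary range such as "[a-b]" compiles and matches "a", so it comes back as RANGE_OTHER.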
