help-flex

Flex vs. POSIX 1003.2-1992 repeat operator {} precedence


From: Casey Leedom
Subject: Flex vs. POSIX 1003.2-1992 repeat operator {} precedence
Date: Fri, 26 Apr 2002 08:32:44 -0700

Greetings,

  Will Estes recommended that I bring an issue we're trying to address to
this email list.  We'd appreciate your thoughts and advice.

  The basic problem is that flex is non-compliant with the POSIX 1003.2-1992
Shell and Utilities specification for lex on the precedence of the repeat
operator, {}.  The flex manual page documents that lex and flex have
different precedences for the repeat operator but the manual page
incorrectly identifies flex's behavior as POSIX-conforming.  In fact it's
lex that is POSIX-conforming and flex that is non-conformant.

  This misunderstanding stems from a rather subtle note in
POSIX 1003.2-1992.  Most of the utilities which use Extended Regular
Expressions (EREs) use the precedence order defined on page 86 in table
2-14.  This precedence has the repeat operator higher than concatenation.
That is, ab{3} is treated as a(b{3}), matching abbb.  However, the
definition for lex on pages 699-700 gives a different precedence order for
lex's EREs, with concatenation having higher precedence than the repeat
operator (this difference is explicitly called out in the standard).
Thus, in lex, ab{3} is treated as (ab){3}, matching ababab.
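To make the two readings concrete, here is a small sketch in Python.  The
function `expand` is hypothetical and is not taken from flex or lex; it
handles only literal characters, parenthesized groups, and exact repeats
{n}, and returns the single string such a pattern matches.  With
repeat_binds_tighter=True it follows the general ERE table (repeat above
concatenation); with False it follows the lex table (concatenation above
repeat).  What the lex reading should do with text after a {n}, as in
ab{3}c, is outside the sketch, which rejects such input in that mode
rather than guess.

```python
def expand(pattern: str, repeat_binds_tighter: bool) -> str:
    """Return the single string matched by a toy pattern (hypothetical helper).

    Supported syntax: literal characters, ( ) groups, and exact repeats {n}.
    repeat_binds_tighter=True  -> general ERE table: ab{3} means a(b{3})
    repeat_binds_tighter=False -> lex table:         ab{3} means (ab){3}
    """
    pos = 0

    def parse_count() -> int:
        # Consume "{n}" starting at pattern[pos] and return n.
        nonlocal pos
        end = pattern.index("}", pos)
        count = int(pattern[pos + 1:end])
        pos = end + 1
        return count

    def parse_atom() -> str:
        # An atom is one literal character or a parenthesized group.
        nonlocal pos
        if pattern[pos] == "(":
            pos += 1
            inner = parse_sequence()
            if pos >= len(pattern) or pattern[pos] != ")":
                raise ValueError("unbalanced parentheses")
            pos += 1
            return inner
        char = pattern[pos]
        pos += 1
        return char

    def parse_sequence() -> str:
        # Concatenate atoms until ")" or end of input, applying {n} as we go.
        result = ""
        while pos < len(pattern) and pattern[pos] != ")":
            atom = parse_atom()
            if pos < len(pattern) and pattern[pos] == "{":
                count = parse_count()
                if repeat_binds_tighter:
                    # {n} applies only to the atom just before it.
                    result += atom * count
                else:
                    # Concatenation binds first, so {n} applies to everything
                    # concatenated so far in this group.
                    result = (result + atom) * count
                    if pos < len(pattern) and pattern[pos] != ")":
                        # Text after a low-precedence {n} is outside this toy.
                        raise ValueError("lex mode: {n} must end a group")
            else:
                result += atom
        return result

    result = parse_sequence()
    if pos != len(pattern):
        raise ValueError("unbalanced parentheses")
    return result


if __name__ == "__main__":
    for pattern in ("ab{3}", "a(b){3}", "a(b{3})", "(ab){3}"):
        print(pattern, expand(pattern, True), expand(pattern, False))
```

Under the lex reading a(b){3} still expands to ababab, because the
parentheses do not enclose the repeat; a(b{3}) and (ab){3} give the same
result under either table.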

  I'm pretty sure that Vern was shooting for POSIX compliance when he worked
on flex and simply missed this unfortunately subtle note.

  Of course it's unfortunate that lex uses a different ERE precedence than
the other utilities, but it's part of the standard now.  This is a case where the
standard committee decided to codify existing behavior (the lex
implementation) rather than changing behavior.

  So now we're stuck with a quandary: the change to make flex POSIX-
compliant is very simple and I've already implemented and tested it.  But if
we make this change flex won't be compatible with old lexers using the
repeat operator.  These are fairly rare since the repeat operator isn't used
often, but there will still be some breakage.

  At SGI we're very sensitive to compatibility concerns since this is a
major part of our IRIX 6.5 story.  However, when faced with a standards
violation versus compatibility problem we've generally treated the
non-conformant behavior as a bug and fixed it.  If we felt that the change
might significantly affect our customers, we've widely advertised the
change and, if necessary, provided mechanisms for getting the old behavior.

  I believe that it would be fairly simple for me to get the old flex repeat
operator precedence via something on the order of ``flex --compat'' and some
lexer trickery.  It would probably take me a few hours to do the work and
document it.

  The questions are:

 1. Should we do anything at all about this?  We could simply correct the
    mis-statement in the manual page and note that flex and lex use
    different repeat operator precedence and lex is POSIX-conformant while
    flex is not.

 2. If we change flex's repeat operator precedence to be conformant with
    POSIX, do we want to offer any compatibility capability for the old
    non-POSIX-conformant and flex-unique repeat operator precedence?  Note
    that it's a simple matter of adding parentheses to a lexer specification
    to get any precedence you desire.
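As a rough illustration of that last point, the sketch below uses Python's
`re` module, which is not lex but, like the general ERE table, binds {n}
tighter than concatenation.  The helper `matches` is only a wrapper around
`re.fullmatch` written for this example.  The bare pattern ab{3} gets the
tighter-binding reading, while a(b{3}) and (ab){3} spell out their grouping
and so do not depend on which precedence table a tool follows.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """True if Python's re module matches `pattern` against all of `text`."""
    return re.fullmatch(pattern, text) is not None


if __name__ == "__main__":
    # Python's re binds {n} tighter than concatenation, so the bare form
    # ab{3} takes the general ERE reading; the parenthesized forms state
    # the grouping explicitly.
    for pattern in (r"ab{3}", r"a(b{3})", r"(ab){3}"):
        for text in ("abbb", "ababab"):
            print(f"{pattern:8} {text:7} {matches(pattern, text)}")
```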

Thoughts?  Comments?

Casey


