bug-gnu-utils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

sed 4.2 warnings due to poor ctype usage


From: Eric Blake
Subject: sed 4.2 warnings due to poor ctype usage
Date: Mon, 11 May 2009 18:14:36 +0000 (UTC)
User-agent: Loom/3.14 (http://gmane.org/)

Sed 4.2 triggers some compilation warnings on cygwin, due to its use of passing 
plain char arguments to ctype macros - these warnings exist as a QoI issue to 
detect potential bugs on platforms where char is signed, but negative indices 
are not handled, as well as the issue where isspace((char)0xff) and isspace
((unsigned char)0xff) give different results in some locales.  For an example 
of the warnings while compiling sed:

compile.c: In function `setup_replacement':
compile.c:807: warning: subscript has type `char'
compile.c: In function `normalize_text':
compile.c:1519: warning: subscript has type `char'

It turns out that the two usages above actually go through ISDIGIT, which in 
turn is currently defined as (ISASCII(c) && isdigit(c)); so it depends on 
whether STDC_HEADERS was defined (which determines whether ISASCII is 1 or a 
call to isascii()) as to whether this ends up using isdigit on a negative value 
(undefined behavior) or whether the compilation warning can be safely ignored 
(since 7-bit char values never promote to a negative argument to isdigit).  
Then there is the matter that POSIX states that the use of isascii is obsolete, 
in part because it ignores locale issues, and in part because it is broken for 
EBCDIC encodings; and getting rid of compilation warnings seems like it would 
be worthwhile.

You really should consider using the gnulib module c-ctype (which does not 
suffer from problems of passing a plain char through the macros) if you don't 
care about locale, and/or getting rid of the use of isascii.  For that matter, 
coreutils still has ISDIGIT for speed and to cater to non-POSIX platforms, but 
defers to locale macros for all other categories or uses c-ctype when locale is 
undesired (for example, uses of ISSPACE should probably be isspace or c_isspace 
instead).  The coreutils definition is:

/* ISDIGIT differs from isdigit, as follows:
   - Its arg may be any int or unsigned int; it need not be an unsigned char
     or EOF.
   - It's typically faster.
   POSIX says that only '0' through '9' are digits.  Prefer ISDIGIT to
   isdigit unless it's important to use the locale's definition
   of `digit' even when the host does not conform to POSIX.  */
#define ISDIGIT(c) ((unsigned int) (c) - '0' <= 9)

-- 
Eric Blake






reply via email to

[Prev in Thread] Current Thread [Next in Thread]