bug-gnulib



From: Eric Blake
Subject: Re: [PATCH] regex: Pass the system regex if its only problem is 32-bit regoff_t
Date: Thu, 09 Sep 2010 09:04:36 -0600
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.8) Gecko/20100806 Fedora/3.1.2-1.fc13 Mnenhy/0.8.3 Thunderbird/3.1.2

On 09/09/2010 02:18 AM, Paolo Bonzini wrote:
> The included regex cannot support equivalence classes and multibyte
> collation symbols properly.  On the other hand it supports 64-bit
> regoff_t, which glibc cannot provide without breaking the ABI.
> We currently favor the latter, but this is no longer correct since
> there's clearly no hope of ever passing the test.

Hmm - here's the current POSIX 2008 wording:

http://www.opengroup.org/onlinepubs/9699919799/basedefs/regex.h.html#tag_13_38

> The <regex.h> header shall define the regoff_t type as a signed integer type that can hold the largest value that can be stored in either a ptrdiff_t type or a ssize_t type.


That text changed from POSIX 2001, where it said off_t instead of ptrdiff_t (which broke even more 32-bit systems), based on ERN 60 as submitted by Paul:

http://www.opengroup.org/austin/aardvark/latest/xbdbug2.txt

====%<====
OBJECTION Enhancement Request Number 60 eggert:cs.ucla.edu Defect in XBD regoff_t (rdvk# 1) {20050825a} Thu, 25 Aug 2005 23:52:10 +0100 (BST)

_____________________________________________________________________________
Accept_X___ Accept as marked below_____ Duplicate_____ Reject_____
 Rationale for rejected or partial changes:


Add to SD5


_____________________________________________________________________________
 Page: 296  Line: 10529  Section: regoff_t


 Problem:

 Edition of Specification (Year): 2004

 Defect code :  1. Error

 POSIX currently requires regoff_t to be at least as wide as off_t, to
 facilitate "future extensions" in which strings are taken from files
 rather than from memory.  These "future extensions" were anticipated
 in 1992, but they have not seen widespread use and are not
 standardized.

 The off_t<=regoff_t requirement might cause a programmer or
 implementer to naively assume that regoff_t must be at least as wide
 as off_t.  In practice, though, this isn't true on many platforms.
 For example, on Solaris 10 (32-bit SPARC, in large-file mode),
 regoff_t is a signed 32-bit integer and off_t is a signed 64-bit
 integer.

 Now, 32-bit Solaris 10 regex.h still conforms to POSIX, so long as you
 don't compile in large-file mode.  But a wide variety of programs
 use large-file mode and it seems inappropriate for large-file mode to
 fail to conform to POSIX.

 Since the "future extensions" have never materialized, I propose that
 the off_t<=regoff_t requirement be dropped from POSIX.

 However, it does make sense to require that regoff_t be at least as
 wide as ptrdiff_t, so ptrdiff_t can be substituted for off_t.


 Action:

 Change XBD page 296 lines 10529-10530 from:

   The type regoff_t shall be defined as a signed integer type that can
   hold the largest value that can be stored in either a type off_t or
   type ssize_t.

 to:

   The type regoff_t shall be defined as a signed integer type that can
   hold the largest value that can be stored in either a type ptrdiff_t
   or a type ssize_t.


 Change XSI page 1222 lines 38367-38375 from:

   The substrings reported in pmatch[] are defined using offsets from
   the start of the string rather than pointers. Since this is a new
   interface, there should be no impact on historical implementations
   or applications, and offsets should be just as easy to use as
   pointers. The change to offsets was made to facilitate future
   extensions in which the string to be searched is presented to
   regexec() in blocks, allowing a string to be searched that is not
   all in memory at once.

   The type regoff_t is used for the elements of pmatch[] to ensure
   that the application can represent either the largest possible array
   in memory (important for an application conforming to the Shell and
   Utilities volume of IEEE Std 1003.1-2001) or the largest possible
   file (important for an application using the extension where a file
   is searched in chunks).

 to:

   The substrings reported in pmatch[] are defined using offsets from
   the start of the string rather than pointers. This allows type-safe
   access to both constant and non-constant strings.

   The type regoff_t is used for the elements of pmatch[] to ensure
   that the application can represent large arrays in memory (important
   for an application conforming to the Shell and Utilities volume of
   IEEE Std 1003.1-2001).

   The 1992 edition of this standard required regoff_t to be at least
   as wide as off_t, to facilitate future extensions in which the
   string to be searched is taken from a file.  However, these future
   extensions have not appeared.  The requirement rules out popular
   implementations with 32-bit regoff_t and 64-bit off_t, so it has
   been withdrawn.
====%<====
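For reference, here is what "offsets rather than pointers" means in practice: each pmatch[] entry carries rm_so/rm_eo values of type regoff_t, measured from the start of the subject string. A minimal sketch of turning those offsets back into a substring; the helper name extract_group and its return convention are hypothetical, not part of POSIX or gnulib:

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy capture group GROUP of the first match of
   the ERE PATTERN in SUBJECT into BUF (size BUFSIZE).  Returns 0 on
   success, -1 if the pattern does not compile, does not match, the group
   did not participate, or BUF is too small.  */
static int
extract_group (const char *pattern, const char *subject, size_t group,
               char *buf, size_t bufsize)
{
  regex_t re;
  regmatch_t pmatch[10];
  size_t nmatch = sizeof pmatch / sizeof pmatch[0];
  int rc = -1;

  if (group >= nmatch)
    return -1;
  if (regcomp (&re, pattern, REG_EXTENDED) != 0)
    return -1;

  if (regexec (&re, subject, nmatch, pmatch, 0) == 0
      && pmatch[group].rm_so != -1)
    {
      /* rm_so and rm_eo are regoff_t offsets from the start of SUBJECT,
         not pointers, so the same code serves const and non-const
         strings alike.  */
      size_t len = (size_t) (pmatch[group].rm_eo - pmatch[group].rm_so);
      if (len < bufsize)
        {
          memcpy (buf, subject + pmatch[group].rm_so, len);
          buf[len] = '\0';
          rc = 0;
        }
    }
  regfree (&re);
  return rc;
}
```

Note that every offset regexec stores must fit in regoff_t, which is why its width matters for very long subject strings.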


But you are right that on x86_64 glibc, we have:
(gdb) p sizeof(regoff_t)
$1 = 4
(gdb) p sizeof(off_t)
$2 = 8
(gdb) p sizeof(ssize_t)
$3 = 8
(gdb) p sizeof(ptrdiff_t)
$4 = 8
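The same check can be written in C. A minimal sketch of testing the POSIX 2008 rule directly; the function names are hypothetical, and comparing sizes is equivalent to comparing ranges only under the assumption that regoff_t, ptrdiff_t and ssize_t have no padding bits:

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* POSIX 2008: regoff_t must be a signed integer type.  */
static bool
regoff_t_is_signed (void)
{
  return (regoff_t) -1 < 0;
}

/* POSIX 2008: regoff_t must hold the largest ptrdiff_t and ssize_t
   values.  Comparing sizes suffices only if none of these types has
   padding bits (assumed here; true on common platforms).  */
static bool
regoff_t_conforms_posix2008 (void)
{
  return regoff_t_is_signed ()
         && sizeof (regoff_t) >= sizeof (ptrdiff_t)
         && sizeof (regoff_t) >= sizeof (ssize_t);
}
```

Given the sizes gdb reports above, regoff_t_conforms_posix2008 () would return false on x86_64 glibc, since a 4-byte regoff_t is narrower than the 8-byte ptrdiff_t and ssize_t.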

Should we go back to the Austin Group to relax the requirements on regoff_t further, so that it need only be at least as wide as int, with EOVERFLOW errors mandated when a match offset does not fit?
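To make the EOVERFLOW idea concrete, here is a sketch of the check a caller (or the library itself) might perform with a narrow regoff_t. checked_regexec and its convention of returning -1 with errno set to EOVERFLOW are hypothetical, not an existing or proposed API; the sketch also assumes regoff_t is a signed type without padding bits:

```c
#include <errno.h>
#include <limits.h>
#include <regex.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Largest regoff_t value, assuming a signed type with no padding bits.  */
static uintmax_t
regoff_max (void)
{
  return ((uintmax_t) 1 << (sizeof (regoff_t) * CHAR_BIT - 1)) - 1;
}

/* rm_eo can equal the subject length, so every offset fits exactly when
   the length itself fits.  */
static bool
length_fits_regoff (size_t len)
{
  return (uintmax_t) len <= regoff_max ();
}

/* Hypothetical wrapper: fail with EOVERFLOW instead of letting regexec
   produce offsets that regoff_t cannot represent.  */
static int
checked_regexec (const regex_t *re, const char *subject, size_t nmatch,
                 regmatch_t pmatch[], int eflags)
{
  if (!length_fits_regoff (strlen (subject)))
    {
      errno = EOVERFLOW;
      return -1;
    }
  return regexec (re, subject, nmatch, pmatch, eflags);
}
```

With the 32-bit regoff_t of x86_64 glibc, this would reject any subject longer than 2^31 - 1 bytes while leaving ordinary inputs untouched.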

> +2010-09-10  Paolo Bonzini  <address@hidden>
> +
> +       regex: Pass the system regex if its only problem is 32-bit regoff_t.
> +       * m4/regex.m4: Disable test for regoff_t size.

While we wait on the Austin Group, this patch seems perfectly acceptable to me.

--
Eric Blake   address@hidden    +1-801-349-2682
Libvirt virtualization library http://libvirt.org


