Re: built-in regex matches wrong character

bug-bash


From:	Eric Blake
Subject:	Re: built-in regex matches wrong character
Date:	Thu, 6 Sep 2018 12:58:17 -0500
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1

On 09/06/2018 12:39 PM, Aharon Robbins wrote:

In article <mailman.444.1536243821.1284.bug-bash@gnu.org>,
Eric Blake  <eblake@redhat.com> wrote:

But bash could be taught to convert any regex that contains a range with
both endpoints ASCII into a different bracket expression before handing
things over to regcomp().  That is, if the user is matching against
[a-d], bash hands [abcd] to regcomp() instead.  You don't need a flag in
regcomp() to get RRI, just merely some pre-processing (and often memory
allocation, as the expansion of a range into a non-range tends to
require more characters).


This is easy and inexpensive for ASCII only.  Full RRI does the
same thing for wide character sets as well, though, and there
the possibility for using very large amounts of memory makes the
rewrite-the-range idea less palatable.

Indeed. But the bash option is named 'globasciiranges', and I find far
more use in having ranges with both endpoints in single-byte ASCII
behaving sanely than I do for ranges with one or more ends resulting in
a multibyte character (by the time my regex involves multibyte
characters, I am already admitting that I am in locale-dependent
territory, and RRI may no longer be the best action anyway). That is,
RRI makes the most sense when dealing with ASCII characters (< 128) in
the first place, and that's a reasonable stopgap for immediate
implementation, even if we don't get full RRI across all of Unicode
(assuming that such might later become available via a new regcomp() flag).
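
To make the pre-processing idea concrete, here is a rough sketch of what such a rewrite could look like. It is not bash's code: expand_ascii_ranges() and its helpers are hypothetical names invented for this illustration. As a deliberate simplification, it only spells out a range when every character in it is an ASCII letter or digit, so that the expansion can never introduce a ']', '^' or '-' into the bracket expression; anything else (such as [A-z], or a range with a multibyte endpoint) is passed through unchanged.

```c
#include <stdlib.h>
#include <string.h>

/* True for ASCII letters and digits only, independent of the locale. */
static int ascii_alnum(unsigned int c)
{
    return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') ||
           (c >= 'a' && c <= 'z');
}

/* True if lo <= hi and every character in lo..hi is an ASCII alnum. */
static int plain_range(unsigned char lo, unsigned char hi)
{
    unsigned int c;

    if (lo > hi)
        return 0;
    for (c = lo; c <= hi; c++)
        if (!ascii_alnum(c))
            return 0;
    return 1;
}

/* Hypothetical helper (an assumption, not a bash or libc function).
   Returns a malloc'd copy of the POSIX regex `pat` with simple ASCII
   ranges inside bracket expressions spelled out, e.g. [a-d] -> [abcd].
   The caller frees the result; NULL means allocation failure. */
char *expand_ascii_ranges(const char *pat)
{
    size_t len = strlen(pat), i = 0, o = 0;
    /* Worst case: a 3-byte range such as "a-z" grows to 26 bytes. */
    char *out = malloc(len * 9 + 1);

    if (out == NULL)
        return NULL;
    while (i < len) {
        if (pat[i] == '\\' && i + 1 < len) {    /* escape outside brackets */
            out[o++] = pat[i++];
            out[o++] = pat[i++];
            continue;
        }
        if (pat[i] != '[') {                    /* ordinary character */
            out[o++] = pat[i++];
            continue;
        }
        out[o++] = pat[i++];                    /* opening '[' */
        if (i < len && pat[i] == '^')           /* negation */
            out[o++] = pat[i++];
        if (i < len && pat[i] == ']')           /* a leading ']' is literal */
            out[o++] = pat[i++];
        while (i < len && pat[i] != ']') {
            if (pat[i] == '[' && i + 1 < len &&
                (pat[i + 1] == ':' || pat[i + 1] == '.' || pat[i + 1] == '=')) {
                /* Copy [:class:], [.elem.] and [=equiv=] verbatim. */
                char kind = pat[i + 1];
                out[o++] = pat[i++];
                out[o++] = pat[i++];
                while (i < len &&
                       !(pat[i] == kind && i + 1 < len && pat[i + 1] == ']'))
                    out[o++] = pat[i++];
                if (i + 1 < len) {
                    out[o++] = pat[i++];
                    out[o++] = pat[i++];
                }
                continue;
            }
            if (i + 2 < len && pat[i + 1] == '-' && pat[i + 2] != ']') {
                unsigned char lo = (unsigned char)pat[i];
                unsigned char hi = (unsigned char)pat[i + 2];
                if (plain_range(lo, hi)) {
                    unsigned int c;
                    for (c = lo; c <= hi; c++)  /* spell the range out */
                        out[o++] = (char)c;
                    i += 3;
                } else {
                    /* Leave the range as written.  If its end starts a
                       '[' construct, let the code above copy that part. */
                    size_t n = (pat[i + 2] == '[') ? 2 : 3;
                    memcpy(out + o, pat + i, n);
                    o += n;
                    i += n;
                }
                continue;
            }
            out[o++] = pat[i++];                /* single member */
        }
        if (i < len)
            out[o++] = pat[i++];                /* closing ']' */
    }
    out[o] = '\0';
    return out;
}
```

The caller would pass the result to regcomp() in place of the user's pattern and free() it afterwards. The len * 9 + 1 allocation is the memory cost mentioned in the quoted text: cheap when only ASCII letters and digits are expanded, but the same approach over wide character sets could grow far larger.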


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

