Re: [Bug-apl] ⎕RE merged

One more bug:

The call to pcre2_compile_32 should be changed from:

code = pcre2_compile_32(pattern_ucs, pattern.size(),

PCRE2_NO_UTF_CHECK | flags, &error_code,

&error_offset, 0);

To:

code = pcre2_compile_32(pattern_ucs, pattern.size(),

PCRE2_UTF | PCRE2_UCP | flags, &error_code,

&error_offset, 0);

Without PCRE2_UTF, proper Unicode semantics will not be applied (such as properly handling case matching for non-ASCII characters).

PCRE2_UCP, is a little less obvious. I think it would make sense to enable it, since we care more for correctness than performance. Here's what the documentation has to say about it:

“This option changes the way PCRE2 processes \B, \b, \D, \d, \S, \s, \W, \w, and some of the POSIX character classes. By default, only ASCII characters are recognized, but if PCRE2_UCP is set, Unicode properties are used instead to classify characters. More details are given in the section on generic character types in the pcre2pattern page. If you set PCRE2_UCP, matching one of the items it affects takes much longer.”

Finally, I don't think it makes sense to use PCRE2_NO_UTF_CHECK since at best it's a no-op (since we're using UTF-32) and at worst it can cause a crash when trying to match an invalid string. That's not worth what little performance benefit there is to gain from it.

Regards,

Elias

On 9 October 2017 at 11:12, Elias Mårtenson <address@hidden> wrote:

I found another bug. ↓ is used to indicate that string indexes are requested, but the error message when multiple output types are requested is wrong:

"foo" ⎕RE["⊂↓"] "bar"

DOMAIN ERROR+

'foo' ⎕RE['⊂↓']'bar'

^ ^

)more

Multiple ⎕RE output flags: '⊂↓'. Output flags are: ⊂⍳/

Note the ⍳ in the error message instead of ↓.

Regards,

Elias

On 9 October 2017 at 10:45, Elias Mårtenson <address@hidden> wrote:

I fixed the problem by adding a static_cast<PCRE2_SIZE>(len), but I found another issue: The testcases file is missing.

Regards,

Elias

On 9 October 2017 at 10:41, Elias Mårtenson <address@hidden> wrote:

Thank you.

There are some errors when compiling on my Arch system:

g++ -DHAVE_CONFIG_H -I. -I.. -Wall -I sql -Wold-style-cast -Werror -I/usr/include -I/usr/include -rdynamic -g -O2 -MT apl-Quad_RE.o -MD -MP -MF .deps/apl-Quad_RE.Tpo -c -o apl-Quad_RE.o `test -f 'Quad_RE.cc' || echo './'`Quad_RE.cc

Quad_RE.cc: In static member function ‘static Value_P Quad_RE::partition_result(const Regexp&, const Quad_RE::Flags&, const UCS_string&)’:

Quad_RE.cc:211:42: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare]

for (ShapeItem match_id = 1; B_offset < len; match_id += match_id_inc)

~~~~~~~~~^~~~~

cc1plus: all warnings being treated as errors

make[3]: *** [Makefile:2725: apl-Quad_RE.o] Error 1

make[3]: Leaving directory '/home/emartenson/src/apl/src'

make[2]: *** [Makefile:3333: all-recursive] Error 1

make[2]: Leaving directory '/home/emartenson/src/apl/src'

make[1]: *** [Makefile:514: all-recursive] Error 1

make[1]: Leaving directory '/home/emartenson/src/apl'

make: *** [Makefile:401: all] Error 2

Regards,

Elias

On 9 October 2017 at 00:47, Juergen Sauermann <address@hidden> wrote:

Hi,

I have merged Elias' ⎕RE implementation into GNU APL.
Thanks, Elias, for contributing it. See 'info apl' for a description
and src/testcases/Quad_RE.tc for examples of how to use ⎕RE.

SVN 1012.

Enjoy,
/// Jürgen

From:	Juergen Sauermann
Subject:	Re: [Bug-apl] ⎕RE merged
Date:	Tue, 10 Oct 2017 18:34:00 +0200
User-agent:	Mozilla/5.0 (X11; Linux i686; rv:52.0) Gecko/20100101 Thunderbird/52.3.0