bug-apl

Re: [Bug-apl] Regex support


From: Juergen Sauermann
Subject: Re: [Bug-apl] Regex support
Date: Thu, 21 Sep 2017 13:39:21 +0200
User-agent: Mozilla/5.0 (X11; Linux i686; rv:45.0) Gecko/20100101 Thunderbird/45.2.0

Hi Elias,

the UTF8_string constructors look OK, but it can be tricky to interpret indices into UTF-8-encoded strings (such as the
elements of sub in your code) correctly, i.e. to know whether they mean code points or byte offsets.
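To make the code point vs. byte offset distinction concrete, here is a small sketch (the helper name is hypothetical, not part of GNU APL or of the regex library). It assumes valid UTF-8 input and counts only lead bytes, because continuation bytes always have the form 10xxxxxx. For example, in "a⍉b" the 'b' sits at byte offset 4 but at code point index 2, since '⍉' occupies three bytes.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: convert a byte offset into a UTF-8 encoded std::string
// into the index of the code point that starts at that byte offset.
// Assumes valid UTF-8; continuation bytes (10xxxxxx) are not counted.
std::size_t byte_offset_to_codepoint_index(const std::string & utf8,
                                           std::size_t byte_offset)
{
    std::size_t index = 0;
    for (std::size_t b = 0; b < byte_offset && b < utf8.size(); ++b)
        {
          const unsigned char c = static_cast<unsigned char>(utf8[b]);
          if ((c & 0xC0) != 0x80)   // ASCII byte or lead byte of a sequence
             ++index;
        }
    return index;
}
```

If a regex library reports match positions as byte offsets, a conversion like this is needed before they can be used as APL indices; using a 32-bit interface avoids the issue because offsets and code point indices then coincide.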

My feeling is that you should avoid UTF8_string completely and go for the UTF32 option of the library (assuming that
UTF32 strings are code points encoded as 32-bit integers). APL character strings are almost UTF32 strings (except for
gaps between the code points) and they avoid all the bit shifting needed for UTF8 strings.
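For reference, the "bit shifting" in question is UTF-8 decoding. The sketch below is a minimal, hypothetical decoder (not GNU APL code). It shows the work that has to happen somewhere if UTF-8 is used, and why handing the library 32-bit code points directly sidesteps it. It rejects invalid lead bytes and truncated sequences but, for brevity, does not reject overlong encodings or surrogate code points. The assumption that the library's UTF32 mode accepts a buffer of such code points is taken from the paragraph above, not verified here.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Minimal UTF-8 -> UTF-32 decoder (sketch). Each code point is rebuilt by
// masking the payload bits out of the lead byte and shifting in 6 bits from
// each continuation byte.
std::u32string utf8_to_utf32(const std::string & utf8)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < utf8.size())
        {
          const unsigned char lead = static_cast<unsigned char>(utf8[i]);
          char32_t cp;
          std::size_t len;
          if      (lead < 0x80)           { cp = lead;        len = 1; }
          else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
          else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
          else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
          else throw std::runtime_error("invalid UTF-8 lead byte");

          if (i + len > utf8.size())
             throw std::runtime_error("truncated UTF-8 sequence");

          for (std::size_t k = 1; k < len; ++k)
              {
                const unsigned char cont =
                      static_cast<unsigned char>(utf8[i + k]);
                if ((cont & 0xC0) != 0x80)
                   throw std::runtime_error("invalid UTF-8 continuation byte");
                cp = (cp << 6) | (cont & 0x3F);   // append 6 payload bits
              }
          out.push_back(cp);
          i += len;
        }
    return out;
}
```

Decoding 'sdklfjfj⍉' this way yields 9 code points, the last being U+2349 (⍉), which is the one-character-per-element shape an APL string has.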

Best Regards,
/// Jürgen


On 09/21/2017 12:09 PM, Elias Mårtenson wrote:
I've implemented the bare minimal needed to get regexes working through a ⎕RE function. I've attached the diff.

I really need Jürgen to take a look at this, since my code that constructs the return value cannot possibly be correct. There must be a better way to handle this that does not involve converting back and forth to std::string.

Also, I have the result in a UTF-8-encoded C string, and I try to create a UTF8_string from it like this:

    Value_P field_value(UTF8_string(field.c_str()), LOC);

However, when I test this in APL I get the following result:

      '(..)..(..)$' ⎕RE 'sdklfjfj⍉'
┏→━━━━━━━━━━┓
┃"lf" "jâ\215\211"┃
┗∊━━━━━━━━━━┛

It seems the UTF-8 conversion is not done correctly by the UTF8_string constructor. What did I do wrong?
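One plausible reading of the output above, offered as an assumption rather than a diagnosis from the thread: '⍉' is U+2349, encoded in UTF-8 as the bytes E2 8D 89. If each byte ends up as its own APL character, 0xE2 displays as 'â' (U+00E2), while 0x8D and 0x89 are C1 control characters, shown in octal as \215 and \211. The hypothetical function below reproduces only that byte-per-character widening, for comparison.

```cpp
#include <string>

// Sketch of the suspected failure mode: every byte of the UTF-8 input is
// widened to a separate character instead of being decoded as a sequence.
std::u32string widen_bytes(const std::string & bytes)
{
    std::u32string out;
    for (const char c : bytes)
        out.push_back(static_cast<unsigned char>(c));   // one char per byte
    return out;
}
```

Applied to "j⍉", this gives four characters (j, â, U+008D, U+0089) instead of two, matching the second field in the result above.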

Regards,
Elias

On 21 September 2017 at 11:38, Xiao-Yong Jin <address@hidden> wrote:

> On Sep 20, 2017, at 9:19 PM, Peter Teeson <address@hidden> wrote:
>
> (These days performance can hardly be a compelling argument
> with multiple many-core CPU chips.)

This kind of argument for APL is exactly why Fortran is still alive and well.



