Re: [Bug-apl] Regex support


From: Elias Mårtenson
Subject: Re: [Bug-apl] Regex support
Date: Wed, 20 Sep 2017 18:40:14 +0800

Regardless of whether things like case-insensitive matching should be supported, the problem is that full Unicode support is required. APL is one of those languages where you just can't get away without it. PCRE does support Unicode, and unfortunately I don't think POSIX regexp does.
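To make the Unicode concern concrete: POSIX regexec() reports match positions (rm_so, rm_eo) as byte offsets, while an APL string is indexed by character. The following is a minimal sketch, assuming the subject string is valid UTF-8, of the conversion a wrapper would need. The function name utf8_char_index is hypothetical and is not part of GNU APL or of any regex library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count the Unicode code points contained in the
 * first `nbytes` bytes of a UTF-8 string. Assumes `s` is valid UTF-8 and
 * that `nbytes` falls on a character boundary. Continuation bytes have
 * the bit pattern 10xxxxxx and are not counted. */
size_t utf8_char_index(const char *s, size_t nbytes)
{
    assert(s != NULL);
    size_t count = 0;
    for (size_t i = 0; i < nbytes; i++) {
        unsigned char b = (unsigned char)s[i];
        if ((b & 0xC0) != 0x80)   /* ASCII byte or lead byte of a sequence */
            count++;
    }
    return count;
}
```

For the string x⍳foo, where ⍳ (U+2373) occupies three bytes in UTF-8, a match on foo starts at byte offset 4 but at character index 2.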

Are there any alternatives?

Regards,
Elias

On 20 September 2017 at 18:27, Giuseppe Cocomazzi <address@hidden> wrote:
Hi,
I also think that adding regex support would be very useful. However, I
would definitely avoid PCRE and backreference support. I think the
best solution would be to just add a basic and efficient NFA-based
implementation (the de facto original implementation for Unix). For
more information about the correct way to implement regular expressions:
https://swtch.com/~rsc/regexp/

As for the API itself, I agree with Elias that maybe a simple
interface is the way to go. I would also prefer not to have any
support for modifiers (not even IGNORECASE) and definitely avoid the
MULTILINE horror. If we opt for the NFA implementation, then the
builtin ⎕Regex (or ⎕RE) could be used universally, not only for strings
but for numeric data as well. That, in conjunction with APL arrays,
would ultimately be a killer feature (I am not aware of such a feature
in other languages).

Best,

Giuseppe Cocomazzi
http://sbudella.altervista.org
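The message above does not say how matching over numeric data would work. As one possible reading, here is a minimal sketch of a Thompson-style NFA simulation whose input alphabet is integers rather than characters. All names (state_t, nfa_match, example_nfa) are hypothetical, the NFA is built by hand rather than compiled from a pattern, and the match is anchored to the whole array. The set of active states is kept in a bitmask, so the running time is proportional to the array length times the number of states, with no backtracking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { MAX_STATES = 32 };   /* the state set is stored in a 32-bit mask */

typedef enum { ST_RANGE, ST_SPLIT, ST_MATCH } kind_t;

typedef struct {
    kind_t kind;
    int lo, hi;     /* ST_RANGE: consume one element v with lo <= v <= hi */
    int out, out1;  /* successor states; out1 is used only by ST_SPLIT */
} state_t;

/* Hand-built NFA for "one or more values in 1..9, followed by a single 0". */
static const state_t example_nfa[4] = {
    { ST_RANGE, 1, 9, 1, 0 },   /* 0: element in 1..9, then go to 1 */
    { ST_SPLIT, 0, 0, 0, 2 },   /* 1: loop back to 0, or continue to 2 */
    { ST_RANGE, 0, 0, 3, 0 },   /* 2: element equal to 0, then go to 3 */
    { ST_MATCH, 0, 0, 0, 0 },   /* 3: accept */
};

/* Add state s to the set, following ST_SPLIT edges (epsilon closure).
 * The membership check also stops recursion on loops. */
static void add_state(const state_t *nfa, uint32_t *set, int s)
{
    if (*set & (UINT32_C(1) << s))
        return;
    *set |= UINT32_C(1) << s;
    if (nfa[s].kind == ST_SPLIT) {
        add_state(nfa, set, nfa[s].out);
        add_state(nfa, set, nfa[s].out1);
    }
}

/* Return true if the whole array data[0..n-1] is accepted by the NFA. */
bool nfa_match(const state_t *nfa, int nstates, int start,
               const int *data, size_t n)
{
    assert(nstates > 0 && nstates <= MAX_STATES);
    uint32_t cur = 0;
    add_state(nfa, &cur, start);
    for (size_t i = 0; i < n; i++) {
        uint32_t next = 0;
        for (int s = 0; s < nstates; s++) {
            if ((cur & (UINT32_C(1) << s)) && nfa[s].kind == ST_RANGE &&
                data[i] >= nfa[s].lo && data[i] <= nfa[s].hi)
                add_state(nfa, &next, nfa[s].out);
        }
        cur = next;   /* all active states advance in lockstep */
    }
    for (int s = 0; s < nstates; s++)
        if ((cur & (UINT32_C(1) << s)) && nfa[s].kind == ST_MATCH)
            return true;
    return false;
}
```

With example_nfa and start state 0, the array 3 7 0 is accepted, while 3 7 (no trailing 0) and 5 0 0 (an extra element after the 0) are rejected.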


On Wed, Sep 20, 2017 at 5:59 AM, Elias Mårtenson <address@hidden> wrote:
> On several occasions, I have felt that built-in regex support in GNU APL
> would be very helpful.
>
> Implementing it should be rather simple, but I'd like to discuss how such an
> API should look in order for it to be as useful as possible.
>
> I was thinking of the following form:
>
>       regex ⎕Regex string
>
> The way I envision this working is to have the function return ⍬ if there
> is no match, or a string containing the match if there is one:
>
>       'f..' ⎕Regex 'xzooy'
> ┏⊖┓
> ┃0┃
> ┗━┛
>       'f..' ⎕Regex 'xfooy'
> 'foo'
>
> If the regex has subexpressions, those matches should be returned as
> individual strings:
>
>       '([0-9]+)-([0-9]+)-([0-9]+)' ⎕Regex '2017-01-02'
> ┏→━━━━━━━━━━━━━━━┓
> ┃"2017" "01" "02"┃
> ┗∊━━━━━━━━━━━━━━━┛
>
> This would be a very useful API, and reasonably easy to implement by simply
> calling into the standard regcomp() call:
> http://pubs.opengroup.org/onlinepubs/009695399/functions/regcomp.html
>
> What do you think? Is this a reasonable way to implement it? Any suggestions
> about alternative APIs?
>
> Regards,
> Elias
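The semantics proposed above can be sketched on top of the POSIX regcomp() and regexec() calls. This is only an illustration under stated assumptions, not how GNU APL implements anything: the function name regex_match, the fixed limits MAX_GROUPS and MAX_LEN, and the return convention (-1 for a pattern that fails to compile, 0 for no match, otherwise the number of strings stored) are all hypothetical. Patterns are compiled with REG_EXTENDED so that the ( ) and + in the examples have their usual meaning.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

enum { MAX_GROUPS = 9, MAX_LEN = 64 };   /* arbitrary limits for the sketch */

/* Hypothetical wrapper: if the pattern has no subexpressions, store the
 * whole match in out[0]; otherwise store one string per subexpression.
 * Returns -1 on a compile error, 0 if there is no match, else the number
 * of strings stored. Strings longer than MAX_LEN - 1 bytes are truncated. */
int regex_match(const char *pattern, const char *subject,
                char out[MAX_GROUPS][MAX_LEN])
{
    assert(pattern != NULL && subject != NULL);
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    regmatch_t m[MAX_GROUPS + 1];        /* m[0] is the whole match */
    int count = 0;
    if (regexec(&re, subject, MAX_GROUPS + 1, m, 0) == 0) {
        size_t ngroups = re.re_nsub < MAX_GROUPS ? re.re_nsub : MAX_GROUPS;
        size_t first = ngroups > 0 ? 1 : 0;
        size_t last = ngroups;           /* 0 when there are no groups */
        for (size_t i = first; i <= last; i++) {
            size_t len = 0;
            if (m[i].rm_so >= 0)         /* -1 marks a group that did not match */
                len = (size_t)(m[i].rm_eo - m[i].rm_so);
            if (len >= MAX_LEN)
                len = MAX_LEN - 1;
            if (len > 0)
                memcpy(out[count], subject + m[i].rm_so, len);
            out[count][len] = '\0';
            count++;
        }
    }
    regfree(&re);
    return count;
}
```

Called with the pattern f.. and the subject xfooy, this stores foo and returns 1; with the date pattern and 2017-01-02 it stores 2017, 01 and 02 and returns 3. The offsets used here are byte offsets, so a real implementation would still have to deal with the Unicode question raised at the top of the thread.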

