Re: [Bug-apl] Regex support

From:	Juergen Sauermann
Subject:	Re: [Bug-apl] Regex support
Date:	Tue, 10 Oct 2017 19:29:36 +0200
User-agent:	Mozilla/5.0 (X11; Linux i686; rv:52.0) Gecko/20100101 Thunderbird/52.3.0

Hi Peter,

the current syntax is A ⎕RE [X] B where A is the matching RE, B is the subject
(the string being matched) and X is the matching flags.

I never liked it when programs lumped these strings together into a single string (or argument).
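
For readers less familiar with APL, the shape of A ⎕RE [X] B can be modelled roughly in Python. This is only a sketch: the function name re_match, the flag letters in FLAG_MAP and the use of Python's re module are assumptions made for illustration, not the GNU APL implementation.

```python
import re

# Hypothetical flag letters; the letters actually accepted as X by ⎕RE
# are not specified in this message.
FLAG_MAP = {"i": re.IGNORECASE, "m": re.MULTILINE, "s": re.DOTALL}

def re_match(pattern, subject, flags=""):
    """Model of A ⎕RE[X] B: pattern A, subject B and flags X stay separate."""
    combined = 0
    for letter in flags:
        combined |= FLAG_MAP[letter]  # an unknown letter raises KeyError
    found = re.search(pattern, subject, combined)
    # Return the matched text, or None when nothing matches.
    return found.group(0) if found else None

print(re_match("f..", "xfooy"))       # pattern and subject only
print(re_match("F..", "xfooy", "i"))  # same call with a case-insensitive flag
```

Keeping the three values as separate arguments means that none of them has to be escaped or parsed out of a combined string.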

What hasn't been addressed yet is substitution as opposed to matching. I tend to believe
that APL2 selective specification of some kind would be an elegant solution, but details
have not yet been worked out.

Best Regards,
/// Jürgen

On 09/29/2017 11:41 AM, Hans-Peter Sorge wrote:

Hi Jürgen,

The construct  regex ⎕Regex string  looks OK to me.

However having the following regex patterns

match:       'regexm' ['modifier'] ⎕Regex string  and
substitute:  'regexs' 'regexr'  ['modifier'] ⎕Regex string

the patterns
'regexm' 'modifier' ⎕Regex string and
'regexs' 'regexr'   ⎕Regex string
are contradictory.

Either
'm' 'regexm' ['modifier']  ⎕Regex string and
's' 'regexs' 'regexr'  ['modifier'] ⎕Regex string

or
'regexm' '' ⎕Regex string  and
'regexs' 'regexr'  '' ⎕Regex string
would solve this syntactical problem.  But typing is a bit tedious.


So I would rather go with regex =^= 'm/.../mod' and  's/..../..../mod'

which makes expressions like
(⊂'s/..../..../mod') ⎕Regex ¨ string string string
easier to read.

(⊂'m/..../mod') ⎕Regex ¨ string string string
should return 1 for match and 0 for non match to be used in a subsequent
scan.

...... (⊂'m/..../mod') ⎕Regexi ¨ string string string
could return the indexes as vector of vectors using selective
specification:  (matching_index  non_matching_index) ← .......

....... (⊂'m/..../mod') ⎕Regexc ¨ string string string
should return the content as vector of vectors using selective
specification:
(matching_content  non_matching_content) ← .......

and further:
dates ← '2017-01-02' '2017-01-03'
(⊂'s/([0-9]+)-([0-9]+)-([0-9]+)/\1 \2 \3/') ⎕Regex ¨ dates
results in
('2017' '01' '02') ('2017' '01' '03')

and
dates ← ⊃ '2017-01-02' '2017-01-03'
's/([0-9]+)-([0-9]+)-([0-9]+)/\1 \2 \3/' ⎕Regex dates
results in
'2017' '01' '02'
'2017' '01' '03'
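
A rough Python model of the text-encoded forms above may help to compare them with the separate-argument syntax. It is a sketch under stated assumptions: the names parse_encoded and apply_encoded are made up, '/' is assumed never to occur inside the regex or the replacement, only an 'i' modifier is modelled, and the nested result is produced by splitting the substituted string at blanks, which is just one possible reading of the example.

```python
import re

def parse_encoded(spec):
    """Split 'm/regex/mod' or 's/regex/replacement/mod' into its parts.

    Assumption: '/' never occurs inside the regex or the replacement.
    """
    op, *fields = spec.split("/")
    if op == "m" and len(fields) == 2:
        regex, mod = fields
        return op, regex, None, mod
    if op == "s" and len(fields) == 3:
        regex, replacement, mod = fields
        return op, regex, replacement, mod
    raise ValueError("unrecognised spec: " + spec)

def apply_encoded(spec, subject):
    """Return 1/0 for 'm' specs and a list of strings for 's' specs."""
    op, regex, replacement, mod = parse_encoded(spec)
    flags = re.IGNORECASE if "i" in mod else 0  # only 'i' is modelled here
    if op == "m":
        return 1 if re.search(regex, subject, flags) else 0
    # Substitute, then split at blanks to mimic the nested result above.
    return re.sub(regex, replacement, subject, flags=flags).split(" ")

dates = ["2017-01-02", "2017-01-03"]
spec = r"s/([0-9]+)-([0-9]+)-([0-9]+)/\1 \2 \3/"
print([apply_encoded(spec, d) for d in dates])
print([apply_encoded("m/^2017-01-02/", d) for d in dates])
```

The sketch also shows the cost of the encoded form: the string has to be parsed before use, and a '/' inside the pattern would need an escaping rule.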


Maybe I prefer ⎕Regex['i'] over ⎕Regexi, i.e. ⎕Regex['option' 'option'],
to handle the various ways of transforming regex results into APL values.

FWIW

Hans-Peter Sorge


On 22.09.2017 at 23:55, Peter Teeson wrote:

Hi Jürgen:
Thanks for your usual gracious reply. I understand the points you present.

Perhaps my perspective is too narrow? The way I see it the key “module” is the interpreter of the language.
IMHO displaying the results, entering and storing data of various types, and providing an environment in which the interpreter executes
are really separate, but necessary, components.

You mentioned that rationals need to be explicitly configured. Personally I would prefer that approach rather than encrusting the interpreter.
Each capability added to the interpreter just complicates it - of course not for you as the author but for us lesser mortals.

As you may recall I am on a Macintosh. One project I pick up and work on from time to time is to try and
extract only the interpreter and then use the Mac OS facilities for the rest. Of course that is only of use to other Mac users (if at all).
Separating the interpreter from the rest allows for different “models” - OS’s. 

What we have right now is a monolithic code base which becomes more fragile with each added feature, version of GCC, or HW box
 - desirable as that might be.

I suppose what I am suggesting is that perhaps it’s time to take a fresh look at the project architecture and ask ourselves if we can improve.

FWIW

respect….

Peter

On Sep 22, 2017, at 11:48 AM, Juergen Sauermann <address@hidden> wrote:
Hi Peter,

I mostly agree with your concerns. As you may have noticed, I already regretted some of the things that I implemented earlier
in GNU APL. On the other hand, you also see on the GNU APL mailing list the proposals of other GNU APL users to implement
certain things. I haven't really found a way out of this dilemma.

My current thinking is this:

1. If a feature affects the APL language itself then it is probably a bad thing to do. Examples of this are, IMHO, changing the scoping
   of variables, lexical binding and stuff like that. As useful as these may be in other languages, my feeling is that they would turn GNU
   APL into something else which is no longer APL. For example, I am a big fan of the powerful matching capabilities in Erlang but I
   believe that, as useful as they may be, they simply do not belong in GNU APL (or any APL for that matter). Those who really need that (as
   opposed to only believing it would improve GNU APL) might be better off with one of the successors of APL.

2. Some areas, most notably file I/O, have traditionally not been part of the APL language itself, but are unfortunately needed in the
   real world. I am equally concerned about a proliferation of quad functions (and most other APLs are keener than GNU APL to
   move in that direction). However, regular expressions are a more fundamental concept than other "nice to have but never used"
   features, so that adding them as a ⎕-function should not do too much harm. Nobody is forced to use a ⎕-function that he or she
   does not know or like. And the only thing that gets more complicated when a ⎕-function is added is the implementation and not
   the language.

Rational numbers, BTW, have to be explicitly ./configured and are not present in the default GNU APL. Same for parallel APL. I have
seen that some users are experimenting with these features and I believe we should allow that because chances are that these
experiments result in something valuable some day. Who knows? 

Best Regards,
/// Jürgen


On 09/21/2017 04:19 AM, Peter Teeson wrote:

It so happens that 2 of my former colleagues from I.P.Sharp came visiting today and we were chatting about this.
Ken was not in favour of making APL complicated. When I worked at IPSA my office was next to Ken’s 
and when someone suggested some form of addition to the language he would usually ask 
why we could not do it with an APL function. (These days performance can hardly be a compelling argument
with multiple many-core CPU chips.)

Right now we already have a proliferation of Quad functions not to mention lambdas and native functions.
We also have divergent APLs such as Dyalog (good as it is) and so on.

Complex numbers, rationals and file systems are good additions.  
But IMHO we should have one simple mechanism - i.e. the libapl APL API
and all the rest go through that as native functions.

Jürgen’s guiding light is to make GNU APL an implementation that meets the ISO and APL2 definitions.
We have already wandered away from that. Pity.  When will it stop?

Just my 02¢

respect

Peter

On Sep 20, 2017, at 4:30 PM, address@hidden wrote:

<mumble> anyone who loves grep and hates perl (and i hope java too) can't be all bad </mumble>

using apl like syntax is good    'aaa' ⎕REX['s'] 'bbb'      what would monadic   ⎕REX['s'] 'bbb'      return?

On Wed, 20 Sep 2017 21:47:29 +0200
Juergen Sauermann <address@hidden> wrote:

Hi Elias,

I am generally in favour of supporting regular expressions in GNU APL.

We should do that in a way that is compatible with the way in which the most commonly used libraries
do that (even if they are lacking some features that more exotic libraries may have). Unfortunately I do not
have a full overview of all (or even any) existing libraries. I personally love grep and hate perl (the latter not
only because of its regexes).

I would like to avoid constructs like s/aaa/bbb/ where operations are kind of text-encoded into strings.
That is, IMHO, a hack-ish programming style and should be replaced by a more APL-alike syntax such as
'aaa' ⎕REX['s'] 'bbb' or maybe 's' ⎕REX 'aaa' 'bbb'.

Or, if the number of operations is small (perl seems to have only 2, not counting the translate which is already
covered by other APL functions), then we could also have different ⎕-functions for them and thus avoid a
third argument.

Everybody else, please feel invited to join the discussion.

Best Regards,
Jürgen Sauermann

On 09/20/2017 05:59 AM, Elias Mårtenson wrote:
On several occasions, I have felt that built-in regex support in GNU APL would be very helpful.

Implementing it should be rather simple, but I'd like to discuss how such an API should look in order for it to be as useful as possible.

I was thinking of the following form:

regex ⎕Regex string

The way I envision this to work, is to have the function return ⍬ if there is no match, or a string containing the match, if there is one:

'f..' ⎕Regex 'xzooy'
┏⊖┓
┃0┃
┗━┛
'f..' ⎕Regex 'xfooy'
'foo'

If the regex has subexpressions, those matches should be returned as individual strings:

'([0-9]+)-([0-9]+)-([0-9]+)' ⎕Regex '2017-01-02'
┏→━━━━━━━━━━━━━━━┓
┃"2017" "01" "02"┃
┗∊━━━━━━━━━━━━━━━┛

This would be a very useful API, and reasonably easy to implement by simply calling into the standard regcomp() call: http://pubs.opengroup.org/onlinepubs/009695399/functions/regcomp.html
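
The return convention proposed above can be modelled in a few lines of Python. This is only a sketch: the name regex_fn is made up, an empty list stands in for ⍬, and Python's re module is used in place of POSIX regcomp(), whose syntax and matching rules differ in some details.

```python
import re

def regex_fn(pattern, subject):
    """Model of the proposed  regex ⎕Regex string  result convention."""
    found = re.search(pattern, subject)
    if found is None:
        return []                  # stands in for ⍬ (no match)
    if found.re.groups == 0:
        return found.group(0)      # no subexpressions: the matched text
    return list(found.groups())    # one string per subexpression

print(regex_fn("f..", "xzooy"))
print(regex_fn("f..", "xfooy"))
print(regex_fn("([0-9]+)-([0-9]+)-([0-9]+)", "2017-01-02"))
```

One point the model makes visible is that the result changes shape (a string versus a list of strings) depending on whether the pattern contains subexpressions, so callers have to know the pattern in order to interpret the result.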

What do you think? Is this a reasonable way to implement it? Any suggestions about alternative APIs?

Regards,
Elias
