From: Juergen Sauermann
Subject: Re: [Bug-apl] Regex support
Date: Tue, 10 Oct 2017 19:29:36 +0200
User-agent: Mozilla/5.0 (X11; Linux i686; rv:52.0) Gecko/20100101 Thunderbird/52.3.0

Hi Peter,

the current syntax is

      A ⎕RE [X] B

where A is the matching RE, B is the subject (the string being matched), and X is the matching flags. I never liked it when programs lumped these strings together into a single string (or argument).

What hasn't been addressed yet is substitution as opposed to matching. I tend to believe that APL2 selective specification of some kind would be an elegant solution, but the details have not yet been worked out.

Best Regards,
/// Jürgen

On 09/29/2017 11:41 AM, Hans-Peter Sorge wrote:

Hi Jürgen,

The construct

      regex ⎕Regex string

looks OK to me. However, having the following regex patterns

match:
      'regexm' ['modifier'] ⎕Regex string
and substitute:
      'regexs' 'regexr' ['modifier'] ⎕Regex string

the patterns

      'regexm' 'modifier' ⎕Regex string
and
      'regexs' 'regexr' ⎕Regex string

are contradictory. Either

      'm' 'regexm' ['modifier'] ⎕Regex string
and
      's' 'regexs' 'regexr' ['modifier'] ⎕Regex string

or

      'regexm' '' ⎕Regex string
and
      'regexs' 'regexr' '' ⎕Regex string

would solve this syntactical problem. But typing is a bit tedious. So I would rather go with

      regex =^= 'm/.../mod' and 's/..../..../mod'

which makes expressions like

      (⊂'s/..../..../mod') ⎕Regex ¨ string string string

easier to read.

      (⊂'m/..../mod') ⎕Regex ¨ string string string

should return 1 for a match and 0 for a non-match, to be used in a subsequent scan.

      ...... (⊂'m/..../mod') ⎕Regexi ¨ string string string

could return the indexes as a vector of vectors using selective specification:

      (matching_index non_matching_index) ← .......

      ....... (⊂'m/..../mod') ⎕Regexc ¨ string string string

should return the content as a vector of vectors using selective specification:

      (matching_content non_matching_content) ← .......

And further:

      dates ← '2017-01-02' '2017-01-03'
      (⊂'s/([0-9]+)-([0-9]+)-([0-9]+)/\1 \2 \3/') ⎕Regex ¨ dates

results in

      ('2017' '01' '02') ('2017' '01' '03')

and

      dates ← ⊃ '2017-01-02' '2017-01-03'
      's/([0-9]+)-([0-9]+)-([0-9]+)/\1 \2 \3/' ⎕Regex dates

results in

      '2017' '01' '02'
      '2017' '01' '03'

Maybe I prefer ⎕Regex['i'] over ⎕Regexi ->> ⎕Regex['option' 'option'] to handle various transform alternatives from regex results to APL.

FWIIW
Hans-Peter Sorge

On 22.09.2017 at 23:55, Peter Teeson wrote:

Hi Jürgen:

Thanks for your usual gracious reply. I understand the points you present. Perhaps my perspective is too narrow?

The way I see it the key “module” is the interpreter of the language.
IMHO display of the results, means to enter and store data of various types, and providing an environment where the interpreter executes are really separate, but necessary, components.

You mentioned that rationals need to be explicitly configured. Personally I would prefer that approach rather than encrusting the interpreter. Each capability added to the interpreter just complicates it - of course not for you as the author but for us lesser mortals.

As you may recall I am on a Macintosh. One project I pick up and work on from time to time is to try and extract only the interpreter and then use the Mac OS facilities for the rest. Of course that is only of use to other Mac users (if at all).

Separating the interpreter from the rest allows for different “models” - OS’s. What we have right now is a monolithic code base which becomes more fragile with each added feature, version of GCC, or HW box - desirable as that might be.

I suppose what I am suggesting is that perhaps it’s time to take a fresh look at the project architecture and ask ourselves if we can improve.

FWIW

respect….

Peter

On Sep 22, 2017, at 11:48 AM, Juergen Sauermann <address@hidden> wrote:

Hi Peter,

I mostly agree with your concerns. As you may have noticed, I already regretted some of the things that I implemented earlier in GNU APL. On the other hand, you also see on the GNU APL mailing list the proposals of other GNU APL users to implement certain things.

I haven't really found a way out of this dilemma. My current thinking is this:

1. If a feature affects the APL language itself then it is probably a bad thing to do. Examples of this are, IMHO, changing the scoping of variables, lexical binding and stuff like that. As useful as these may be in other languages, my feeling is that they would turn GNU APL into something else which is no longer APL.
For example, I am a big fan of the powerful matching capabilities in Erlang, but I believe that, as useful as they may be, they simply do not belong in GNU APL (or any APL for that matter). Those who really need that (as opposed to only believing it would improve GNU APL) might be better off with one of the successors of APL.

2. Some areas, most notably FILE I/O, have traditionally not been part of the APL language itself, but are unfortunately needed in the real world. I am equally concerned about a proliferation of quad functions (and most other APLs are more keen than GNU APL to move in that direction). However, regular expressions are a more fundamental concept than other "nice to have but never used" features, so that adding them as a ⎕-function should not do too much harm. Nobody is forced to use a ⎕-function that he or she does not know or like. And the only thing that gets more complicated when a ⎕-function is added is the implementation and not the language.

Rational numbers, BTW, have to be explicitly ./configured and are not present in the default GNU APL. Same for parallel APL. I have seen that some users are experimenting with these features and I believe we should allow that because chances are that these experiments result in something valuable some day. Who knows?

Best Regards,
/// Jürgen

On 09/21/2017 04:19 AM, Peter Teeson wrote:

It so happens that 2 of my former colleagues from I.P.Sharp came visiting today and we were chatting about this.

Ken was not in favour of making APL complicated. When I worked at IPSA my office was next to Ken’s and when someone suggested some form of addition to the language he would usually ask why we could not do it with an APL function. (These days performance can hardly be a compelling argument with multiple many-core CPU chips.)

Right now we already have a proliferation of Quad functions, not to mention lambdas and native functions. We also have divergent APLs such as Dyalog (good as it is) and so on.
Complex numbers, rationals and file systems are good additions. But IMHO we should have one simple mechanism - i.e. the libapl APL API - and all the rest should go through that as native functions.

Jürgen’s guiding light is to make GNU APL an implementation that meets the ISO and APL2 definitions. We have already wandered away from that. Pity. When will it stop?

Just my 02¢

respect

Peter

On Sep 20, 2017, at 4:30 PM, address@hidden wrote:

<mumble>
anyone who loves grep and hates perl (and i hope java too) can't be all bad
</mumble>

using apl like syntax is good

      'aaa' ⎕REX['s'] 'bbb'

what would monadic

      ⎕REX['s'] 'bbb'

return?

On Wed, 20 Sep 2017 21:47:29 +0200 Juergen Sauermann <address@hidden> wrote:

Hi Elias,

I am generally in favour of supporting regular expressions in GNU APL. We should do that in a way that is compatible with the way in which the most commonly used libraries do that (even if they are lacking some features that more exotic libraries may have). Unfortunately I do not have a full overview of all (or even any) existing libraries. I personally love grep and hate perl (the latter not only because of their regexes).

I would like to avoid constructs like s/aaa/bbb/ where operations are kind of text-encoded into strings. That is, IMHO, a hack-ish programming style and should be replaced by a more APL-like syntax such as

      'aaa' ⎕REX['s'] 'bbb'

or maybe

      's' ⎕REX 'aaa' 'bbb'

Or, if the number of operations is small (perl seems to have only 2, not counting the translate, which is already covered by other APL functions), then we could also have different ⎕-functions for them and thus avoid a third argument.

Everybody else, please feel invited to join the discussion.

Best Regards,
Jürgen Sauermann

On 09/20/2017 05:59 AM, Elias Mårtenson wrote:

On several occasions, I have felt that built-in regex support in GNU APL would be very helpful.
Implementing it should be rather simple, but I'd like to discuss how such an API should look in order for it to be as useful as possible.

I was thinking of the following form:

      regex ⎕Regex string

The way I envision this to work is to have the function return ⍬ if there is no match, or a string containing the match if there is one:

      'f..' ⎕Regex 'xzooy'
┏⊖┓
┃0┃
┗━┛
      'f..' ⎕Regex 'xfooy'
'foo'

If the regex has subexpressions, those matches should be returned as individual strings:

      '([0-9]+)-([0-9]+)-([0-9]+)' ⎕Regex '2017-01-02'
┏→━━━━━━━━━━━━━━━┓
┃"2017" "01" "02"┃
┗∊━━━━━━━━━━━━━━━┛

This would be a very useful API, and reasonably easy to implement by simply calling the standard regcomp() function: http://pubs.opengroup.org/onlinepubs/009695399/functions/regcomp.html

What do you think? Is this a reasonable way to implement it? Any suggestions about alternative APIs?

Regards,
Elias
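
For readers who want to try the return convention proposed above outside APL, here is a minimal sketch in Python. It is only a model of the proposal, not GNU APL code and not how ⎕Regex or ⎕RE is implemented. The function name `regex_match` is hypothetical, an empty list stands in for ⍬, and Python's `re` module is used instead of POSIX regcomp(), so regex dialect details may differ.

```python
import re


def regex_match(pattern, subject):
    """Hypothetical model of the proposed `regex ⎕Regex string`.

    Assumed convention, taken from the proposal in the thread:
      - no match         -> empty list (stand-in for APL's empty vector ⍬)
      - match, no groups -> the matched substring
      - match, groups    -> list with one string per subexpression
    """
    m = re.search(pattern, subject)
    if m is None:
        return []
    if m.groups():
        return list(m.groups())
    return m.group(0)


if __name__ == "__main__":
    print(regex_match("f..", "xzooy"))   # no match
    print(regex_match("f..", "xfooy"))   # whole match
    # Subexpressions, applied to each element like the `¨` examples.
    dates = ["2017-01-02", "2017-01-03"]
    date_re = "([0-9]+)-([0-9]+)-([0-9]+)"
    print([regex_match(date_re, d) for d in dates])
```

The list comprehension plays the role of the each operator in the quoted examples. Whether a no-match result should be ⍬ or a boolean 0 is exactly the open design question discussed in the thread, so the empty list here is just one of the options.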