Re: [Bug-apl] Suggestion for Quad-RE


From: Juergen Sauermann
Subject: Re: [Bug-apl] Suggestion for Quad-RE
Date: Wed, 11 Oct 2017 14:40:20 +0200

Hi Xtian,

in general we should only add flags if they are frequently used and if there is no appropriate solution in APL.
What we should avoid, IMHO, is redundancy: new functionality inside ⎕RE that only increases the
internal complexity for the sake of saving a small number of characters at APL level. I believe
that for the average APL programmer it is easier to create or decode a slightly more complex
APL expression than to remember a zoo of flags for different use cases.

I truly believe that GNU APL should aim at being minimalistic, not in terms of the number of APL characters
typed but in terms of the concepts deployed.

Coming back to your example, there is a difference between an RE match and a line containing some RE match.
If you rewrite the RE in your example so that it matches entire lines (using a fixed string z instead of a long /var/log/messages):

      z←'foo' 'bar' 'Started at 11:22' 'something' 'else' '*** Stopped  12:33'
      4 ⎕CR ⊃((⊂⍬)≢¨'.*Started.*|.*Stopped.*' ⎕RE[''] z)/z
┏→━━━━━━━━━━━━━━━━━┓
↓Started at 11:22  ┃
┃*** Stopped  12:33┃
┗━━━━━━━━━━━━━━━━━━┛

then you get the matching lines without returning large integer vectors from RE.
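
The 0/1 vector that the proposed flag would return is already an intermediate result of that expression. A minimal sketch, assuming that ⎕RE[''] returns ⍬ for every line that does not match (as the expression above relies on); the name m is only chosen for illustration:

```apl
      z←'foo' 'bar' 'Started at 11:22' 'something' 'else' '*** Stopped  12:33'
      ⍝ m (hypothetical name): 1 for every line where ⎕RE returned something other than ⍬
      m←(⊂⍬)≢¨'.*Started.*|.*Stopped.*' ⎕RE[''] z
      m/z     ⍝ keep only the matching lines
      +/m     ⍝ count the matching lines
```

Keeping m in a variable lets the same mask be reused for further selections without running the RE again.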

An even faster approach would be to somehow use ⎕FIO[26] instead of ⎕FIO[49] and do the line splitting
inside ⎕RE rather than inside ⎕FIO.
 
I was actually thinking of an output flag that can be used directly with (not !) or [] for cases where
the result of a match is fairly sparse.

Regarding performance, I would argue that you cannot gain a lot by optimizing the output format because you
always have to process the matched string and the production of large APL values (as opposed to many small
ones) is fairly efficient. And the possible overhead of additional APL functions does not count if the results are large.

Best Regards,
/// Jürgen


On 10/11/2017 05:12 AM, Christian Robert wrote:
Sometimes we only want to know whether it matches or not.

I suggest a new flag ['m']  (as match) that will return ...

  for a string:  a scalar, either 0 ("not matching") or 1 ("matching")
  for an array of strings: a vector of 0/1, one element per string, with the same meaning.


let's say:

      z←⎕FIO[49] '/var/log/messages'  ⍝ beware that this file is inaccessible by default unless you are "root" on Linux
                                      ⍝ or you chmod a+r /var/log/messages  # as root

which may return 50,000 lines or even 2 million, with an average of say ~120 characters each.


I would hope to be able to use a flag as ['m']:

     'Started|Stopped' ⎕RE['m'] z

which would return a vector of 0/1 telling which lines match the pattern, so that I can
retain only the matching lines for further fine-tuning (via the dyadic operator "/").

It would be a LOT faster than letting ⎕RE return the whole result of pcre2 INTO the physical GNU APL memory engine,
creating a lot of integer arrays for no real purpose, i.e. as seen from the application.

comments welcome,

my usual 2 cents,
Xtian.



