poke-devel

Re: [WIP][PATCH 2/2] pkl,pvm: add support for regular expression


From: Jose E. Marchesi
Subject: Re: [WIP][PATCH 2/2] pkl,pvm: add support for regular expression
Date: Mon, 20 Feb 2023 14:43:47 +0100
User-agent: Gnus/5.13 (Gnus v5.13)

> Hi Jose.
>
> On Fri, Feb 17, 2023 at 12:19:51PM +0100, Jose E. Marchesi wrote:
>> 
>> >
>> > What about having a new compile-time type for matched entities?
>> > It would be useful for regular expression matching on both strings
>> > and arrays of characters.
>> >
>> > Something like this:
>> >
>> > ```poke
>> > var m1 = "Hello pokers!" ~ /[hH]ello/,
>> >     m2 = [0x00UB, 0x11UB, 0x22UB] ~ /\x11\x22/;
>> >
>> > if (m1)
>> >   {
>> >     printf "matched at index %v and offset %v\n", m1.index_begin,
>> >            m1.offset_begin;
>> >     assert ("Hello pokers!"[m1.index_begin:m1.index_end] == "Hello");
>> >   }
>> > else
>> >   {
>> >     assert (m1.index_begin ?! E_elem);
>> >     assert (m1.offset_begin ?! E_elem);
>> >   }
>> > ```
>> >
>> > We can use other fields to give access to sub-groups.
>> >
>> > We can take an approach similar to the `Exception` struct, but with a
>> > `Matched` struct.  The compiler can cast it to a boolean when necessary.
>> 
>> The idea is interesting.  But I don't like the part of changing the
>> semantics of `if' like this: it is not orthogonal.
>> 
>> Note that the syntactic construction that uses Exception only works with
>> exceptions:
>> 
>>   try STMT; catch if EXCEPTION { ... }
>> 
>> If we could come up with a syntactic construction for regular expression
>> matching, then it would be better IMO.
>> 
>> 
>
>
> What about this syntax:
>
> ```poke
> var matched_p = "Hello pokers!" ~? /[hH]ello/,
>     matchinfo = "Hello pokers!" ~ /[hH]ello/;
>
> assert (matched_p isa int<32>);
> assert (matchinfo isa Matched);
>
> if (matchinfo.matched_p) { ... }
> ```

Hmm... that has the disadvantage of having to match twice.

It seems to me, we could make use of the exceptions by having ~ return a
Match struct and raising an E_nomatch exception when there is no match.

Then we can use the normal operators ?! and try-until and try-catch to
check for when there is no match.
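
A minimal sketch of these semantics, modeled in Python because the `~'
operator does not exist in Poke yet.  The names Match, NoMatch, match and
no_match are hypothetical stand-ins for the proposed Match struct, the
E_nomatch exception, the `~' operator and the `?!' check.  It also assumes
that `~' searches anywhere in the subject instead of anchoring at the start.

```python
import re
from dataclasses import dataclass


class NoMatch(Exception):
    """Hypothetical stand-in for the proposed E_nomatch exception."""


@dataclass
class Match:
    """Hypothetical stand-in for the proposed Match struct."""
    index_begin: int
    index_end: int


def match(subject: str, pattern: str) -> Match:
    """Model of `subject ~ /pattern/': return a Match or raise NoMatch."""
    m = re.search(pattern, subject)  # assumption: unanchored search
    if m is None:
        raise NoMatch(pattern)
    return Match(index_begin=m.start(), index_end=m.end())


def no_match(subject: str, pattern: str) -> bool:
    """Model of `(subject ~ /pattern/) ?! E_nomatch'."""
    try:
        match(subject, pattern)
    except NoMatch:
        return True
    return False
```

With this shape the regular expression runs only once on the success path:
the caller gets the match information directly and handles the failure case
through the exception.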

>
> Now let's talk about regexp searching!
>
> ```poke
> var sr10 = /[nN]eedle/ $ "... needle in a haystack ...",
>     sr11 = /[nN]eedle/ $ (byte[] @ 10#B);
> ```
>
> We can also translate struct patterns like `{ S | a == 0, b < 0, c == 15 }`
> to a regexp pattern and a bunch of constraints.  Consider:
>
> ```poke
> set_endian (ENDIAN_LITTLE);
>
> type S = struct
>   {
>     int<8> a;
>     int<32> b;
>     int<9> c;
>   };
>
> var search_results = { S | a == 0, b < 0, c == 15 } $ (byte[] @ 0#B),
>     sr2 = { S | a == 0, b < 0, c == 15 }
>               $ [0xaaUB, 0x55UB,
>                  0x00UB,0xffUB,0xffUB,0xffUB,0xffUB,0x0fUB,0UB];
>
> // [0x00UB,0xffUB,0xffUB,0xffUB,0xffUB,0x0fUB,0UB] is the encoding of
> // S {a=0UB, b=-1, c=15}.
> ```
>
> That can be translated to something like this:
>
> ```poke
>
> var search_results = lambda (byte[] bytes) SearchResult:
>   {
>     var tmp = open ("*__something*"),
>         res = SearchResult {};
>
>     try {
>       var s = /\x00\(....\)\x0f\x00/ $ bytes,
>           sub = s.subgroups;
>
>       // sub[0] is the whole match.
>
>       var b = sub[1].offset_begin,
>           e = sub[1].offset_end;
>
>       byte[e-b] @ tmp : 0#B = bytes[b:e];
>       if ((int<32> @ tmp : 0#B) < 0)
>       {
>         // Found! fill in the `res` ...
>       }
>     } catch (Exception ex) {
>       close (tmp);
>       raise ex;
>     }
>
>     close (tmp);
>     return res;
>   } (byte[] @ 0#B);
> ```
>
>
> I guess using a regexp library may improve the searching performance.
> This just came to my mind.  We can discuss more :)
>
>
> Regards,
> Mohammad-Reza


