I made the changes needed to use UTF-32 instead. It turned out that the PCRE version 1 API I was using does not properly support UTF-32 patterns (only match data). Thus, I changed the code to use version 2 instead.
I have attached the two files that I changed. It works, as the example below shows, but it is nowhere near complete.
'(..)..(..)⍱$' ⎕RE "footesting⌽⍱"
┏→━━━━━━━━┓
┃"st" "g⌽"┃
┗∊━━━━━━━━┛
Now, there are two changes I would like to see:
- If the right-hand argument is an array of strings, the pattern should be applied to all strings, collecting the results into a 2D array. This will be quite efficient, since the pattern only needs to be compiled once.
- I'd like an axis argument that carries options. One of those options should be a flag that makes a mismatch raise an error instead of returning ⍬. This would be useful when the regex is used to extract fields from input that is expected to follow a given pattern (think one-liners in interactive mode).
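
To make the two requests concrete, here is a minimal sketch of the intended behaviour. It is not GNU APL code and it does not use PCRE2: std::regex stands in for the regex engine, and the names match_all, Row and error_on_mismatch are hypothetical. The pattern is compiled once, applied to every input string, and an empty row plays the role of ⍬ unless the strict flag is set.

```cpp
#include <regex>
#include <stdexcept>
#include <string>
#include <vector>

// One row of captured groups per input string (hypothetical type name).
using Row = std::vector<std::string>;

// Apply one pattern to many strings. std::regex is only a stand-in for
// PCRE2 here; the point is that compilation happens once, outside the loop.
std::vector<Row> match_all(const std::string &pattern,
                           const std::vector<std::string> &inputs,
                           bool error_on_mismatch)
{
    const std::regex re(pattern);   // compiled once for all inputs
    std::vector<Row> result;

    for (const std::string &s : inputs) {
        std::smatch m;
        Row row;
        if (std::regex_search(s, m, re)) {
            // Group 0 is the whole match; collect only the subgroups.
            for (std::size_t i = 1; i < m.size(); ++i)
                row.push_back(m[i].str());
        } else if (error_on_mismatch) {
            // Strict mode: a mismatch is an error rather than an empty row.
            throw std::runtime_error("no match: " + s);
        }
        result.push_back(row);      // an empty row stands in for ⍬
    }
    return result;
}
```

Called with error_on_mismatch set to false, a non-matching string simply produces an empty row; with it set to true, the first mismatch throws.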
The reason I haven't implemented these myself is that I find the current code absolutely awful, especially all the duplicated code that deallocates the PCRE structures. In Lisp I'd use UNWIND-PROTECT (or try/finally in Java), but in C++ I think I have to declare a new class with a destructor to handle this, correct? Is there anyone who would like to clean this up?
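
On the cleanup question: the usual C++ idiom is indeed a destructor (RAII), but since C++11 a std::unique_ptr with a custom deleter gives the same effect without writing a full wrapper class. The sketch below assumes nothing about the attached files. FakeHandle, fake_create and fake_free are hypothetical stand-ins so that it compiles without PCRE2; with the real library the deleters would call pcre2_code_free() and pcre2_match_data_free().

```cpp
#include <memory>
#include <stdexcept>

// Hypothetical stand-in for a C library handle such as pcre2_code.
struct FakeHandle { int id; };

static int live_handles = 0;    // number of handles not yet freed

FakeHandle *fake_create(int id) { ++live_handles; return new FakeHandle{id}; }
void fake_free(FakeHandle *h)   { --live_handles; delete h; }

// unique_ptr invokes this when the owning variable leaves scope, on normal
// return and on exception alike. With PCRE2 it would call pcre2_code_free()
// or pcre2_match_data_free() instead of fake_free().
struct FakeDeleter {
    void operator()(FakeHandle *h) const { fake_free(h); }
};
using HandlePtr = std::unique_ptr<FakeHandle, FakeDeleter>;

// Acquires two handles and either returns or throws. No explicit cleanup
// code is needed on either path.
int use_two_handles(bool fail)
{
    HandlePtr code(fake_create(1));
    HandlePtr match(fake_create(2));

    if (fail)
        throw std::runtime_error("no match");

    return code->id + match->id;
}
```

Both handles are released in reverse order of construction whether use_two_handles returns or throws, which is the behaviour UNWIND-PROTECT or try/finally would give.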
Regards,
Elias