octave-maintainers

Re: regexp strangeness


From: Andrew Janke
Subject: Re: regexp strangeness
Date: Sat, 8 Feb 2020 13:07:25 -0500
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:68.0) Gecko/20100101 Thunderbird/68.4.2


On 2/8/20 4:12 AM, Daniel J Sebald wrote:
> On 2/8/20 3:32 AM, Kay Nick wrote:
>> Hey all,
>>
>> The documentation for regexp says:
>>
>> '\w'
>>            Match any word character
>>
>> What exactly is a word character (and, maybe more importantly, what isn't)?
>> Am I right in assuming it's
>> [abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ]? What about
>> non-English characters like öäßłńŚ?
> 
> https://en.wikipedia.org/wiki/Regular_expression#Character_classes
> 
> lists \w as equivalent to [A-Za-z0-9_]
> 
> That probably won't cover non-English characters, but maybe you could try
> [ä-Ś] or whatever range makes sense for the alphabet of interest.


I believe you can use Unicode character classes to handle this. For
example, '\p{L}' matches any Unicode letter in any script, including
non-English ones. This works for me in Octave 5.1.0.

https://www.regular-expressions.info/unicode.html

octave:3> regexp('f1o2oüö', '\p{L}')
ans =
   1   3   5   6   8
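
A slightly fuller sketch of the same call, in case it helps. It asks
regexp for the matched text as well as the start offsets. It assumes the
string literal is stored as UTF-8 and that regexp handles '\p{L}' the way
it did in my 5.1.0 session; the variable names are just for illustration.
As far as I understand, the offsets count bytes of the char array rather
than characters, which would explain the jump from 6 to 8 after 'ü'.

```octave
% Sketch only: assumes a UTF-8 encoded script or terminal, and an Octave
% build whose regexp supports Unicode property classes such as \p{L}.
str = 'f1o2oüö';

% Request the start offsets and the matched substrings explicitly;
% outputs come back in the order the options are listed.
[starts, letters] = regexp (str, '\p{L}', 'start', 'match');

disp (starts)    % 1 3 5 6 8 in the Octave 5.1.0 session shown above
disp (letters)   % cell array of the matched letters

% Under the UTF-8 assumption, 'ü' and 'ö' take two bytes each, so the
% char array has more elements than the string has visible characters.
printf ('%d elements in the char array\n', numel (str));
```

If you need the letters themselves rather than their positions, the
'match' output is probably the safer thing to work with, since it does
not depend on how the offsets are counted.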

Cheers,
Andrew


