Re: announcing thaiword.el?

From: Kenichi Handa
Subject: Re: announcing thaiword.el?
Date: Tue, 29 Mar 2005 18:02:51 +0900 (JST)
User-agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/22.0.50 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)

In article <address@hidden>, Miles Bader <address@hidden> writes:

> On Mon, 28 Mar 2005 09:47:09 +0900 (JST), Kenichi Handa <address@hidden> 
> wrote:
>>  To handle the regular expression "\\b" and "\\B" correctly
>>  for Thai, we need a bigger change in regex.c.  For the
>>  moment, I have no idea how to do that.

> Current extensions to "word syntax", using `word-separating-categories'
> etc., seem to do the correct thing with regexps.[*]  Perhaps some
> extension to that mechanism would work.

> For instance, what if entries in `word-separating-categories' could have an
> optional predicate function -- in addition to the current (CAT1 . CAT2)
> format, allow (CAT1 CAT2 PREDICATE-FUN), and only consider the entry to
> match if PREDICATE-FUN (with some appropriate args) also returns true?

The problem is that the innermost function
re_match_2_internal doesn't know about the original buffer
or Lisp string.  So, to make PREDICATE-FUN work, we must
generate a Lisp string each time and that will be extremely
slow.  And first of all, is re_match_2_internal a safe place
to call a Lisp function?

> [*] I was surprised that this is true, and I don't understand why from
>     my quick look at regex.c :-/ ... But my simple tests seem to show
>     that it does really work.  E.g., I can add '(?C . ?C) to
>     `word-separating-categories', and then a regexp search will suddenly
>     start considering every single kanji character as a standalone word.

I spent a fairly long time making it work. :-p
re_match_2_internal calls the macro WORD_BOUNDARY_P at
proper places.  It is also used in scan_words (syntax.c).
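
To illustrate the idea of a boundary check driven by category pairs, here is
a rough Python model. It is only a sketch and not the code in regex.c or
syntax.c: the names `categories`, `word_boundary_p` and `split_words` are made
up for this example, the category assignment is a toy (only `C` for Han
characters and `a` for ASCII), and the real rules for boundaries between
different scripts are ignored. The optional third element of an entry models
the PREDICATE-FUN proposal quoted above, which is hypothetical and not an
existing feature.

```python
def categories(ch):
    """Toy stand-in for character categories (an assumption of this sketch)."""
    cats = set()
    if 0x4E00 <= ord(ch) <= 0x9FFF:   # CJK Unified Ideographs block
        cats.add("C")
    if ord(ch) < 0x80:                # ASCII
        cats.add("a")
    return cats


def is_word_char(ch):
    """Toy stand-in for word syntax: letters and digits are word constituents."""
    return ch.isalnum()


def word_boundary_p(c1, c2, separating):
    """Return True if there is a word boundary between adjacent chars c1, c2.

    `separating` is a list of (CAT1, CAT2) or (CAT1, CAT2, PREDICATE) tuples.
    The three-element form is the hypothetical extension, not a real feature.
    """
    w1, w2 = is_word_char(c1), is_word_char(c2)
    if w1 != w2:
        return True        # word char next to non-word char
    if not w1:
        return False       # two non-word chars: nothing to separate
    cats1, cats2 = categories(c1), categories(c2)
    for entry in separating:
        cat1, cat2 = entry[0], entry[1]
        predicate = entry[2] if len(entry) > 2 else None
        if cat1 in cats1 and cat2 in cats2:
            if predicate is None or predicate(c1, c2):
                return True
    return False


def split_words(text, separating):
    """Split text into words using word_boundary_p between adjacent chars."""
    words, current = [], ""
    # Pair every character with its predecessor; a space precedes the first.
    for prev, ch in zip(" " + text, text):
        if current and word_boundary_p(prev, ch, separating):
            words.append(current)
            current = ""
        if is_word_char(ch):
            current += ch
    if current:
        words.append(current)
    return words


if __name__ == "__main__":
    print(split_words("漢字 abc", []))
    print(split_words("漢字 abc", [("C", "C")]))
```

With an empty list the two kanji stay together as one word; once `("C", "C")`
is added, each kanji becomes a word of its own, which is the effect described
in the quoted footnote. Note that the predicate here is an ordinary Python
function, so the sketch says nothing about the cost or safety of calling a
Lisp function from re_match_2_internal.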

Ken'ichi HANDA
