bug-gnu-emacs

bug#1913: Identifier after reserved word "raise" is not always


From: Stephen Leake
Subject: bug#1913: Identifier after reserved word "raise" is not always
Date: Wed, 13 Jan 2010 03:03:24 -0500
User-agent: Gnus/5.11 (Gnus v5.11) Emacs/22.2 (windows-nt)

It is clear that [a-zA-Z] does not match the characters permitted by
the Ada standard.

However, neither does [[:alpha:]]. Consider this fragment:

procedure doµ 

The 'µ' (entered with C-x 8 u) is not matched by [[:alpha:]]*
(Emacs 23.1, Windows XP, LANG=C.UTF-8).

The user can work around this by defining µ to have word syntax.

Ideally, we would have regular expression character classes that match
the character categories defined by ISO/IEC 10646:2003 (see LRM 2.1):

Letter, Uppercase
Letter, Lowercase
Letter, Titlecase
Letter, Modifier
Letter, Other
Mark, Non-Spacing
Mark, Spacing Combining
Number, Decimal
Number, Letter
Punctuation, Connector
Other, Format
Separator, Space
Separator, Line
Separator, Paragraph

These categories are used to define Ada lexical elements (LRM 2.2).
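
For illustration outside Emacs, a category-based test can be sketched
in Python, whose standard unicodedata module reports the general
category of a character. The split into "start" and "extend"
categories follows one reading of the Ada 2005 identifier rule
(LRM 2.3) and should be treated as an assumption; the helper names are
hypothetical, and the rules about consecutive or trailing underscores
are not checked.

```python
import unicodedata

# Assumed mapping from Unicode general categories to the Ada 2005
# identifier rule (LRM 2.3). The names here are illustrative only.
IDENTIFIER_START = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
IDENTIFIER_EXTEND = {"Mn", "Mc", "Nd", "Pc", "Cf"}


def is_ada_identifier(text):
    """Return True if every character is in a permitted category.

    Simplification: the rules on consecutive or trailing connector
    punctuation (underscores) and on reserved words are not checked.
    """
    if not text:
        return False
    if unicodedata.category(text[0]) not in IDENTIFIER_START:
        return False
    allowed = IDENTIFIER_START | IDENTIFIER_EXTEND
    return all(unicodedata.category(ch) in allowed for ch in text[1:])


def ascii_only(text):
    """The [a-zA-Z]-style check, for comparison."""
    return all("a" <= ch <= "z" or "A" <= ch <= "Z" for ch in text)


name = "do\u00b5"  # "do" followed by U+00B5 MICRO SIGN
print(unicodedata.category("\u00b5"))  # general category of the micro sign
print(is_ada_identifier(name), ascii_only(name))
```

The category-based check accepts the fragment above, while the
ASCII-only check rejects it.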

But I don't think that's going to happen.

It seems the best compromise is to replace a-z etc. with [:alpha:] or
[:alnum:] as appropriate, and hope the user knows how to define
characters to have word syntax. That's a lot of work, since each
modified regexp needs to be tested.

As for matching leading underscores, I agree it would be nice to get
it right. Using shy groups (the elisp name for non-capturing groups)
would help, since they don't disturb the group numbering and are also
faster. If it doesn't complicate the testing, I'll try to do
that.
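
The effect on group numbering can be sketched in Python, where a
non-capturing group is written (?:...); in elisp the same construct is
spelled \(?: ... \). The patterns below are hypothetical and are not
taken from ada-mode. They only show that adding a capturing group
shifts the later group numbers, while a shy group does not.

```python
import re

line = "raise Pkg.Error"

# Hypothetical starting pattern: the name after "raise" is group 2.
plain = re.compile(r"(raise)\s+(\w+)")

# Allowing a dotted prefix with a capturing group adds a group, so
# the final name moves from group 2 to group 3.
with_capture = re.compile(r"(raise)\s+(\w+\.)*(\w+)")

# The same extension with a non-capturing (shy) group keeps the
# final name in group 2, so code that reads group 2 is unaffected.
with_shy = re.compile(r"(raise)\s+(?:\w+\.)*(\w+)")

print(plain.groups, with_capture.groups, with_shy.groups)
print(with_capture.search(line).group(3))
print(with_shy.search(line).group(2))
```

Any code that refers to a group by number keeps working with the shy
variant, which is why it reduces the amount of retesting.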

Do you have suggestions about which regular expressions are most
important to fix? If you can provide typical code, and point out
the most annoying font-lock failures, that would be a good start.

-- 
-- Stephe





