Re: Regexp capturing unicode characters
From: Heime
Subject: Re: Regexp capturing unicode characters
Date: Thu, 01 Aug 2024 17:06:26 +0000
On Friday, August 2nd, 2024 at 3:34 AM, Eli Zaretskii <eliz@gnu.org> wrote:
> > Date: Thu, 01 Aug 2024 13:43:20 +0000
> > From: Heime heimeborgia@protonmail.com
> > Cc: help-gnu-emacs@gnu.org
> >
> > > Why do you need that? Don't you know which characters you'd like to
> > > match?
> >
> > No, because language insertion in emacs depends upon the user. But I want
> > to match foreign language characters mostly.
>
>
> If by "foreign language characters" you mean letters and digits, then
> [:alnum:] is what you want, as I already suggested. This covers all
> the characters that are either letters or digits, in all the
> languages.
>
> > > > Is there a way to show the characters that are members of each class ?
> > >
> > > No, but you can check each character whether it matches a class.
> >
> > What is the function name for doing that ?
>
>
> string-match-p if you have a string or looking-at-p if you have it in
> the buffer.
>
> > Can one scan the buffer and list the matched character classes ?
>
>
> Character classes overlap, so I'm not sure what kind of function you
> want, and I don't think we have it anyway. It's usually the other way
> around: the author of a Lisp program knows in advance what kinds of
> characters the program needs to match, and uses a regexp which will do
> the job.
I want the regexp to allow for the possibility that the user wrote a
comment in a language other than English. Otherwise the regexp would
simply skip those characters. And your suggestions so far have been
[:alpha:] and [:alnum:].
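
For reference, here is a minimal sketch of how those classes could be
tested, using the string-match-p function mentioned above. The helper
names my-char-in-class-p and my-collect-words are hypothetical and only
serve as illustration; in a buffer, looking-at-p would play the same
role as string-match-p.

```elisp
;; Sketch only: the helper names are made up for this example.

(defun my-char-in-class-p (char class)
  "Return non-nil if CHAR matches the bracket class CLASS.
CLASS is the class name without brackets or colons, e.g. \"alpha\"."
  (string-match-p (format "\\`[[:%s:]]\\'" class)
                  (char-to-string char)))

(defun my-collect-words (string)
  "Return a list of the maximal runs of [:alnum:] characters in STRING."
  (let ((pos 0)
        (words nil))
    (while (string-match "[[:alnum:]]+" string pos)
      (push (match-string 0 string) words)
      (setq pos (match-end 0)))
    (nreverse words)))

(my-char-in-class-p ?é "alpha")   ; non-nil: a non-ASCII letter
(my-char-in-class-p ?я "alnum")   ; non-nil: a Cyrillic letter
(my-char-in-class-p ?- "alnum")   ; nil: neither a letter nor a digit

(my-collect-words "café, привет 42")   ; => ("café" "привет" "42")
```

Note the doubled brackets: the class name [:alnum:] has to appear
inside a bracket expression, so the regexp is written "[[:alnum:]]".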
> > > > Thought that [:multibyte:] captured the unicode characters. But even
> > > > when
> > > > I applied (set-buffer-multibyte t) to the buffer, I did not get matches.
> > >
> > > Don't use [:multibyte:], it is hardly ever the right thing nowadays.
> >
> > Can we update the manual with useful information such as with [:multibyte:]
> > please.
>
>
> The useful information is already there (including a cross-reference
> to a detailed description of what "multibyte" means). I just
> translated it into simpler terms, based on what you told about the job
> you want to do, to save you from the need to read that if you don't
> want to.
What I am asking for is a mention that [:multibyte:] is not used much nowadays.
> > > > Does [:word:] mean word in the english language only ?
> > >
> > > No, it means characters that have the word syntax. IOW, which
> > > characters match depends on the major mode's syntax table. If you are
> > > classifying characters from human-readable text, [:word:] is not the
> > > right thing to use.
>
> > Can one show the syntax table ? For me, just saying "word syntax" does
> > not give enough information. Perhaps give more explanation in the manual.
>
> The manual already does that: there's a cross-reference in the
> description of [:word:] which leads to the node "Syntax Class Table",
> which explains syntax tables in detail.
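
To illustrate the point about [:word:] depending on the syntax table,
here is a small sketch. The helper names my-syntax-class-of and
my-word-char-p are hypothetical. char-syntax returns the syntax class
designator of a character in the current syntax table, and the command
describe-syntax (C-h s) displays the whole table of the current buffer.

```elisp
;; Sketch only: the helper names are made up for this example.

(defun my-syntax-class-of (char table)
  "Return the syntax class designator of CHAR in TABLE, as a string."
  (with-syntax-table table
    (string (char-syntax char))))

(defun my-word-char-p (char table)
  "Return non-nil if CHAR matches [:word:] when TABLE is in effect."
  (with-syntax-table table
    (string-match-p "[[:word:]]" (char-to-string char))))

;; In the standard syntax table, letters have word syntax ("w")
;; and the space character has whitespace syntax (" ").
(my-syntax-class-of ?a (standard-syntax-table))    ; => "w"
(my-syntax-class-of ?\s (standard-syntax-table))   ; => " "

;; A hyphen is not a word character in the standard table ...
(my-word-char-p ?- (standard-syntax-table))        ; => nil

;; ... but it becomes one in a table that gives it word syntax.
(let ((table (make-syntax-table)))
  (modify-syntax-entry ?- "w" table)
  (my-word-char-p ?- table))                       ; non-nil
```

So the same regexp can match different characters in different major
modes, which is why [:word:] is a poor fit for classifying characters
in human-readable text.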