Re: [External] : Re: Regexp for matching control character, say, FORM FE
From: Hongyi Zhao
Subject: Re: [External] : Re: Regexp for matching control character, say, FORM FEED. (Was: Re: The `^L' appeared in built-in help.)
Date: Thu, 22 Jul 2021 17:45:36 +0800
On Thu, Jul 22, 2021 at 4:06 PM <tomas@tuxteam.de> wrote:
>
> On Thu, Jul 22, 2021 at 09:13:31AM +0800, Hongyi Zhao wrote:
>
> [...]
>
> > I want to know whether there are some similar regexp patterns in Emacs
> > as the ones used by grep, say, $'\014' or $'\f'.
>
> To offer some other perspective on the (correct) answers by Emanuel and
> Drew, remember that a regular expression is, basically, a string
> where each character is interpreted as "itself", unless it is a "regexp
> special" character [1]. So, for example searching for the regular expression
> "a" will find all "a"s in your text, because the character a isn't a
> "regexp special".
>
> Now ASCII control characters are all *not* "regexp special" so you only
> have to find a way to express them within a string. How to do that is stated
> in the Emacs Lisp manual where it talks about "string type" [2] (especially
> the subnode "Non-ASCII Characters in Strings", which leads you to "character
> type" [3]). The special forms "\f", "\^L" or "\C-L" (all of them equivalent),
> which were all discussed here, are treated in a subnode of the above [4].
> This notation carries some historical baggage, so don't expect too much
> logic from it.
>
> For example, why ^L? Because form feed is at position 12 (in decimal) in the
> ASCII table, and L at position 76, the difference being 64.
$ man ascii |egrep ' L$'
014 12 0C FF '\f' (form feed) 114 76 4C L
> What happens is that the "^" "subtracts 64 from the character code", or more
> precisely masks out bit 6 (value 64, counting from bit 0) of its binary
> representation.
$ man ascii |egrep ' \^$'
036 30 1E RS (record separator) 136 94 5E ^
If so, the RS should be represented by ^^ in a self-consistent way :-)
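
The arithmetic can be checked from the shell. What follows is only a minimal sketch: `ctrl_code` is a hypothetical helper name of my own, and it assumes a POSIX `printf`, where a leading quote in the argument yields the character's numeric code.

```shell
# Hypothetical helper: print the decimal code of the control character
# written as ^X in caret notation, by clearing bit 6 (value 64) of X.
ctrl_code() {
    # POSIX printf: an argument starting with a quote gives the code
    # of the character that follows it.
    code=$(printf '%d' "'$1")
    echo $(( code & ~64 ))
}

ctrl_code L     # prints 12 (FF, form feed)
ctrl_code M     # prints 13 (CR, carriage return)
ctrl_code '^'   # prints 30 (RS, record separator)
```

So by the same rule ^^ does come out as RS, at least as far as the masking goes.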
> So ^M would be "carriage return" and so on. Just have a look at the ASCII
> table.
$ man ascii |egrep ' M$'
015 13 0D CR '\r' (carriage ret) 115 77 4D M
> Then "\f" comes from the C string literal representation. It's meant to
> be mnemonic ("f" for "form feed" -- similarly "\n" for "line feed", aka
> "new line", "\b" for "backspace" and so on).
>
> The references below lead you to more alternative representations, like
> short hex "\x0C", short Unicode hex "\u000C", long Unicode hex "\U0000000C";
> there are also (mostly historical) octals, etc.
>
> You can even put the unicode /names/ in there, using the "\N{...}"
> notation, so your ^L can be named "\N{FORM FEED (FF)}" (yes the (FF)
> in parentheses is part of it: the Unicode Consortium put it in there.
> Life is like that).
>
> If you want to explore those unicode names, type in C-x 8 <RET>, you
> can autocomplete your way among them.
>
> Hope this gives some rough map for that landscape :-)
Thank you for your systematic and informative comments and explanations.
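
For the grep side of my original question, here is a small sketch showing that the mnemonic and the octal spellings name the same byte. It uses `printf` rather than the bash-only $'...' quoting, so it should also work in a plain POSIX shell; the sample input is made up for illustration.

```shell
# Build a form feed portably; printf understands both \f and octal \014.
ff=$(printf '\f')
ff_octal=$(printf '\014')
[ "$ff" = "$ff_octal" ] && echo "same byte"

# Count the lines containing a form feed in some made-up sample input.
# Only the second line contains one, so this prints 1.
printf 'page one\n\fpage two\n' | grep -c "$ff"
```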
> Cheers
>
> [1] Emacs Lisp reference manual "Syntax of Regular Expressions"
>     https://www.gnu.org/software/emacs/manual/html_node/elisp/Syntax-of-Regexps.html
>
> [2] Emacs Lisp reference manual "String Type" and its subnodes
>     https://www.gnu.org/software/emacs/manual/html_node/elisp/String-Type.html
>
> [3] Emacs Lisp reference manual "Character Type"
>     https://www.gnu.org/software/emacs/manual/html_node/elisp/Character-Type.html
>
> [4] Emacs Lisp reference manual "Control-Character Syntax"
>     https://www.gnu.org/software/emacs/manual/html_node/elisp/Ctl_002dChar-Syntax.html
>
> - tomás
Best,
HY