--- Begin Message ---
Subject: Detection of section name in man.el
Date: Sun, 6 Aug 2017 01:44:19 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.1
When man.el parses manual pages written in languages with non-ASCII
letters, section names that contain non-ASCII letters are not added
to the table of contents.
I noticed the bug while reading the French bash manual: the quite useful
"COMMANDES INTERNES DE l'INTERPRÉTEUR" section (SHELL BUILTIN COMMAND)
does not appear, because of the letter É.
I propose using character classes instead of ASCII intervals in the
appropriate regexp defvars. This should not change anything for English
manuals, and it should work for many other languages.
It works well for the bash manual in French.
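A minimal sketch of the difference, for illustration only. The two regexps are the old and proposed values of Man-heading-regexp from the attached patch; the `demo-` names and the sample heading "DÉFINITIONS" are assumptions made up for this example, and case folding is switched off explicitly so that the result does not depend on the caller's settings.

```elisp
;; Old and proposed values of `Man-heading-regexp', copied from the patch.
;; The `demo-' names are hypothetical and not part of man.el.
(defconst demo-heading-ascii "^\\([A-Z][A-Z0-9 /-]+\\)$")
(defconst demo-heading-class "^\\([[:upper:]][[:upper:][:digit:] /-]+\\)$")

(defun demo-heading-p (regexp line)
  "Return t if LINE matches REGEXP, comparing case-sensitively."
  (let ((case-fold-search nil))
    (and (string-match-p regexp line) t)))

(demo-heading-p demo-heading-ascii "SYNOPSIS")     ; => t
(demo-heading-p demo-heading-class "SYNOPSIS")     ; => t
(demo-heading-p demo-heading-ascii "DÉFINITIONS")  ; => nil, É is outside A-Z
(demo-heading-p demo-heading-class "DÉFINITIONS")  ; => t
```

A purely ASCII heading behaves the same under both regexps, which is why English manuals should be unaffected.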
Grégory Mounié
Attachment: 0001-Unicode-support-for-man-section-name-detection.patch (text data)
--- End Message ---
--- Begin Message ---
Subject: Re: bug#27978: Detection of section name in man.el
Date: Fri, 18 Aug 2017 11:49:57 +0300
> From: Grégory Mounié <address@hidden>
> Date: Sun, 6 Aug 2017 01:44:19 +0200
>
> When man.el parses manual pages written in languages with non-ASCII
> letters, section names that contain non-ASCII letters are not added
> to the table of contents.
>
> I noticed the bug while reading the French bash manual: the quite useful
> "COMMANDES INTERNES DE l'INTERPRÉTEUR" section (SHELL BUILTIN COMMAND)
> does not appear, because of the letter É.
>
> I propose using character classes instead of ASCII intervals in the
> appropriate regexp defvars. This should not change anything for English
> manuals, and it should work for many other languages.
Thanks, I pushed these changes with some minor adjustments.
Specifically:
> -(defvar Man-section-regexp "[0-9][a-zA-Z0-9+]*\\|[LNln]"
> +(defvar Man-section-regexp "[[:digit:]][[:alnum:]+]*\\|[LNln]"
> "Regular expression describing a manpage section within parentheses.")
I didn't change this one, because I think a section always uses only
ASCII letters and numbers, as in ".1n". If you disagree, can you show
an example where this is not so?
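To make the question concrete, here is a rough sketch of what each variant of Man-section-regexp accepts as a complete section name. The regexps are taken from the diff above; the `demo-` names are hypothetical, and "1é" is a contrived input used only to show the difference, not a section name known to exist anywhere.

```elisp
;; Current and proposed values of `Man-section-regexp', from the diff above.
;; The `demo-' names are hypothetical and not part of man.el.
(defconst demo-section-ascii "[0-9][a-zA-Z0-9+]*\\|[LNln]")
(defconst demo-section-class "[[:digit:]][[:alnum:]+]*\\|[LNln]")

(defun demo-section-p (regexp name)
  "Return t if the whole string NAME matches REGEXP, case-sensitively."
  (let ((case-fold-search nil))
    ;; Anchor at both ends so that partial matches do not count.
    (and (string-match-p (concat "\\`\\(?:" regexp "\\)\\'") name) t)))

(demo-section-p demo-section-ascii "1n")   ; => t
(demo-section-p demo-section-ascii "3pm")  ; => t
(demo-section-p demo-section-class "3pm")  ; => t
(demo-section-p demo-section-ascii "1é")   ; => nil
(demo-section-p demo-section-class "1é")   ; => t, [:alnum:] accepts é
```

For ASCII-only section names such as "1n" the two regexps agree, so the change would only matter if some manual used a non-ASCII section suffix.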
> -(defvar Man-heading-regexp "^\\([A-Z][A-Z0-9 /-]+\\)$"
> +(defvar Man-heading-regexp "^\\([[:upper:]][[:upper:][:digit:] /-]+\\)$"
> "Regular expression describing a manpage heading entry.")
I see no reason to replace 0-9 with [:digit:] here, since I think
non-ASCII digits will never be used in this context. Do you agree?
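As a side note, and as far as I can tell from the Emacs Lisp manual, [:digit:] in Emacs regexps is documented as matching only 0 through 9, so in this position the two spellings should be interchangeable. A small sketch to check that assumption; the `demo-digit-p` helper and the Arabic-Indic sample digit are made up for this example.

```elisp
;; Hypothetical helper, not part of man.el.
(defun demo-digit-p (regexp string)
  "Return t if the whole STRING is a single character matching REGEXP."
  (and (string-match-p (concat "\\`" regexp "\\'") string) t))

(demo-digit-p "[0-9]" "7")        ; => t
(demo-digit-p "[[:digit:]]" "7")  ; => t
(demo-digit-p "[0-9]" "٣")        ; => nil
(demo-digit-p "[[:digit:]]" "٣")  ; => nil, assuming [:digit:] is ASCII-only
```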
Incidentally, I see quite a few similar regexps elsewhere in man.el.
Did you audit all of them and establish that they don't need similar
changes? If not, would you like to propose similar changes there?
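One possible starting point for such an audit, sketched under the assumption that the regexps of interest live in variables whose names start with "Man-". The function name is hypothetical, and the check is deliberately crude: it only looks for literal "A-Z" or "a-z" ranges in string-valued variables, so regexps written inline inside function bodies would still need to be found by reading the source.

```elisp
(require 'man)

;; Hypothetical helper, not part of man.el.
(defun demo-man-ascii-range-variables ()
  "Return a sorted list of `Man-' variables whose value has an ASCII range.
Only bound variables with string values are considered."
  (let ((case-fold-search nil)
        hits)
    (mapatoms
     (lambda (sym)
       (when (and (boundp sym)
                  (string-prefix-p "Man-" (symbol-name sym))
                  (stringp (symbol-value sym))
                  (string-match-p "A-Z\\|a-z" (symbol-value sym)))
         (push sym hits))))
    (sort hits #'string-lessp)))

;; Evaluate to list the candidates in the running Emacs.
(demo-man-ascii-range-variables)
```

The result depends on the Emacs version in use, so it is a list of candidates to review by hand rather than a verdict.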
--- End Message ---