bug#27978: Detection of section name in man.el
From: Eli Zaretskii
Subject: bug#27978: Detection of section name in man.el
Date: Fri, 18 Aug 2017 22:23:10 +0300
[Please keep the bug address on the CC list.]
> From: Grégory Mounié <Gregory.Mounie@imag.fr>
> Date: Fri, 18 Aug 2017 19:53:44 +0200
>
> In brief, I would not change the other a-zA-Z regexps (details below).
>
> But I would change the SEE ALSO regexp (around line 298) to add other
> languages. Should I file another bug report with another patch?
>
> (defvar Man-see-also-regexp "SEE ALSO"
> "Regular expression for SEE ALSO heading (or your equivalent).
> This regexp should not start with a `^' character.")
>
> Using the Debian manpages translations as reference, and running
> "zgrep -h SH man*/* | sort | uniq -c | sort -n" inside the appropriate
> /usr/share/man subdirectories to infer the values, I propose:
>
> "SEE ALSO\|VOIR AUSSI\|SIEHE AUCH\|VÉASE TAMBIÉN\|VEJA TAMBÉM\|VEDERE
> ANCHE\|ZOBACZ TAKŻE\|İLGİLİ BELGELER\|参照\|参见 SEE ALSO\|參見 SEE ALSO"
>
> (French, German, Spanish, Portuguese, Italian, Polish, Turkish,
> Japanese, Chinese CN, Chinese TW)
OK. If no one objects, I will make this change soon. Thanks.
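
One detail worth checking before such a change goes in: the docstring says the regexp should not start with `^', which suggests the caller prepends the anchor itself. Assuming that is so, a bare alternation would anchor only its first alternative, so the list probably wants a group around it. The following is a minimal sketch of that idea, not the change as committed; the `my-man-' names are hypothetical and chosen so that nothing in man.el is overwritten.

```elisp
;; Hypothetical names; this does not touch man.el's own variables.
(defvar my-man-see-also-headings
  '("SEE ALSO" "VOIR AUSSI" "SIEHE AUCH" "VÉASE TAMBIÉN" "VEJA TAMBÉM"
    "VEDERE ANCHE" "ZOBACZ TAKŻE" "İLGİLİ BELGELER"
    "参照" "参见 SEE ALSO" "參見 SEE ALSO")
  "Translations of the SEE ALSO heading proposed in this thread.")

(defvar my-man-see-also-regexp
  ;; Wrap the alternatives in a shy group so that a "^" prepended by
  ;; the caller applies to every alternative, not only the first one.
  (concat "\\(?:"
          (mapconcat #'regexp-quote my-man-see-also-headings "\\|")
          "\\)")
  "Grouped regexp matching any heading in `my-man-see-also-headings'.")

(defun my-man-see-also-heading-p (line)
  "Return t if LINE starts with one of the SEE ALSO headings."
  (let ((case-fold-search nil))
    ;; Assumption: the caller anchors the regexp with "^" itself.
    (and (string-match-p (concat "^" my-man-see-also-regexp) line) t)))
```

With the group in place, an indented "  VOIR AUSSI" is rejected, whereas (concat "^" "SEE ALSO\\|VOIR AUSSI") would still match it, because `\|' binds more loosely than the anchor.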
> Details below about the a-zA-Z regexps:
>
> Le 18/08/2017 à 10:49, Eli Zaretskii a écrit :
> >
> > Thanks, I pushed these changes with some minor adjustments.
> > Specifically:
> >
> >> -(defvar Man-section-regexp "[0-9][a-zA-Z0-9+]*\\|[LNln]"
> >> +(defvar Man-section-regexp "[[:digit:]][[:alnum:]+]*\\|[LNln]"
> >> "Regular expression describing a manpage section within parentheses.")
> >
> > I didn't change this one, because I think a section always uses only
> > ASCII letters and numbers, as in ".1n". If you disagree, can you show
> > an example where this is not so?
> >
>
> I have installed the various multilingual standard manpages on my Debian
> system and grep found no counterexample, so I guess it is fine as is.
>
> >> -(defvar Man-heading-regexp "^\\([A-Z][A-Z0-9 /-]+\\)$"
> >> +(defvar Man-heading-regexp "^\\([[:upper:]][[:upper:][:digit:] /-]+\\)$"
> >> "Regular expression describing a manpage heading entry.")
> >
> > I see no reason to replace 0-9 with [:digit:] here, since I think
> > non-ASCII digits will never be used in this context. Do you agree?
> >
> > Incidentally, I see quite a few similar regexps elsewhere in man.el,
> > did you audit all of them and establish that they don't need similar
> > changes? If not, would you like to propose similar changes there?
> >
>
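
For readers following the Man-heading-regexp change quoted above, here is a small sketch of what the two variants accept. The regexps are copied from the quoted diff; the `my-man-' names and the helper are hypothetical, and `case-fold-search' is bound to nil on the assumption that headings are matched case-sensitively.

```elisp
;; Both regexps are taken from the diff quoted above; the names are
;; hypothetical and only serve this illustration.
(defconst my-man-heading-ascii
  "^\\([A-Z][A-Z0-9 /-]+\\)$"
  "The ASCII-only heading regexp (the `-' line of the diff).")

(defconst my-man-heading-classes
  "^\\([[:upper:]][[:upper:][:digit:] /-]+\\)$"
  "The character-class heading regexp (the `+' line of the diff).")

(defun my-man-heading-p (regexp line)
  "Return t if LINE as a whole matches the heading REGEXP."
  ;; Assumption: matching is case-sensitive, so that [:upper:] means
  ;; upper-case letters only.
  (let ((case-fold-search nil))
    (and (string-match-p regexp line) t)))

;; An ASCII heading matches both variants.
(my-man-heading-p my-man-heading-ascii "SEE ALSO")
(my-man-heading-p my-man-heading-classes "SEE ALSO")

;; A heading with accented capitals matches only the class-based one.
(my-man-heading-p my-man-heading-ascii "VÉASE TAMBIÉN")
(my-man-heading-p my-man-heading-classes "VÉASE TAMBIÉN")
```

The ASCII variant fails on "VÉASE TAMBIÉN" because É lies outside A-Z, which is the kind of heading the class-based variant is meant to catch.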
> There are 18 occurrences of a-zA-Z. They look like detection rules
> carefully crafted over time, so I would not change them without a
> counterexample either.
>
> The first four a-zA-Z occurrences seem related to the parsing of external
> commands, with particularities in the Windows port, so I would not
> recommend changing them.
> Occurrences 5 to 18 try to guess the manpage name around POS. The main
> pattern is "-a-zA-Z0-9._+:"
>
> With the same set of multilingual manpages, I have found only one
> character that is used in a manpage name and is not in the set: "["
> (man [ leads you to test). I suspect that adding "[" would cause more
> regressions than it would fix.
>
> Note that at line 720 the pattern is slightly different (missing "-._:").
> I do not really understand why.
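
To make the character-set discussion concrete, here is a minimal sketch that tests whether a candidate manpage name consists only of characters from the set quoted above. Only the character set itself comes from this thread; the `my-man-' names are hypothetical, and how man.el actually applies the set around point is not reproduced here.

```elisp
;; The set is the one quoted above; the leading "-" keeps the hyphen
;; literal once the string is placed inside a bracket expression.
(defconst my-man-name-chars "-a-zA-Z0-9._+:"
  "Characters assumed to make up a manpage name.")

(defun my-man-name-in-set-p (name)
  "Return t if NAME consists only of characters in `my-man-name-chars'."
  (let ((case-fold-search nil))
    (and (string-match-p (concat "\\`[" my-man-name-chars "]+\\'") name)
         t)))

;; Typical names are covered by the set.
(my-man-name-in-set-p "printf")
(my-man-name-in-set-p "std::string")
(my-man-name-in-set-p "g++")

;; The single exception reported above is not.
(my-man-name-in-set-p "[")
```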