bug-gnu-emacs

bug#27978: Detection of section name in man.el


From: Eli Zaretskii
Subject: bug#27978: Detection of section name in man.el
Date: Fri, 18 Aug 2017 22:23:10 +0300

[Please keep the bug address on the CC list.]

> From: Grégory Mounié <Gregory.Mounie@imag.fr>
> Date: Fri, 18 Aug 2017 19:53:44 +0200
> 
>   In brief, I would not change the other a-zA-Z regexps (details below).
> 
>   But I would change the SEE ALSO regexp (around line 298) to add other 
> languages. Should I file another bug report with another patch?
> 
> (defvar Man-see-also-regexp "SEE ALSO"
>    "Regular expression for SEE ALSO heading (or your equivalent).
> This regexp should not start with a `^' character.")
> 
>   Using the Debian manpage translations as a reference, and running
>   "zgrep -h SH man*/*  | sort | uniq -c | sort -n" inside the appropriate 
> /usr/share/man subdirectories to infer the values, I propose:
> 
>   "SEE ALSO\|VOIR AUSSI\|SIEHE AUCH\|VÉASE TAMBIÉN\|VEJA TAMBÉM\|VEDERE ANCHE\|ZOBACZ TAKŻE\|İLGİLİ BELGELER\|参照\|参见 SEE ALSO\|參見 SEE ALSO"
> 
>   (French, German, Spanish, Portuguese, Italian, Polish, Turkish, 
> Japanese, Chinese CN, Chinese TW)

OK.  If no one objects, I will make this change soon.  Thanks.
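
To illustrate how the proposed alternation behaves, here is a minimal sketch.  The names `my-man-see-also-regexp' and `my-man-see-also-heading-p' are hypothetical and exist only for this example; this is not the code in man.el.  In a Lisp string each `\|' has to be written `\\|', and the alternation needs a shy group before it can be anchored.

```elisp
;; Sketch only: hypothetical names, not the definitions in man.el.
(defvar my-man-see-also-regexp
  (concat "SEE ALSO\\|VOIR AUSSI\\|SIEHE AUCH\\|VÉASE TAMBIÉN\\|VEJA TAMBÉM"
          "\\|VEDERE ANCHE\\|ZOBACZ TAKŻE\\|İLGİLİ BELGELER"
          "\\|参照\\|参见 SEE ALSO\\|參見 SEE ALSO")
  "Alternation of SEE ALSO headings in several languages (illustration).")

(defun my-man-see-also-heading-p (line)
  "Return non-nil if LINE is exactly one of the SEE ALSO headings."
  (let ((case-fold-search nil))
    ;; Wrap the alternation in a shy group so the anchors apply to
    ;; every alternative, not only the first and the last one.
    (string-match-p (concat "\\`\\(?:" my-man-see-also-regexp "\\)\\'")
                    line)))
```

With these definitions, (my-man-see-also-heading-p "VOIR AUSSI") is non-nil, while (my-man-see-also-heading-p "SYNOPSIS") is nil.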

> Details below about the a-zA-Z regexps:
> 
> On 18/08/2017 at 10:49, Eli Zaretskii wrote:
> > 
> > Thanks, I pushed these changes with some minor adjustments.
> > Specifically:
> > 
> >> -(defvar Man-section-regexp "[0-9][a-zA-Z0-9+]*\\|[LNln]"
> >> +(defvar Man-section-regexp "[[:digit:]][[:alnum:]+]*\\|[LNln]"
> >>     "Regular expression describing a manpage section within parentheses.")
> > 
> > I didn't change this one, because I think a section always uses only
> > ASCII letters and numbers, as in ".1n".  If you disagree, can you show
> > an example where this is not so?
> > 
> 
>   I have installed the various multilingual standard manpages on my Debian 
> system and grep found no counterexample, so I guess it is fine as it is.
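
For reference, the practical difference between the two section regexps can be sketched as follows.  The names `my-section-ascii', `my-section-alnum' and `my-whole-match-p' are hypothetical, chosen for this example only.  In Emacs regexps `[:digit:]' matches only `0' through `9', so the two variants differ just in the letters accepted after the leading digit.

```elisp
;; Sketch only: hypothetical names, not the definitions in man.el.
(defconst my-section-ascii "[0-9][a-zA-Z0-9+]*\\|[LNln]"
  "Section regexp restricted to ASCII letters and digits.")

(defconst my-section-alnum "[[:digit:]][[:alnum:]+]*\\|[LNln]"
  "Section regexp that also accepts non-ASCII letters.")

(defun my-whole-match-p (regexp string)
  "Return t if REGEXP matches all of STRING, nil otherwise."
  (let ((case-fold-search nil))
    ;; The shy group keeps the anchors attached to both alternatives.
    (and (string-match-p (concat "\\`\\(?:" regexp "\\)\\'") string)
         t)))
```

Both variants accept "1n" and "3pm"; only the `[:alnum:]' variant accepts a made-up section such as "1é", which is the kind of counterexample asked for above.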
> 
> >> -(defvar Man-heading-regexp "^\\([A-Z][A-Z0-9 /-]+\\)$"
> >> +(defvar Man-heading-regexp "^\\([[:upper:]][[:upper:][:digit:] /-]+\\)$"
> >>     "Regular expression describing a manpage heading entry.")
> > 
> > I see no reason to replace 0-9 with [:digit:] here, since I think
> > non-ASCII digits will never be used in this context.  Do you agree?
> > 
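
A small sketch of what the heading change buys for translated pages.  The names `my-heading-ascii', `my-heading-upper' and `my-heading-p' are hypothetical, used for illustration only; the two regexps are the ones from the diff above, with 0-9 kept in the second.

```elisp
;; Sketch only: hypothetical names, not the definitions in man.el.
(defconst my-heading-ascii "^\\([A-Z][A-Z0-9 /-]+\\)$"
  "Heading regexp that accepts ASCII upper-case letters only.")

(defconst my-heading-upper "^\\([[:upper:]][[:upper:]0-9 /-]+\\)$"
  "Heading regexp that accepts any upper-case letter.")

(defun my-heading-p (regexp line)
  "Return t if the single line LINE matches REGEXP, nil otherwise."
  ;; Bind `case-fold-search' to nil; otherwise both [A-Z] and
  ;; [:upper:] would also match lower-case letters.
  (let ((case-fold-search nil))
    (and (string-match-p regexp line) t)))
```

Both regexps accept "NAME", but only the `[:upper:]' one accepts "VÉASE TAMBIÉN" or "ZOBACZ TAKŻE".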
> > Incidentally, I see quite a few similar regexps elsewhere in man.el.
> > Did you audit all of them and establish that they don't need similar
> > changes?  If not, would you like to propose similar changes there?
> > 
> 
>   There are 18 occurrences of a-zA-Z. They look like detection logic 
> carefully crafted over time, so I would not change them without a 
> counterexample either.
> 
>   The first four occurrences seem related to the parsing of external 
> commands, with particularities in the Windows port, so I would not 
> recommend changing them.
>   Occurrences 5 to 18 try to guess the manpage name around POS. The main 
> pattern is "-a-zA-Z0-9._+:"
> 
>   With the same set of multilingual manpages, I have found only one 
> character that is used in a manpage name but is not in the set: "[" 
> ("man [" leads you to test). I suspect that adding "[" would cause more 
> regressions than it would solve.
> 
>   Note that at line 720 the pattern is slightly different (missing "-._:"). 
> I do not really understand why.
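
As an illustration of the guessing described above, here is a minimal sketch that assumes the character set is used to delimit the name by scanning backward and forward from POS.  The function `my-man-name-around' is hypothetical and is not the code in man.el.

```elisp
;; Sketch only: hypothetical helper, not the code in man.el.
(defun my-man-name-around (string pos)
  "Return the run of manpage-name characters around POS in STRING.
POS is a 1-based position, as in a buffer."
  (with-temp-buffer
    (insert string)
    (goto-char pos)
    ;; The main character set quoted above.
    (let ((chars "-a-zA-Z0-9._+:"))
      (skip-chars-backward chars)
      (let ((start (point)))
        (skip-chars-forward chars)
        (buffer-substring-no-properties start (point))))))
```

For example, (my-man-name-around "see git-commit(1) now" 8) picks out "git-commit", while (my-man-name-around "man [ now" 5) yields an empty string because "[" is not in the set.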




