bug-gnu-emacs
bug#23647: 25.1.50; In man pages, links on hyphenated words don't work


From: Stephen Berman
Subject: bug#23647: 25.1.50; In man pages, links on hyphenated words don't work
Date: Mon, 30 May 2016 15:55:47 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1.50 (gnu/linux)

On Mon, 30 May 2016 03:22:58 +0300 Eli Zaretskii <address@hidden> wrote:

>> From: Stephen Berman <address@hidden>
>> Cc: address@hidden
>> Date: Mon, 30 May 2016 01:09:21 +0200
>> 
>> > Is it only the ASCII hyphen/minus, or could there be other characters
>> > (e.g., if Groff/troff are invoked with some exotic -Tfoo switch)?
>> 
>> That possibility didn't occur to me but according to Wikipedia, groff
>> also outputs soft hyphens (octal 255) and indeed I see that the function
>> Man-build-references-alist, which also removes hyphenation (in a more
>> complicated way that doesn't seem to be needed in the present case),
>> also takes the soft hyphen into account.  That can be done here too by
>> changing the above string-match regexp to "[-­]".  If someone knows of
>> other possibilities allowed by [gt]roff, maybe the regexp could be
>> further extended, or the condition reformulated as required.  What do
>> you think?
>
> I'm not enough of a roff expert to tell, but how about asking on the
> Groff list?

I did that and got this feedback from Steffen Nurpmeso:

> I have been convinced that soft hyphen is a control character and
> not something visual, it should be used as a «break-indicator»
> rather than as a hyphenation character, interpretation of which is
> left as an exercise for the processing software.  I have no idea
> still but would guess groff uses "hyphen minus" U+002D or hyphen
> U+2010 if Unicode is possible.

In a followup to another response he added:

> For display purposes however i think U+00AD can't be used
> directly, but will be replaced by the renderer to either nothing,
> if no wrap is to be applied at the character position, or
> something appropriate, like ASCII hyphen-minus or some extended
> Unicode "Pd" letter, of which there are some (e.g., U+058A
> ARMENIAN HYPHEN, U+1400 CANADIAN SYLLABICS HYPHEN, and more).

And he also made this suggestion:

> Eli Zaretskii is so active on the
> Unicode list, why don't you use the Pd character class for
> detecting «hyphen»?  I guess this should cover all such things
> already as of today, thanks to Werner Lemberg?!

So how should we proceed from here?  We could add U+2010 to the regexp
in my patch, which would then be this: "[-‐­]" (hyphen-minus (ASCII 45),
hyphen (U+2010), soft hyphen (U+00AD) -- it seems harmless to retain the
latter, given that man.el already uses it elsewhere), but if these are
all included in the Unicode Pd character class along with other possible
hyphen characters, maybe a different approach is required.  I know
nothing about the Pd character class and how to detect it with Elisp; I
also don't know if doing that would lead to further changes in man.el,
making this a larger undertaking.  What do you suggest?
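
For reference, one possible way to test for the Pd class from Elisp is sketched below. It relies on `get-char-code-property', which returns a character's Unicode general category as a symbol. The two function names are hypothetical and belong neither to man.el nor to the patch under discussion. Note that SOFT HYPHEN (U+00AD) has general category Cf rather than Pd, so it would still need an explicit check, and that Pd also covers dashes such as EN DASH and EM DASH, which may or may not be wanted here.

```elisp
;; Hypothetical helpers for illustration only; not part of man.el.

(defun my-man-hyphen-char-p (char)
  "Return non-nil if CHAR is a soft hyphen or has Unicode category Pd."
  (or (eq char #xAD)   ; SOFT HYPHEN is category Cf, so test it separately
      (eq (get-char-code-property char 'general-category) 'Pd)))

(defun my-man-ends-in-hyphen-p (string)
  "Return non-nil if STRING is non-empty and ends in a hyphen-like character."
  (and (> (length string) 0)
       (my-man-hyphen-char-p (aref string (1- (length string))))))
```

With these definitions, (my-man-ends-in-hyphen-p "foo-") and (my-man-ends-in-hyphen-p "foo‐") should both return non-nil, while (my-man-ends-in-hyphen-p "foo") returns nil.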

Steve Berman




