Re: [bug #42233] [PATCH] wcwidth(3) used on UCS4/UTF-32 codepoints

From: Steffen Nurpmeso
Subject: Re: [bug #42233] [PATCH] wcwidth(3) used on UCS4/UTF-32 codepoints
Date: Wed, 04 Nov 2020 23:47:58 +0100
User-agent: s-nail v14.9.19-158-g7c269c7f-dirty

G. Branden Robinson wrote in
 |At 2020-10-21T15:18:29+0200, Steffen Nurpmeso wrote:
 |>> Steffen has withdrawn most/all of his other patches and even after
 |>> reading
 |> I do not know what this has to do with this bug.
 |As I recall (dimly), you said something at some point that I perhaps
 |misinterpreted--that you were going to work on a project called s-roff
 |and were not going to participate in any further "pull requests", if you
 |will, involving groff.
 |If I misunderstood you, I apologize.

Hm, no, it was just frustrating half a decade ago, and yes,
i still have s-roff in the queue and i really, really want to do that
roff package (it will not happen before 2022, that is for sure,
i lost almost the entire year, and will not leave the mailer
i maintain before i have reached a state where i can).
I tried to switch to it several times, but one thing or the other
interfered, and then i said i w... ah, whatever.
Anyhow, i synchronized it several times, but stopped with the
current groff release, because the effort was too large.

I keep a list of things i want to cherry pick (or re-create around
the topic due to license issues), that is all.  The rest of the
way i will have to go alone.

 |>> this report a few times I'm not clear on what exactly the problem
 |>> is supposed to be.
 |>> The "solution", "drop gnulib", is not likely, especially not during
 |>> an RC cycle.
 |> Sorry, what??
 |I refer to this statement in the original bug report:
 |"The neat side effect of that is that the entire GNULib can be
 |unhooked and removed from groff(1)."
 |However, you're right that this side effect was not proposed as
 |_necessary_, only possible.

Yes, i think by then groff had nothing to do with GNUlib, and
i posted a binary search table of a few kilobytes which could have
been used as a correct implementation (other possibilities would
exist, i think Xorg and mksh, for example, and Plan9Port and such,
they all implement such binary lookup arrays to satisfy the very
problem in question).
Now that ship has sailed, and GNUlib functions could be used,
i presume.  (I only looked into GNUlib once, at that time, and
there were functions which could have been used ... if i recall
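
For illustration only, a minimal sketch of what such a binary lookup
array can look like.  This is not the table from the patch: the name
ucs4_width() is hypothetical, and the ranges are a tiny hand-picked
excerpt, whereas a real table would be generated from the Unicode data
files.  Anything not listed is treated as normal width.

```c
#include <stddef.h>
#include <stdint.h>

/* Closed codepoint interval [first, last] sharing one column width. */
struct width_range {
    uint32_t first;
    uint32_t last;
    int width;
};

/* ILLUSTRATIVE EXCERPT ONLY: sorted by 'first', non-overlapping.
   A usable table must be generated from the Unicode data files. */
static const struct width_range width_table[] = {
    { 0x0300, 0x036F, 0 },  /* Combining Diacritical Marks */
    { 0x1100, 0x115F, 2 },  /* Hangul Jamo leading consonants */
    { 0x200B, 0x200B, 0 },  /* ZERO WIDTH SPACE */
    { 0x3041, 0x3096, 2 },  /* Hiragana letters */
    { 0x4E00, 0x9FFF, 2 },  /* CJK Unified Ideographs */
    { 0xAC00, 0xD7A3, 2 },  /* Hangul Syllables */
    { 0xFF01, 0xFF60, 2 },  /* Fullwidth forms */
};

/* Hypothetical helper: column width of a UCS-4 codepoint, independent
   of the current locale.  Returns -1 for C0/C1 control characters,
   0 for NUL and zero-width ranges, 2 for wide ranges, 1 otherwise. */
int ucs4_width(uint32_t cp)
{
    size_t lo = 0;
    size_t hi = sizeof width_table / sizeof width_table[0];

    if (cp == 0)
        return 0;
    if (cp < 0x20 || (cp >= 0x7F && cp < 0xA0))
        return -1;

    /* Binary search over the half-open index range [lo, hi). */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (cp < width_table[mid].first)
            hi = mid;
        else if (cp > width_table[mid].last)
            lo = mid + 1;
        else
            return width_table[mid].width;
    }
    return 1;  /* not listed: assume normal width */
}
```

A caller would pass the codepoint straight through, for example
ucs4_width(0x4E2D), without consulting setlocale(3) at all.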

 |>> This could be reopened if we had a simple, reproducible case of
 |>> groff actually misbehaving.
 |>>> I think currently groff makes false use of wcwidth(3): if it finds
 |>>> the `unicode' property in a `DESC' file it uses wcwidth(3) to
 |>>> determine the visual width, not taking into account the current
 |>>> locale, but which wcwidth(3) depends upon.
 |>> I don't understand[.]
 |> I am too old for this shit, really.  I therefore agree.
 |I am struggling with the non-idiomatic expression "makes false use".  I
 |can interpret it, but only vaguely.  Also, I may lack domain knowledge
 |It's my understanding that Unicode defines a property called "East Asian
 |width"--at least that's what my local unicode(1) command calls it.

No, no, Unicode defines a character width (0, 1 and 2, at the time
i last looked into it, which has been a couple of years indeed).
The problem is anyhow that with preconv Unicode is fed into the
machine, the DESC supports a Unicode property, but then the code
does not care at all and uses the wcwidth(3) function, which works
in correspondence with the currently active locale, which is

I have never ("not yet") looked deep enough to know whether it
matters -- but: having the correct function in place there
(preconv==Unicode, DESC==Unicode, .. character handling==Unicode?)
seems to be the right approach.
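
A small sketch of the dependency in question; it is not groff code.
The helper name width_in_locale() is made up, and it assumes a platform
where wchar_t values are Unicode codepoints (__STDC_ISO_10646__).
Whether a given locale name exists, and what wcwidth(3) returns for
non-ASCII input under it, differs between systems, so no results are
claimed here.

```c
#define _XOPEN_SOURCE 700  /* for the wcwidth() declaration */

#include <locale.h>
#include <wchar.h>

/* Hypothetical helper: switch LC_CTYPE to 'locale_name' and ask
   wcwidth(3) for the width of 'wc' under that locale.  Returns -2 if
   the locale cannot be selected; otherwise whatever wcwidth(3) says,
   which may be -1 if the locale does not know the character. */
int width_in_locale(const char *locale_name, wchar_t wc)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -2;
    return wcwidth(wc);
}
```

Comparing, say, width_in_locale("C", 0x4E2D) with the same call under
a UTF-8 locale installed on the system shows whether the local libc
ties the answer to LC_CTYPE; some implementations do, others do not.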

 |>> [.] why the width of a Unicode character would be locale-dependent.
 |>> As I understand it, the width property (half-width, full-width,

And zero-width.

 |>> undefined) is determined on a per-codepoint basis and while it might
 |>> vary, there's no reason to expect it to vary based on the _locale_.
 |>> More likely, I think, it would vary due to choices taken by a font
 |>> vendor, and people using the font would be forced to adapt.
 |Thinking about this some more, the possibility of an "ambiguous" or
 |"undefined" character width at the UCS level could mean that the locale
 |is permitted to determine this parameter.

Unicode does not do that.  Again, long time, but if i recall
correctly anything which is not defined otherwise is normal width.

 |>> Closing.
 |> I think there was a ML thread by the time i opened the bug report,
 |> where the according GNUlib function that could simply be used to
 |> correctly implement the given was named.
 |Hmm. It would be good to find this.  I wonder if Dave Kemper can help;
 |to my eyes, he seems to have a fluttering cape that advertises a deep
 |knowledge of our mailing list history.
 |> Then that piece of cake would be correct, despite possibly non-capable
 |> surroundings.
 |If this would fix the infinite loop Osamu Ayama found, and that I
 |crudely hacked around in bcdf2f4c7c28328c711c6a7ac2ea17f2ecd5cdd4 (also

I do not know.  Zero width can happen.

 |see ), that would be terrific.

 |I think we just need someone with a little more gnulib and/or
 |wide-character savvy than I possess right now to articulate the issue so
 |that I can understand it.  Eventually, I'll learn, but Bertrand's trying
 |to get an RC1 out.  :)

 --End of <20201101042024.gfw4dqth3qlfxxw2@localhost.localdomain>

|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)
