
Re: [Lynx-dev] Zero-width space


From: Thomas Dickey
Subject: Re: [Lynx-dev] Zero-width space
Date: Sun, 25 Aug 2024 20:03:32 -0400

On Sun, Aug 25, 2024 at 10:10:46AM -0600, rbell--- via Lynx-dev wrote:
> 
>       Quoth Mr Dickey: 'I don't see a literal &#8203; in lynx with
> ISO-8859-1 or 7bit approximations.  Your locale settings (and the
> effective options) might be relevant.'
>       Thanks for that answer.  I use ISO-8859-15 but the other 2
> make no difference.
> 
>       'I see that "&#" is generated in SGML.c, but that would be
> used in the source view, rather than the browsing view.'
>       I have my own test page of hundreds of metacharacters to see
> what works.  I mentioned that article because I found the coincidence
> amusing: who uses zwsp other than in a language that doesn't put
> spaces between words?  It's a quirk.  A commentator suggested that it
> may be a watermark.  I sometimes insert them to interfere with
> searches: a search looks at the literal, not the displayed, content,
> so bana​na isn't found by a search for banana.  Ligatures also
> thwart searches.
> 
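
The search point can be illustrated with a minimal sketch. It assumes
UTF-8 text, where U+200B is the byte sequence E2 80 8B (the poster's
ISO-8859-15 system would differ), and the helper names strip_zwsp and
found_literal are hypothetical, not taken from lynx:

```c
#include <string.h>

/*
 * Hypothetical helper: copy src to dst, dropping every UTF-8 encoded
 * U+200B (bytes E2 80 8B).  dst must hold at least strlen(src) + 1 bytes.
 */
static void strip_zwsp(char *dst, const char *src)
{
    while (*src != '\0') {
        if (strncmp(src, "\xe2\x80\x8b", 3) == 0)
            src += 3;           /* skip the three-byte zero-width space */
        else
            *dst++ = *src++;    /* keep every other byte as-is */
    }
    *dst = '\0';
}

/* Hypothetical helper: a byte-wise search, as a literal search would do. */
static int found_literal(const char *text, const char *needle)
{
    return strstr(text, needle) != NULL;
}
```

A literal search for "banana" misses "bana\xe2\x80\x8bna" because the
three extra bytes break the match; searching the stripped copy finds it.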
>       '+ running lynx with locale C
>         '+ turning locale-based charset off
>         '+ setting document charset to UTF-8
>         '+ setting display charset to ISO-8859-1'
> 
>       None of these made a difference, except that UTF-8 makes all
> the metacharacters unreadable on my not-UTF system.  I append
> vt.default_utf8=0 to the kernel at boot.
>       8204-8207 are undisplayed.  I see where they're special-cased
> in LYCharUtils.c, thought that must be the place, added 8203, to no

The comment says this:

 *  This function used for translations HTML special fields inside tags
 *  (ALT=, VALUE=, etc.) from charset `cs_from' to charset `cs_to'.
 *  It also unescapes non-ASCII characters from URL (#fragments !)
 *  if st_URL is active.

Whether or not something is shown on the screen is a combination of
the UCdomap stuff and the curses library.

> avail.  Changing:
> 
>       else if (ucs == 0xfeff || (ucs >= 0x200b && ucs <= 0x200f))
> to
>       else if (ucs == 0xfeff || (ucs > 0x200b && ucs <= 0x200f)) 
> 
> in UCdomap.c does, which seems odd: it makes 8203 no longer marked as
> a zero-width character.

...and after that it looks in a table to see what it might be.
Not having the patches you're using, I can only guess.
 
>       The problem happens with the package lynx 2.9.2 from lynx, so
> it isn't one of my patches gone rogue.
> 
> russell bell
> 
> 

-- 
Thomas E. Dickey <dickey@invisible-island.net>
https://invisible-island.net
