Re: [Lynx-dev] Zero-width space
From: Thomas Dickey
Subject: Re: [Lynx-dev] Zero-width space
Date: Sun, 25 Aug 2024 20:03:32 -0400
On Sun, Aug 25, 2024 at 10:10:46AM -0600, rbell--- via Lynx-dev wrote:
>
> Quoth Mr Dickey: 'I don't see a literal &#8203; in lynx with
> ISO-8859-1 or 7bit approximations. Your locale settings (and the
> effective options) might be relevant.'
> Thanks for that answer. I use ISO-8859-15 but the other 2
> make no difference.
>
> 'I see that "&#" is generated in SGML.c, but that would be
> used in the source view, rather than the browsing view.'
> I have my own test page of hundreds of metacharacters to see
> what works. I mentioned that article because I found the coincidence
> amusing: who uses zwsp other than in a language that doesn't put
> spaces between words? It's a quirk. A commentator suggested that it
> may be a watermark. I sometimes insert them to interfere with
> searches: a search looks at the literal, not the displayed, content,
> so bana​na isn't found by a search for banana. Ligatures also
> thwart searches.
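
The search behaviour described in the quoted paragraph can be illustrated with a minimal sketch. It assumes the text is held as UTF-8, where U+200B is the byte sequence E2 80 8B; the helper names strip_zwsp and literal_search are hypothetical and are not part of lynx.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src to dst, skipping every UTF-8 encoded
 * U+200B (bytes E2 80 8B).  dst must hold at least strlen(src) + 1 bytes. */
static void strip_zwsp(const char *src, char *dst)
{
    while (*src != '\0') {
        if (strncmp(src, "\xe2\x80\x8b", 3) == 0) {
            src += 3;           /* drop the zero-width space */
        } else {
            *dst++ = *src++;    /* keep every other byte */
        }
    }
    *dst = '\0';
}

/* Hypothetical helper: byte-for-byte search, as a search over the
 * literal (not the displayed) content would behave.  Returns 1 on a match. */
static int literal_search(const char *haystack, const char *needle)
{
    return strstr(haystack, needle) != NULL;
}
```

With these, literal_search() fails to find "banana" in "bana", U+200B, "na", and succeeds once strip_zwsp() has removed the zero-width space.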
>
> '+ running lynx with locale C
> '+ turning locale-based charset off
> '+ setting document charset to UTF-8
> '+ setting display charset to ISO-8859-1'
>
> None of these made a difference, except that UTF-8 makes all
> the metacharacters unreadable on my not-UTF system. I append
> vt.default_utf8=0 to the kernel at boot.
> 8204-8207 are undisplayed. I see where they're special-cased
> in LYCharUtils.c, thought that must be the place, added 8203, to no
The comment says this:
* This function used for translations HTML special fields inside tags
* (ALT=, VALUE=, etc.) from charset `cs_from' to charset `cs_to'.
* It also unescapes non-ASCII characters from URL (#fragments !)
* if st_URL is active.
Whether or not something is shown on the screen is a combination of
the UCdomap stuff and the curses library.
> avail. Changing:
>
> else if (ucs == 0xfeff || (ucs >= 0x200b && ucs <= 0x200f))
> to
> else if (ucs == 0xfeff || (ucs > 0x200b && ucs <= 0x200f))
>
> in UCdomap.c does, which seems odd: it makes 8203 no longer marked as
> a zero-width character.
...and after that it looks in a table to see what it might be.
Not having the patches you're using, I can only guess.
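
For reference, the effect of the one-character change quoted above can be sketched in isolation. The two functions below are hypothetical stand-ins that only mirror the two quoted conditions; they are not the actual UCdomap.c code, and the table lookup that follows in lynx is not modelled here.

```c
#include <stdbool.h>

/* Mirrors the original condition: U+FEFF and U+200B..U+200F
 * (decimal 8203..8207) are treated as zero-width. */
static bool zero_width_original(unsigned long ucs)
{
    return ucs == 0xfeff || (ucs >= 0x200b && ucs <= 0x200f);
}

/* Mirrors the modified condition: the range now starts at U+200C,
 * so U+200B (decimal 8203) is no longer treated as zero-width. */
static bool zero_width_patched(unsigned long ucs)
{
    return ucs == 0xfeff || (ucs > 0x200b && ucs <= 0x200f);
}
```

The only code point on which the two differ is 0x200b (8203): the original condition accepts it and the modified one rejects it, while 0x200c through 0x200f and 0xfeff are accepted by both.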
> The problem happens with the package lynx 2.9.2 from lynx, so
> it isn't one of my patches gone rogue.
>
> russell bell
>
>
--
Thomas E. Dickey <dickey@invisible-island.net>
https://invisible-island.net