

From: Pavel Smerk
Subject: [Lynx-dev] bugreport: dumping utf8 html to utf8 text malforms \c5\a0 character before a new line
Date: Tue, 6 Oct 2009 23:52:57 +0200
User-agent: Mutt/

        Hello all,

having the following HTML code

<meta http-equiv=Content-Type content="text/html; charset=utf-8">

in the file in.html and running the following command

lynx -dump -display_charset=utf-8 -assume_charset=utf-8 -nomargins in.html > 

one gets back the following five bytes in the file out.txt

C5 A0 C5 0A 0A

where the second C5 is only the beginning of the correct two-byte UTF-8
character C5 A0. Maybe the A0 byte is deleted by some trimming of end-of-line
spaces (A0 is the no-break space in ISO-8859-1). That would be rather
surprising, however, because in this case both the input and the output are
UTF-8, where A0 by itself is not a valid character. And, of course, C5 by
itself is not a valid UTF-8 character either, which means that the output is
not even a valid UTF-8 file.
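
The two claims above can be checked with a short Python sketch. It confirms that the reported bytes are not valid UTF-8, and it shows how a byte-wise trim that treats A0 as a space would produce a dangling C5. The function names and the two-character input line are assumptions made for illustration only; this is not Lynx's actual code.

```python
# The five bytes reported in out.txt.
reported = bytes.fromhex("C5 A0 C5 0A 0A")


def first_invalid_offset(data: bytes):
    """Return the offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


def trim_bytewise(line: bytes) -> bytes:
    """Hypothetical trailing-space trim that works on single bytes and
    treats 0xA0 (no-break space in ISO-8859-1) as a space."""
    return line.rstrip(b" \xa0")


# Assumed input line: U+0160 twice, encoded as C5 A0 C5 A0.
two_chars = "\u0160\u0160".encode("utf-8")

print(first_invalid_offset(reported))   # offset of the lone C5
print(trim_bytewise(two_chars).hex())   # bytes left after the trim
```

If the trimming really works this way, it would have to operate on decoded characters rather than on raw bytes to keep the output valid.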

Nevertheless, thank you for the great piece of software. :-)

With regards,

Pavel Smerk
