[Bug-wget] [bug #47689] Support for UTF-16 encoding.
From: Tim Ruehsen
Subject: [Bug-wget] [bug #47689] Support for UTF-16 encoding.
Date: Thu, 14 Apr 2016 07:43:41 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0
Update of bug #47689 (project wget):
Status: None => Confirmed
_______________________________________________________
Follow-up Comment #2:
Downloading works; this issue is about parsing the downloaded content
(needed for recursive downloads).
The server does not state a character encoding (charset), but the document
(index.html) starts with a BOM (Byte Order Mark) that identifies it as
UTF-16LE encoded.
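
For illustration only (this is not wget code), a BOM check can be sketched as
below. The function name bom_encoding and its signature are made up for this
example. It recognizes the three BOMs that the HTML specification sniffs for:
UTF-8, UTF-16BE and UTF-16LE.

```c
#include <stddef.h>

/* Hypothetical helper, not part of wget: return the encoding implied by a
   leading byte order mark, or NULL if the buffer has no recognized BOM.
   *bom_len receives the number of BOM bytes to skip (0 if none). */
const char *bom_encoding(const unsigned char *buf, size_t len, size_t *bom_len)
{
    *bom_len = 0;

    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF) {
        *bom_len = 3;
        return "UTF-8";
    }
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE) {
        *bom_len = 2;
        return "UTF-16LE";
    }
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF) {
        *bom_len = 2;
        return "UTF-16BE";
    }
    return NULL;
}
```

A caller would pass the first bytes of the downloaded file and, if a name is
returned, skip bom_len bytes before decoding the rest.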
What has to be done is to convert the input stream to UTF-8, which is the
encoding wget is able to work with.
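
A minimal sketch of such a conversion follows, assuming the BOM has already
been skipped and the whole input is in memory. The function utf16le_to_utf8 is
hypothetical and hand-written to keep the example self-contained; a real
implementation would more likely hand the job to a library such as iconv.
Replacing malformed input with U+FFFD is a choice made for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of wget: convert UTF-16LE bytes (BOM already
   removed) to UTF-8. Returns the number of bytes written to out, or
   (size_t)-1 if out is too small. Unpaired surrogates and a trailing odd
   byte are replaced with U+FFFD. */
size_t utf16le_to_utf8(const unsigned char *in, size_t in_len,
                       unsigned char *out, size_t out_cap)
{
    size_t i = 0, o = 0;

    while (i < in_len) {
        uint32_t cp;

        if (in_len - i < 2) {                  /* dangling odd byte */
            cp = 0xFFFD;
            i = in_len;
        } else {
            cp = (uint32_t)in[i] | ((uint32_t)in[i + 1] << 8);
            i += 2;
            if (cp >= 0xD800 && cp <= 0xDBFF) {            /* high surrogate */
                uint32_t lo = 0;
                if (in_len - i >= 2)
                    lo = (uint32_t)in[i] | ((uint32_t)in[i + 1] << 8);
                if (lo >= 0xDC00 && lo <= 0xDFFF) {
                    cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                    i += 2;
                } else {
                    cp = 0xFFFD;               /* high surrogate without low */
                }
            } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
                cp = 0xFFFD;                   /* low surrogate on its own */
            }
        }

        /* Number of UTF-8 bytes needed for this code point. */
        size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (out_cap - o < need)
            return (size_t)-1;

        if (need == 1) {
            out[o++] = (unsigned char)cp;
        } else if (need == 2) {
            out[o++] = (unsigned char)(0xC0 | (cp >> 6));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (need == 3) {
            out[o++] = (unsigned char)(0xE0 | (cp >> 12));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else {
            out[o++] = (unsigned char)(0xF0 | (cp >> 18));
            out[o++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    return o;
}
```

The output contains no NUL bytes unless the document itself contains U+0000,
so the existing byte-oriented parsing can run on it.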
Currently we assume the input data is usable with traditional C string
functions. UTF-16 is not: every ASCII character is encoded with a zero byte,
which those functions treat as the end of the string.
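
To make the problem concrete, here is a small sketch of a byte-oriented search.
The function contains_anchor_tag is invented for this example and only stands
in for the kind of string handling an HTML parser does.

```c
#include <string.h>

/* Hypothetical stand-in for byte-oriented HTML scanning, not wget code:
   report whether the C string doc contains the start of an anchor tag. */
int contains_anchor_tag(const char *doc)
{
    return strstr(doc, "<a ") != NULL;
}
```

Given the UTF-8 text "<a href=x>" this returns 1. Given the same text as
UTF-16LE bytes ('<', 0, 'a', 0, ' ', 0, ...) it returns 0, because strstr()
stops at the zero byte that follows '<'.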
See https://html.spec.whatwg.org/multipage/syntax.html#the-input-byte-stream
@Eli After downloading, try
$ wget -d -r --local-encoding=UTF-16LE --input-file index.html --force-html \
    --base http://www.free-energy-info.co.uk
_______________________________________________________
Reply to this item at:
<http://savannah.gnu.org/bugs/?47689>
_______________________________________________
Message sent via/by Savannah
http://savannah.gnu.org/