
[Bug-wget] [bug #47689] Support for UTF-16 encoding.


From: Tim Ruehsen
Subject: [Bug-wget] [bug #47689] Support for UTF-16 encoding.
Date: Thu, 14 Apr 2016 07:43:41 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0

Update of bug #47689 (project wget):

                  Status:                    None => Confirmed              

    _______________________________________________________

Follow-up Comment #2:

Downloading works, but this issue is about content parsing (recursive
downloads).

The server does not declare a character encoding (no charset in the
Content-Type header), but the document (index.html) starts with a BOM
(Byte Order Mark) that identifies it as UTF-16LE.

What has to be done is to convert the input stream to UTF-8, since that is
the encoding wget is able to work with.

Currently we assume that input data is usable with traditional C string
functions. UTF-16 is not: every ASCII character is encoded with a zero byte,
so functions such as strlen() or strchr() stop at the first one.
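
To illustrate the conversion step, here is a minimal sketch in C. It is not
wget code: the function names has_utf16le_bom(), put_utf8() and
utf16le_to_utf8() are hypothetical, and a real implementation may well prefer
a library such as iconv. The sketch checks for the UTF-16LE BOM (bytes FF FE),
decodes the 16-bit units including surrogate pairs, and writes a
NUL-terminated UTF-8 string. Unpaired surrogates are replaced with U+FFFD,
which is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if the buffer starts with the UTF-16LE byte order mark (FF FE). */
int has_utf16le_bom(const unsigned char *buf, size_t len)
{
    return len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE;
}

/* Writes one code point as UTF-8 into out; returns the number of bytes written. */
size_t put_utf8(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Converts a UTF-16LE buffer to a malloc'ed, NUL-terminated UTF-8 string.
 * A leading BOM is skipped and a trailing odd byte is ignored.
 * Returns NULL if allocation fails; the caller frees the result. */
char *utf16le_to_utf8(const unsigned char *in, size_t len)
{
    size_t i, o = 0;
    unsigned char *out;

    assert(in != NULL || len == 0);
    i = has_utf16le_bom(in, len) ? 2 : 0;

    /* One 16-bit unit needs at most 3 UTF-8 bytes; a surrogate pair
     * (two units) needs 4, so 3 bytes per unit is always enough. */
    out = malloc((len / 2) * 3 + 1);
    if (out == NULL)
        return NULL;

    while (i + 1 < len) {
        uint32_t cp = in[i] | ((uint32_t)in[i + 1] << 8);
        i += 2;

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: it must be followed by a low surrogate. */
            uint32_t lo = 0;
            if (i + 1 < len)
                lo = in[i] | ((uint32_t)in[i + 1] << 8);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                i += 2;
            } else {
                cp = 0xFFFD; /* unpaired high surrogate */
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD; /* unpaired low surrogate */
        }
        o += put_utf8(cp, out + o);
    }
    out[o] = '\0';
    return (char *)out;
}
```

For the input bytes FF FE 3C 00 61 00 3E 00 (a BOM followed by "<a>"),
strlen() on the raw buffer stops at the fourth byte, while
utf16le_to_utf8() returns the UTF-8 string "<a>", which the usual C string
functions can handle.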

See https://html.spec.whatwg.org/multipage/syntax.html#the-input-byte-stream

@Eli: after downloading, try

$ wget -d -r --local-encoding=UTF-16LE --input-file index.html --force-html \
       --base http://www.free-energy-info.co.uk


    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?47689>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/



