
Re: [Bug-wget] Page encoding problem


From: Micah Cowan
Subject: Re: [Bug-wget] Page encoding problem
Date: Mon, 09 Jul 2012 22:41:29 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:13.0) Gecko/20120615 Thunderbird/13.0.1

On 07/09/2012 10:24 PM, Owen Watson wrote:
> Would --local-encoding=UTF-8 fix it?

Unlikely. IIRC, that option changes how wget translates non-ASCII URLs
(IRIs) given on the command line, and I think how it names saved files
with non-ASCII names, but I don't believe it modifies file contents.

Basically, your best bet is to run the equivalent of that Unix pipeline:
something that goes through all the archived files and corrects their
meta http-equiv declarations. If it were me, I'd probably install the
"find" and "sed" commands so I could run that exact pipeline (except that
the single quotes would have to be double quotes in a DOS pipeline, I
think). Maybe install MSYS or some other kit that provides such
commands. Cygwin's overkill, I'm sure...
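The pipeline itself isn't quoted in this message, so here is only a hypothetical stand-in for anyone without find/sed: a minimal Python sketch that assumes the archived pages really are UTF-8 and that only their meta charset declaration is wrong. The names fix_charset and fix_tree are made up for illustration; they are not part of wget.

```python
import re
import sys
from pathlib import Path

# Hypothetical helper, not part of wget. Group 1 captures the <meta> tag up to
# and including "charset=" (plus an optional opening quote); the declared
# encoding name follows. Covers both
#   <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
# and the shorter <meta charset="iso-8859-1">.
CHARSET_RE = re.compile(r"""(<meta[^>]*charset=["']?)[\w-]+""", re.IGNORECASE)


def fix_charset(html: str, new_charset: str = "utf-8") -> str:
    """Rewrite every meta charset declaration in `html` to `new_charset`."""
    return CHARSET_RE.sub(lambda m: m.group(1) + new_charset, html)


def fix_tree(root: str, new_charset: str = "utf-8") -> int:
    """Fix all .html/.htm files under `root` in place; return how many changed."""
    changed = 0
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.suffix.lower() not in (".html", ".htm"):
            continue
        # Latin-1 maps every byte to exactly one code point, so decoding and
        # re-encoding with it leaves the rest of the page byte-for-byte intact;
        # only the matched declaration text is replaced.
        text = path.read_bytes().decode("latin-1")
        fixed = fix_charset(text, new_charset)
        if fixed != text:
            path.write_bytes(fixed.encode("latin-1"))
            changed += 1
    return changed


if __name__ == "__main__":
    # Usage: python fix_charset.py <directory-of-archived-pages>
    print(fix_tree(sys.argv[1]), "file(s) updated")
```

Like the sed approach, it only relabels the pages; if a file's bytes aren't actually UTF-8, changing the declaration won't make it display correctly.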

Note that what it would NOT do is correct files that don't SPECIFY
their character encoding... Firefox always assumes latin1 if nobody
tells it otherwise, probably because that's what the HTML specification
says it should do, IIRC. Google Chrome, at least, appears to auto-detect
the content's encoding. Perhaps FF has a setting that supports this too
(or you could force FF to use UTF-8).
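Since rewriting declarations can't help pages that have none, it may be worth listing those first so you know which ones Firefox will fall back to its default for. A rough Python sketch, again with made-up helper names; it only looks for a <meta ... charset=...> tag in the saved file and ignores any Content-Type the server sent over HTTP.

```python
import re
import sys
from pathlib import Path

# Hypothetical helper: matches any <meta ... charset=...> declaration, either
# the http-equiv form or the shorter <meta charset=...>, case-insensitively.
DECLARED_RE = re.compile(rb"<meta[^>]*charset\s*=", re.IGNORECASE)


def declares_charset(data: bytes) -> bool:
    """True if the raw page bytes contain a meta charset declaration."""
    return DECLARED_RE.search(data) is not None


def undeclared_files(root: str):
    """Sorted list of .html/.htm files under `root` with no meta charset."""
    return sorted(
        p
        for p in Path(root).rglob("*")
        if p.is_file()
        and p.suffix.lower() in (".html", ".htm")
        and not declares_charset(p.read_bytes())
    )


if __name__ == "__main__":
    # Usage: python find_undeclared.py <directory-of-archived-pages>
    for path in undeclared_files(sys.argv[1]):
        print(path)
```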

Good luck!
-mjc


