Re: [Bug-wget] Page encoding problem

From:	Micah Cowan
Subject:	Re: [Bug-wget] Page encoding problem
Date:	Mon, 09 Jul 2012 22:41:29 -0700
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:13.0) Gecko/20120615 Thunderbird/13.0.1

On 07/09/2012 10:24 PM, Owen Watson wrote:
> Would --local-encoding=UTF-8 fix it?

Unlikely. IIRC, that changes how wget decides to translate non-ASCII
URLs (IRIs) given on the command line, and I think how it saves
non-ASCII file names, but I don't believe it will modify file
contents.

Basically, your best bet is to run an equivalent to that Unix pipeline:
something that can run through all the archived files and correct their
meta http-equiv stuff. If it were me, I'd probably install the "find"
and "sed" commands so I could run that exact pipeline (except the
single quotes would have to be double quotes in a DOS pipeline, I
think). Maybe install MSYS or some other kit that provides such
commands. Cygwin's overkill, I'm sure...
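
If installing "find" and "sed" isn't convenient, roughly the same
correction could be sketched in Python. This is only an illustration
under assumptions, not the pipeline from earlier in the thread: the
function names, the "*.html" pattern, and the utf-8 target charset are
all hypothetical choices, and the regex only handles tags where
http-equiv appears before the charset value.

```python
import re
from pathlib import Path

# Matches the charset value inside a <meta http-equiv="Content-Type" ...> tag.
# Assumption: http-equiv comes before charset= within the same tag.
META_CHARSET = re.compile(
    rb'(<meta[^>]*http-equiv=["\']?content-type["\']?[^>]*charset=)([\w.:-]+)',
    re.IGNORECASE,
)


def fix_meta_charset(html: bytes, charset: bytes = b"utf-8") -> bytes:
    """Replace the declared charset; all other bytes are left untouched."""
    return META_CHARSET.sub(lambda m: m.group(1) + charset, html)


def fix_tree(root: str, charset: bytes = b"utf-8") -> int:
    """Rewrite every *.html file under root in place; return files changed."""
    changed = 0
    for path in Path(root).rglob("*.html"):
        old = path.read_bytes()
        new = fix_meta_charset(old, charset)
        if new != old:
            path.write_bytes(new)
            changed += 1
    return changed
```

Calling fix_tree on the directory wget saved into (whatever it is named
on your machine) would rewrite the declarations in place, so trying it
on a copy of the archive first is probably wise.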

Note that what it would NOT do is correct files that don't SPECIFY
their character encoding... Firefox always assumes Latin-1 if nobody
tells it otherwise. Probably because that's what it's supposed to do,
according to the HTML specification, IIRC. However, Google Chrome, at
least, appears to auto-detect the content's encoding. Perhaps FF has a
config that supports this too (or you could force FF to use UTF-8).
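
To get a feel for how many files fall into that second group, something
like the following Python sketch could list them. It is a rough
heuristic under assumptions: the function names and the "*.html"
pattern are hypothetical, and "decodes cleanly as UTF-8" is only a
strong hint about the real encoding, not proof.

```python
import re
from pathlib import Path

# Any <meta ...> tag that carries a charset= declaration.
CHARSET_DECL = re.compile(rb"<meta[^>]*charset\s*=", re.IGNORECASE)


def declares_charset(html: bytes) -> bool:
    """True if the document contains a meta tag with a charset declaration."""
    return CHARSET_DECL.search(html) is not None


def decodes_as_utf8(data: bytes) -> bool:
    """True if the bytes form valid UTF-8 (pure ASCII also qualifies)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def undeclared_utf8_files(root: str):
    """Yield *.html files with non-ASCII, valid UTF-8 content and no charset."""
    for path in Path(root).rglob("*.html"):
        data = path.read_bytes()
        if data.isascii():
            continue  # pure ASCII renders the same under either guess
        if not declares_charset(data) and decodes_as_utf8(data):
            yield path
```

Files it reports are the ones a Latin-1 default would most likely
garble; what to do with them (add a declaration, or change the browser
setting) is a separate choice.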

Good luck!
-mjc
