Re: [Bug-wget] Problem with ÅÄÖ and wget


From:	Ángel González
Subject:	Re: [Bug-wget] Problem with ÅÄÖ and wget
Date:	Thu, 03 Oct 2013 02:04:05 +0200
User-agent:	Thunderbird

On 24/09/13 10:38, Tim Ruehsen wrote:

Just for completeness: these guessing steps called "encoding sniffing
algorithm" are described in 12.2.2.2.
But only "In some cases, it might be impractical to unambiguously determine
the encoding before parsing the document.".

Yes, it allows parsing to start with one encoding, then abort and switch to a different one.

I found this iso-8859-1 / windows-1252 issue mentioned on the Wikipedia
'windows-1252' page, but couldn't find it on the HTML Living Standard pages.
Could you give me a pointer, please ?

It's at the beginning of the HTML parsing section: it lists several encodings given by the page and the encoding you should use to parse them instead, saying it is a willful violation.

What do you think, how can we address the iso / windows encoding issue (should
we ?) ? As I understood, it is only valid for HTML5...

It's just a matter of comparing the input encoding with a well-known list and replacing it.
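
As a rough illustration of that lookup, here is a minimal Python sketch. The table is only a small subset chosen for the example, and the names ENCODING_OVERRIDES and effective_encoding are made up here; this is not wget code, and the authoritative list is the one in the specification.

```python
# Illustrative subset of "declared label -> encoding actually used" overrides.
# Hypothetical table for this example; consult the spec for the full list.
ENCODING_OVERRIDES = {
    "iso-8859-1": "windows-1252",
    "us-ascii": "windows-1252",
    "iso-8859-9": "windows-1254",
}


def effective_encoding(declared: str) -> str:
    """Return the encoding to decode with, given the label the page declared."""
    label = declared.strip().lower()
    # Unknown labels pass through unchanged (lowercased and trimmed).
    return ENCODING_OVERRIDES.get(label, label)


# 0x93 and 0x94 are curly quotes in windows-1252 but control codes in iso-8859-1.
raw = b"\x93quoted\x94"
print(raw.decode(effective_encoding("ISO-8859-1")))
```

The point of the replacement is that bytes in the 0x80-0x9F range, which pages labelled iso-8859-1 often contain, decode to printable characters instead of C1 control codes.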

Is there a practical need for the sniffing algorithm ?

If we want to deal with the "ÅÄÖ links" properly, we should do encoding detection.
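
A very simplified sketch of what such detection could look like, in Python. The ordering (BOM, HTTP charset, a crude meta scan, then a UTF-8 validity check with a windows-1252 fallback) is an assumption made for this example and is not the exact sniffing algorithm from the spec; sniff_encoding and its parameters are hypothetical names.

```python
import re
from typing import Optional

# Crude stand-in for the spec's meta prescan; assumed good enough for a sketch.
META_CHARSET = re.compile(
    rb"<meta[^>]+charset\s*=\s*[\"']?([A-Za-z0-9_\-]+)", re.IGNORECASE
)


def sniff_encoding(body: bytes, http_charset: Optional[str] = None) -> str:
    """Guess the encoding of an HTML document (simplified, assumed ordering)."""
    if body.startswith(b"\xef\xbb\xbf"):
        return "utf-8"  # UTF-8 byte order mark
    if http_charset:
        return http_charset.strip().lower()  # charset from Content-Type
    match = META_CHARSET.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    try:
        body.decode("utf-8")  # strict decode: raises on invalid sequences
        return "utf-8"
    except UnicodeDecodeError:
        return "windows-1252"  # legacy single-byte fallback


link = '<a href="/ÅÄÖ.html">'
print(sniff_encoding(link.encode("utf-8")))
print(sniff_encoding(link.encode("latin-1")))
```

The same link text yields different byte sequences depending on the page encoding, so the URL has to be decoded with the detected encoding before it is converted for the request.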

Do you know any real web sites / pages where the encoding is ambiguous ?

I consider those web sites broken. But I don't have numbers.
