Handling invalid HTML

From: Juri Linkov
Subject: Handling invalid HTML
Date: Tue, 18 Oct 2005 11:06:42 +0300
User-agent: Gnus/5.110004 (No Gnus v0.4) Emacs/22.0.50 (gnu/linux)

The current rules for recognizing HTML files in Emacs are too strict:

1. The HTML encoding-detection regexps accept only the quotation
character as the delimiter of attribute values.  However, some HTML
files on the Web use apostrophes instead (which HTML also permits), e.g.

<meta http-equiv='Content-Type' content='text/html; charset=UTF-8'>

The program that generates such meta headers identifies itself
as 'Microsoft DHTML Editing Control' (no surprise).

`sgml-html-meta-auto-coding-function' can't determine the encoding from
such meta headers.  I propose replacing \" with [\"'] in the regexps
in `sgml-html-meta-auto-coding-function' so that it also accepts
apostrophe-delimited values.  (The regexps in another function,
`sgml-xml-auto-coding-function', already match [\"'] for XML files.)
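To illustrate the proposed change, here is a minimal sketch in Python
(used only so the example runs standalone; it is not Emacs Lisp).  The
helper `meta_charset_regexp' is hypothetical and its pattern is a
simplified stand-in for the real regexps, not a copy of them.  The only
point it demonstrates is the delimiter class: \" versus [\"'].

```python
import re

# Characters allowed in the charset name: stop at quotes, spaces, '>' or ';'.
CHARSET_VALUE = r"""([^"'\s>;]+)"""


def meta_charset_regexp(quote: str) -> "re.Pattern[str]":
    """Build a simplified <meta http-equiv=Content-Type> regexp (hypothetical).

    QUOTE is a regexp for the attribute-value delimiter: '"' mimics the
    current double-quote-only behaviour, '["\\']' the proposed one.
    """
    return re.compile(
        r"<meta\s+http-equiv\s*=\s*" + quote + r"?content-type" + quote + r"?\s+"
        r"content\s*=\s*" + quote + r"[^\"'>]*charset=" + CHARSET_VALUE,
        re.IGNORECASE,
    )


OLD = meta_charset_regexp('"')       # only \" as delimiter
NEW = meta_charset_regexp('["\']')   # [\"'] as delimiter

header = "<meta http-equiv='Content-Type' content='text/html; charset=UTF-8'>"
print(OLD.search(header))            # None: apostrophes are not recognized
print(NEW.search(header).group(1))   # UTF-8
```

Double-quoted headers still match with the widened class, so the change
should only add matches, never lose existing ones.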

2. `sgml-html-meta-auto-coding-function' can't determine the encoding
when an HTML file has no `<html>' start tag.  One example of such a
file is the Mozilla Firefox bookmark file.  Sometimes it's necessary
to open this file in Emacs and use isearch on it, but Emacs can't
detect its encoding.  Perhaps the test `(search-forward "<html" size t)'
should be removed from `sgml-html-meta-auto-coding-function'.
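The effect of that guard can be sketched in Python as follows.  The
function `detect_meta_charset', its `require_html' flag, the default
`size' and the simplified regexp are all hypothetical; the flag only
mimics the `(search-forward "<html" size t)' test (done
case-insensitively here, which is an assumption).  The sample input is
an abbreviated, illustrative bookmark-file header, not a real file.

```python
import re

# Simplified, hypothetical meta-charset regexp (case-insensitive).
META_CHARSET = re.compile(r"""<meta\s[^>]*charset=([^"'\s>;]+)""", re.IGNORECASE)


def detect_meta_charset(text: str, size: int = 1024, require_html: bool = True):
    """Return the charset named by a <meta> tag in the first SIZE characters.

    When REQUIRE_HTML is true, give up unless "<html" occurs in that region,
    like the current `search-forward' guard.
    """
    head = text[:size]
    if require_html and "<html" not in head.lower():
        return None
    match = META_CHARSET.search(head)
    return match.group(1) if match else None


# Abbreviated, illustrative header of a bookmark file with no <html> tag.
BOOKMARKS = (
    "<!DOCTYPE NETSCAPE-Bookmark-file-1>\n"
    '<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">\n'
    "<TITLE>Bookmarks</TITLE>\n"
    "<H1>Bookmarks</H1>\n"
)

print(detect_meta_charset(BOOKMARKS))                      # None
print(detect_meta_charset(BOOKMARKS, require_html=False))  # UTF-8
```

Without the guard, the meta header alone is enough to pick the
encoding; the trade-off is that a `<meta ... charset=...>' tag in a
non-HTML file would then also be honoured.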

3. When visiting the Mozilla Firefox bookmark file, Emacs also fails to
detect the type of this file.  Emacs opens it in SGML mode, whereas it
is actually an HTML file.  This problem is caused by the default value
of `magic-mode-alist'.  Maybe the `.html' extension in `auto-mode-alist'
should take precedence over `magic-mode-alist'?
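The precedence question can be modelled with a small Python sketch.
The two tables below are hypothetical, heavily simplified stand-ins
for `magic-mode-alist' and `auto-mode-alist' (the real defaults are
longer and differ in detail), and `choose_mode' with its
`filename_first' flag is illustrative only.  Magic entries are matched
at the start of the contents; filename entries against the name.

```python
import re

# Hypothetical, simplified tables; not the real Emacs defaults.
MAGIC_MODE_ALIST = [
    (re.compile(r"<!DOCTYPE\s+html", re.IGNORECASE), "html-mode"),
    (re.compile(r"<!DOCTYPE", re.IGNORECASE), "sgml-mode"),
]
AUTO_MODE_ALIST = [
    (re.compile(r"\.html?\Z", re.IGNORECASE), "html-mode"),
]


def choose_mode(filename: str, contents: str, filename_first: bool = False) -> str:
    """Pick a major mode; FILENAME_FIRST selects the proposed precedence."""

    def by_magic():
        # Magic regexps must match at the very beginning of the contents.
        for regexp, mode in MAGIC_MODE_ALIST:
            if regexp.match(contents):
                return mode
        return None

    def by_name():
        for regexp, mode in AUTO_MODE_ALIST:
            if regexp.search(filename):
                return mode
        return None

    lookups = (by_name, by_magic) if filename_first else (by_magic, by_name)
    for lookup in lookups:
        mode = lookup()
        if mode:
            return mode
    return "fundamental-mode"


contents = "<!DOCTYPE NETSCAPE-Bookmark-file-1>\n<TITLE>Bookmarks</TITLE>\n"
print(choose_mode("bookmarks.html", contents))                       # sgml-mode
print(choose_mode("bookmarks.html", contents, filename_first=True))  # html-mode
```

With the filename checked first, content sniffing still applies to
files whose names match nothing in the filename table.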

Juri Linkov
