Re: GNUstep Web browser (was Re: WebKit Bounty)
From: address@hidden
Subject: Re: GNUstep Web browser (was Re: WebKit Bounty)
Date: 5 Mar 2007 02:53:09 -0800
User-agent: G2/1.0
> or pass html through html tidy first.
That detour seems unnecessary to me: html tidy first parses the HTML
into a tree, fixes a few things, and writes HTML back out, only for us
to parse it again...
I have read through the rules html tidy uses, and in most cases the
following rules should give the same or a very similar result (OK, this
needs more testing with badly designed pages):
* if a closing tag does not match the innermost open tag, search
outwards until you find a matching open tag (if there is none, ignore
the closing tag)
* be lenient about missing quotes around attribute values
* convert all tag names and attribute names to upper case
* ignore <html>, <head>, <body> (except for their attributes)
* some tags always go to the HEAD section (e.g. <title>, <meta>)
wherever they appear
* ignore unknown tags
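
To make the rules above concrete, here is a rough sketch in Python (not the actual GNUstep code, which is not shown in this post). The tag sets, the Node class and the parse() function are all hypothetical names made up for the illustration, and the regex tokenizer is deliberately simplistic: it will break on a ">" inside a quoted attribute value, and it ignores comments and scripts.

```python
import re

# Hypothetical tag sets, chosen only for this illustration.
KNOWN_TAGS = {"P", "B", "I", "DIV", "A", "TITLE", "META"}
HEAD_TAGS = {"TITLE", "META"}          # always routed to HEAD
STRUCTURAL = {"HTML", "HEAD", "BODY"}  # ignored, except BODY attributes
VOID_TAGS = {"META"}                   # never get a closing tag

TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)([^>]*)>")
# Attribute name, then an optional value: "quoted", 'quoted' or bare.
ATTR_RE = re.compile(
    r"""([A-Za-z_:][-\w:.]*)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+)))?"""
)


class Node:
    def __init__(self, name, attrs=None):
        self.name = name
        self.attrs = attrs or {}
        self.children = []  # Node objects or text strings


def parse_attrs(raw):
    """Collect attributes, accepting unquoted values; names go upper case."""
    attrs = {}
    for m in ATTR_RE.finditer(raw):
        value = next((g for g in m.groups()[1:] if g is not None), "")
        attrs[m.group(1).upper()] = value
    return attrs


def parse(html):
    """Return (head, body) trees built with the lenient rules."""
    head, body = Node("HEAD"), Node("BODY")
    stack = [body]  # open elements, innermost last
    pos = 0
    for m in TAG_RE.finditer(html):
        text = html[pos:m.start()]
        if text.strip():
            stack[-1].children.append(text)
        pos = m.end()
        closing, name, raw = m.group(1), m.group(2).upper(), m.group(3)
        if name in STRUCTURAL:
            # <html>, <head>, <body> are skipped; only BODY attributes are kept here.
            if not closing and name == "BODY":
                body.attrs.update(parse_attrs(raw))
            continue
        if name not in KNOWN_TAGS:
            continue  # unknown tag: drop the tag, keep its text content
        if closing:
            # Search outwards for a matching open tag; ignore if none.
            for i in range(len(stack) - 1, 0, -1):
                if stack[i].name == name:
                    del stack[i:]
                    break
            continue
        node = Node(name, parse_attrs(raw))
        parent = head if name in HEAD_TAGS else stack[-1]
        parent.children.append(node)
        if name not in VOID_TAGS:
            stack.append(node)
    tail = html[pos:]
    if tail.strip():
        stack[-1].children.append(tail)
    return head, body
```

For example, in "<p class=intro>Hi <b>bold <i>both</b> tail</p>" the </b> does not match the innermost open <i>, so the search goes outwards, closes both I and B, and " tail" ends up directly inside the P. A stray </div> with no open DIV is simply dropped.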
As soon as I have new, more or less stable code, I will upload a
snapshot so you can look into it.
-- hns