discuss-gnustep

Re: GNUstep Web browser (was Re: WebKit Bounty)


From: Thom Cherryhomes
Subject: Re: GNUstep Web browser (was Re: WebKit Bounty)
Date: Mon, 5 Mar 2007 07:32:14 -0500

I do want to ask, have any of you actually done any of this sort of
thing before?

It always seems that every time I run a badly formatted page through
Tidy, it's a crap shoot whether it will wind up with the same visual
representation that was intended in the first place.

It's just that you guys are trying to create an elegant solution for
an ideal world without fully understanding the interaction of all the
pieces you are trying to put together. IF YOU ACTUALLY STUDY
Gecko or KHTML, you'll find that they go to great lengths to make
sure that special cases are taken care of, without disturbing the way
the stream is fed into the parser.

This isn't as simple as you guys are making it out to be, and it's all
because it was never done RIGHT to begin with.

It's not that this is impossible, but you are creating bigger
headaches farther down the road, and for what? To make sure this fits
into your world view of how things should be? Come on, guys. THINK.

-Thom



On 5 Mar 2007 02:53:09 -0800, hns@computer.org <hns@computer.org> wrote:
> or pass html through html tidy first.

It appears unnecessary to me to go that way, because Tidy first parses
the HTML into a tree, then fixes some things and writes out HTML, just
for us to parse it again...

I have read through the rules HTML Tidy uses, and in most cases the
following rules will give the same or a very similar result (OK, this
needs more testing with badly designed pages):
* if the closing tag does not match the opening tag, search outwards
until you find one that does (if you don't find one, ignore the closing tag)
* be lazy with missing quotes in tag attributes
* convert all tag names and attribute names to upper case
* ignore <html>, <head>, <body> (except for attributes)
* some tags always go to the HEAD section (e.g. <title>, <meta>)
wherever they appear
* ignore unknown tags
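
To make the rules above concrete, here is a rough sketch of how they could fit together in one pass. It is not the code referred to in this thread (that has not been posted yet); it is Python written purely for illustration. The tag sets `KNOWN`, `HEAD_ONLY` and `VOID`, the `Node` class and the `parse` function are all made up for this example, and comments, entities and scripts are not handled at all.

```python
import re

# Hypothetical, deliberately tiny tag vocabulary (an assumption for this sketch).
KNOWN = {"HTML", "HEAD", "BODY", "TITLE", "META", "P", "B", "I", "A", "DIV", "BR"}
STRUCTURAL = {"HTML", "HEAD", "BODY"}  # never nested; only their attributes are kept
HEAD_ONLY = {"TITLE", "META"}          # always filed under HEAD
VOID = {"META", "BR"}                  # elements that take no closing tag

TOKEN = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)([^>]*)>|([^<]+)")
# An attribute value may be "double quoted", 'single quoted', or bare.
ATTR = re.compile(r"""([^\s=/]+)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+)))?""")


class Node:
    def __init__(self, tag, attrs=None):
        self.tag, self.attrs, self.children = tag, dict(attrs or {}), []

    def __repr__(self):
        return f"{self.tag}{self.children}"


def parse_attrs(raw):
    """Lazy attribute parsing: quotes are optional, names are upper-cased."""
    attrs = {}
    for m in ATTR.finditer(raw):
        value = next((g for g in m.groups()[1:] if g is not None), "")
        attrs[m.group(1).upper()] = value.strip("\"'")  # drop stray quotes
    return attrs


def parse(html):
    root, head, body = Node("HTML"), Node("HEAD"), Node("BODY")
    root.children = [head, body]
    structural = {"HTML": root, "HEAD": head, "BODY": body}
    stack = [body]  # open elements, innermost last; BODY is never popped
    for close, name, raw, text in TOKEN.findall(html):
        if not name:  # a run of text
            if text.strip():
                stack[-1].children.append(text.strip())
            continue
        tag = name.upper()
        if tag not in KNOWN:
            continue  # unknown tag ignored; its text content is kept
        if tag in STRUCTURAL:
            if not close:  # keep the attributes, ignore the tag itself
                structural[tag].attrs.update(parse_attrs(raw))
            continue
        if close:
            # Search outwards for a matching open element.
            for i in range(len(stack) - 1, 0, -1):
                if stack[i].tag == tag:
                    del stack[i:]  # implicitly closes everything inside it
                    break
            continue  # if nothing matched, the closing tag is simply ignored
        node = Node(tag, parse_attrs(raw))
        (head if tag in HEAD_ONLY else stack[-1]).children.append(node)
        if tag not in VOID:
            stack.append(node)
    return root


tree = parse("<html><body bgcolor=white><p>Hello <b>world</p>"
             "<title>Hi</title><blink>x</blink></i>")
print(tree)
```

In the sample input, the `</p>` closes the still-open `<b>` on its way out, the `<title>` ends up under HEAD even though it appears in the body, `<blink>` is dropped while its text survives, and the stray `</i>` is ignored. Whether the resulting tree renders the way the page author intended is exactly what still needs testing against real badly designed pages.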

As soon as I have new, more or less stable code, I will upload a
snapshot so you can look into it.

-- hns
