

Re: Permissive html parser for guile

From: Panicz Maciej Godek
Subject: Re: Permissive html parser for guile
Date: Wed, 23 Jan 2019 22:04:23 +0100

I believe that the canonical way of working with XML documents in Guile is
through the (sxml simple) module (and related modules).

It provides the xml->sxml function, which converts XML strings into SXML, a
more familiar s-expression-based format.
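
A minimal sketch of the round trip, assuming a Guile 2.x or 3.x installation
where (sxml simple) is available; the input string is made up for
illustration. Note that (sxml simple) is built on the SSAX parser and expects
well-formed XML, so it would likely reject the kind of malformed markup that a
permissive HTML parser is meant to recover from.

```scheme
(use-modules (sxml simple))

;; Parse a small, well-formed XML string into SXML.
;; The result is wrapped in a *TOP* node.
(define doc (xml->sxml "<p>hello <b>world</b></p>"))

(write doc)
(newline)
;; prints (*TOP* (p "hello " (b "world")))

;; Going the other way: serialize SXML back to XML text
;; on the current output port.
(sxml->xml '(p "hello " (b "world")))
(newline)
;; prints <p>hello <b>world</b></p>
```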

On Wed, 23 Jan 2019 at 17:41, swedebugia <address@hidden> wrote:

> I just found this LGPLv3 parser by Neil Van Dyke (see attachment).
> Do we have something similar in Guile?
> If not, is anybody interested in porting it? (I have no idea how much
> work it would be, but Racket seems quite close to Guile.)
> Here is the introduction:
> "The html-parsing library provides a permissive HTML parser. The parser
> is useful for software agent extraction of information from Web pages,
> for programmatically transforming HTML files, and for implementing
> interactive Web browsers. html-parsing emits SXML/xexp, so that
> conventional HTML may be processed with XML tools such as SXPath. Like
> Oleg Kiselyov’s SSAX-based HTML parser, html-parsing provides a
> permissive tokenizer, but html-parsing extends this by attempting to
> recover syntactic structure.
> The html-parsing parsing behavior is permissive in that it accepts
> erroneous HTML, handling several classes of HTML syntax errors
> gracefully, without yielding a parse error. This is crucial for parsing
> arbitrary real-world Web pages, since many pages actually contain syntax
> errors that would defeat a strict or validating parser. html-parsing’s
> handling of errors is intended to generally emulate popular Web
> browsers’ interpretation of the structure of erroneous HTML."
> --
> Cheers Swedebugia

