Re: [Chicken-users] Parsing HTML, best practice with Chicken

From: mfv
Subject: Re: [Chicken-users] Parsing HTML, best practice with Chicken
Date: Mon, 29 Dec 2014 19:47:33 +0100
User-agent: Mutt/1.5.21 (2010-09-15)


> I somehow always manage to get it working with sxpath when I need to do
> some web scraping, but it's somewhat painful.

Thanks, I will have a look at sxpath.

> >  Are there any packages like Python's Beautifulsoup in the Chicken
> > arsenal?
> That sort of thing is sorely lacking.  There's a promising "zipper"
> library written by Moritz Heidkamp, but so far it's unreleased and
> undocumented.  If you're feeling very adventurous you could have
> a look at it:

Pity. I will have a look at the BeautifulSoup source. Maybe I can mimic
some of its functionality.

And yes, I will have a look at 'zipper'.

> (define sxml (call-with-input-request lnk #f html->sxml))

You are right. It is step by step for me, and I am in the first steps.. (-;

> In fact, I didn't even know you could use html->sxml on a
> string.  This seems to be an undocumented feature of html-parser :)

I actually just tried it, as I had great difficulty understanding the
actual documentation of html-parser. I have no idea what it does under the
hood, especially with all those :start:, :end:, :process: commands, and I
did not have the time to look into the source.

All in all, I must say that it is much more difficult to get going with
Chicken than with Python. The language itself is simple, but the learning
curve is fairly steep, and I am not sure whether it will pay off.

In terms of tooling, Python/Threading/BeautifulSoup might be the winner
here. It is a simple 'hack-away' experience. But I guess that does not make
me learn new tricks...

My hope is that Scheme will be a sort of gateway to LFE/Clojure and will
make me think more about algorithms.


