Re: [Chicken-users] Parsing HTML, best practice with Chicken
From: mfv
Subject: Re: [Chicken-users] Parsing HTML, best practice with Chicken
Date: Mon, 29 Dec 2014 19:47:33 +0100
User-agent: Mutt/1.5.21 (2010-09-15)
Hello,
> I somehow always manage to get it working with sxpath when I need to do
> some web scraping, but it's somewhat painful.
Thanks, I will have a look at sxpath.
> > Are there any packages like Python's Beautifulsoup in the Chicken
> > arsenal?
>
> That sort of thing is sorely lacking. There's a promising "zipper"
> library written by Moritz Heidkamp, but so far it's unreleased and
> undocumented. If you're feeling very adventurous you could have
> a look at it: https://bitbucket.org/DerGuteMoritz/zipper
Pity. I will have a look at the BeautifulSoup source; maybe I can mimic some
of its functionality.
And yes, I will have a look at 'zipper'.
> (define sxml (call-with-input-request lnk #f html->sxml))
You are right. I am taking it step by step, and I am still at the very first steps. (-;
> In fact, I didn't even know you could use html->sxml on a
> string. This seems to be an undocumented feature of html-parser :)
I actually just tried it, because I had great difficulty understanding the
documentation of html-parser. I have no idea what it does under the hood,
especially with all those :start:, :end:, :process: commands, and I have
not had the time to look at the source.
All in all, I must say that it is much more difficult to get going with
Chicken than with Python. The language itself is simple, but the learning
curve is fairly steep, and I am not sure whether it will pay off.
In terms of tooling, Python with threading and BeautifulSoup might be the
winner here. It is a simple 'hack-away' experience. But I guess that would
not teach me any new tricks...
My hope is that Scheme will serve as a kind of entry point to LFE and Clojure
and make me think more about algorithms.
Regards,
Piotr
- [Chicken-users] Parsing HTML, best practice with Chicken, mfv, 2014/12/28
- Re: [Chicken-users] Parsing HTML, best practice with Chicken, Kooda, 2014/12/29
- Re: [Chicken-users] Parsing HTML, best practice with Chicken, Peter Bex, 2014/12/29
- Re: [Chicken-users] Parsing HTML, best practice with Chicken, mfv, 2014/12/29 (this message)
- Re: [Chicken-users] Parsing HTML, best practice with Chicken, Ivan Raikov, 2014/12/29