Re: [Chicken-users] Parsing HTML, best practice with Chicken

From:	mfv
Subject:	Re: [Chicken-users] Parsing HTML, best practice with Chicken
Date:	Mon, 29 Dec 2014 19:47:33 +0100
User-agent:	Mutt/1.5.21 (2010-09-15)

Hello, 

> I somehow always manage to get it working with sxpath when I need to do
> some web scraping, but it's somewhat painful.

Thanks, I will have a look at sxpath.


> >  Are there any packages like Python's Beautifulsoup in the Chicken
> > arsenal?
> 
> That sort of thing is sorely lacking.  There's a promising "zipper"
> library written by Moritz Heidkamp, but so far it's unreleased and
> undocumented.  If you're feeling very adventurous you could have
> a look at it: https://bitbucket.org/DerGuteMoritz/zipper

Pity. I will have a look at the BeautifulSoup source. Maybe I can copy/mimic
some of its functionality.

And yes, I will have a look at 'zipper'.
 
> (define sxml (call-with-input-request lnk #f html->sxml))

You are right. I am taking it step by step, and I am still at the first
steps. (-;

> In fact, I didn't even know you could use html->sxml on a
> string.  This seems to be an undocumented feature of html-parser :)

I actually just tried it, as I had great difficulty understanding the
actual documentation of html-parser. I have no idea what it does under the
hood, especially with all those :start:, :end:, :process: commands, and I
did not have the time to look into the source.

All in all, I must say that it is much more difficult to get going with
Chicken than with Python. The language itself is simple, but the learning
curve is fairly steep - and I am not sure whether it will pay off.

In terms of tooling, Python/Threading/BeautifulSoup might be the winner
here. It is a simple 'hack-away' experience. But I guess that does not make
me learn new tricks...

My hope is that Scheme is some sort of gateway to LFE/Clojure and makes
me think more about algorithms.

Regards, 

  Piotr

Current Thread

[Chicken-users] Parsing HTML, best practice with Chicken, mfv, 2014/12/28
- Re: [Chicken-users] Parsing HTML, best practice with Chicken, Kooda, 2014/12/29
  - Re: [Chicken-users] Parsing HTML, best practice with Chicken, Mario Domenech Goulart, 2014/12/29
  - Re: [Chicken-users] Parsing HTML, best practice with Chicken, mfv, 2014/12/29
- Re: [Chicken-users] Parsing HTML, best practice with Chicken, Peter Bex, 2014/12/29
  - Re: [Chicken-users] Parsing HTML, best practice with Chicken, mfv <=
    - Re: [Chicken-users] Parsing HTML, best practice with Chicken, Peter Bex, 2014/12/29
    - Re: [Chicken-users] Parsing HTML, best practice with Chicken, Alex Shinn, 2014/12/29
- Re: [Chicken-users] Parsing HTML, best practice with Chicken, Ivan Raikov, 2014/12/29