Make a quick HTML data extractor with REBOL (video)
July 19th, 2008
I will soon need to make some sort of web spider, and REBOL seems to me a very good choice for building one. I programmed in REBOL years ago, so I have forgotten most of it. REBOL recently released the long-awaited version 3 (I was a little afraid it was turning into “vaporware”), which after a few years showed that Carl Sassenrath still “means business” with it. So my interest in this interesting language is renewed too. Another big plus is that 3.0 largely solves some issues I had with the original REBOL (its generally closed nature, artificial limits in the Core version, async stuff). I decided to make a quick video of the procedure so that I get you interested in this excellent language too. Yes you, who did you think I was talking about?
In the video I do the following: I load a web page with the “last 100 motorbike ads” over HTTP, first parse away the unnecessary HTML, then parse the table that holds the ads into separate rows, and finally parse each row into concrete data.
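For readers who can't watch the video, those steps can be sketched roughly like this. It is a minimal sketch, not the exact code from the video: the inline sample page, the word names (page, table-html, ads, cells) and the URL in the comment are assumptions made for illustration, and the markup of a real ads page will differ.

```rebol
REBOL [Title: "Ad table extractor (sketch)"]

; In REBOL 2 the page could be loaded straight from the web, for example:
;     page: read http://www.example.com/ads.html    ; hypothetical URL
; A small inline sample (made up for this sketch) keeps it runnable offline.
page: {<html><body><h1>Last 100 ads</h1>
<table>
<tr><td>Honda CBR 600</td><td>2004</td><td>3500</td></tr>
<tr><td>Yamaha XT 660</td><td>2006</td><td>4200</td></tr>
</table>
<p>footer</p></body></html>}

; Step 1: throw away everything outside the table.
table-html: none
parse page [thru <table> copy table-html to </table> to end]

; Step 2: split the table into rows.
; Step 3: split each row into its cells.
ads: copy []
parse table-html [
    any [
        thru <tr> copy row to </tr> (
            cells: copy []
            parse row [any [thru <td> copy cell to </td> (append cells cell)] to end]
            append/only ads cells
        )
    ]
    to end
]

; Each ad is now a block of cell strings.
foreach ad ads [probe ad]
```

Note that the tags in the rules are written without quotes, and that the nested parse inside the parentheses reuses the same "thru ... copy ... to ..." pattern for the cells that the outer rule uses for the rows.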
Interesting tidbits: notice that I don’t put quotes around the HTTP address (REBOL has many datatypes and URL is one of them) or around the XML tags (for the same reason). Most of the time I just use the parse word, which moves me into the parse DIALECT. Most languages have functions and classes, but REBOL also has dialects. The parse dialect is just one of them, and you can create your own dialects just as you can create your own functions.
I used only the most basic features of the parse dialect here. It’s an unbelievably powerful feature.
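To make the datatype and dialect points a bit more concrete, here is a small sketch. The "fetch ... as ..." mini-dialect, the example URLs and the word names are all invented for illustration; nothing like this appears in the video.

```rebol
REBOL [Title: "Datatypes and a tiny dialect (sketch)"]

; URLs and tags are values in their own right, so they need no quotes.
site: http://www.example.com/ads.html    ; hypothetical URL
cell-tag: <td>
print url? site        ; true
print tag? cell-tag    ; true

; A made-up dialect: each job reads "fetch <url> as <word>".
jobs: [
    fetch http://www.example.com/bikes.html as bikes
    fetch http://www.example.com/cars.html as cars
]

; Block parsing gives the dialect its meaning: 'fetch and 'as are matched
; as literal words, url! and word! are matched by datatype.
parse jobs [
    any [
        'fetch set target url! 'as set name word! (
            print ["would fetch" target "into" name]
        )
    ]
]
```

The block in jobs is plain data until parse interprets it, which is why a dialect can use whatever vocabulary suits the problem.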


July 23rd, 2008 at 8:40 am
Interesting… I’ll soon also work on a web spider, but will probably go with PHP, since I need it in a very specific way.
Thanks for showing us REBOL (thumbs up).
July 24th, 2008 at 11:32 am
Interesting… what will you be doing with it… if it’s public BAC ?
(btw.. you should have a blog too)
July 25th, 2008 at 12:21 am
It will collect information from specific web sites. Hm… Is it still a web crawler then? It will be more like a parasite I guess…
Although it’s planned to become a real spider later on…
I have (had) a blog. But I couldn’t find the time and motivation to write. Anyway I have a little more time now, so I’m reopening soon and you are invited to the opening party.
July 26th, 2008 at 8:54 am
Mine will also collect info from specific sites, but I wouldn’t call it a parasite … or at least I hope those sites won’t
great, I will bring the beer… man you have to do it quick.. it’s the web2.o, you should blog and tweet and plurk and feed your friends ..friendfeed… (:B