Some conditional REBOL parse examples
August 24th, 2008
I am still improving my web crawler, so I thought I would post some code to show you some more of REBOL’s parse functionality. I needed to parse something with this pattern:
<div>Name » Janko<br/> City » Sevnica<br/></div>
I assigned it to s-ok and parsed it with this code at first:
NAME: CITY: ""
parse s-ok [
to "Name" thru "»" copy NAME to "<br/>"
to "City" thru "»" copy CITY to "<br/>"
]
print NAME print CITY
; and this got printed:
Janko
Sevnica
So it works, but sometimes the page I parse includes only City. With the code above we then fail to get the City out, because to "Name" fails and the rule stops before it ever reaches the City part:
<div>City » Sevnica<br/></div>
OPT (optional) in the code below will take care of this. But the HTML is also sloppily written, so I have seen these variations of it:
s-ok: "<div>Name » Janko<br/> City » Sevnica<br/></div>"
s-1: "<div>Name » Janko<br/></div>"
s-2: "<div>City » Sevnica<br/></div>"
s-br: "<div>Name » Janko<br/> City » Sevnica</div>"
s-2brr: "<div>City »» Sevnica</div>"
We modify our code to parse all of these and put it into a function:
do-parse-all: func [ s ] [
NAME: CITY: ""
parse s [
OPT [ to "Name" thru "»" copy NAME to "<br/>" ]
OPT [ to "City" SOME [ thru "»" ] copy CITY [ to "<br/>" | to "</div>" ] ]
]
print NAME print CITY
]
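The function above leans on three parse features: OPT, SOME, and the | alternative. As a rough sketch of what each one does on its own (the tiny input strings are made up for illustration and are not from the crawler), each of these lines should print true in R2:

```rebol
REBOL []

; OPT: the rule may match zero or one time, so parse carries on either way.
print parse "ab" [opt "a" "b"]          ; "a" is present
print parse "b"  [opt "a" "b"]          ; "a" is missing, the rest still matches

; [rule | none] is a longer spelling of the same idea as OPT.
print parse "b"  [["a" | none] "b"]

; SOME: one or more repetitions, which is how a doubled "»»" gets skipped.
print parse "»»x" [some "»" "x"]

; The | alternative: try "<br/>" first, then fall back to "</div>".
print parse "Sevnica</div>" [[to "<br/>" | to "</div>"] "</div>"]
```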
Now we go to the console and run do-parse-all on all the variants:
>> do-parse-all s-ok
Janko
Sevnica
>> do-parse-all s-1
Janko
>> do-parse-all s-2
Sevnica
>> do-parse-all s-br
Janko
Sevnica
>> do-parse-all s-2brr
Sevnica
>>
BTW: The original parse rule would handle only s-ok and s-1 correctly.
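If you want to use the values instead of printing them, a variant along these lines might help. It is only a sketch: extract-fields is a hypothetical name, not part of the original crawler, and it assumes the same R2 parse behaviour as do-parse-all. It keeps its words local, returns none for a missing field, and trims any surrounding whitespace from what was copied.

```rebol
REBOL []

; Hypothetical helper (not from the original post): same rules as do-parse-all,
; but the result is returned as a block [name city] instead of being printed.
extract-fields: func [s [string!] /local name city] [
    name: city: none
    parse s [
        opt [to "Name" thru "»" copy name to "<br/>"]
        opt [to "City" some [thru "»"] copy city [to "<br/>" | to "</div>"]]
    ]
    reduce [
        either name [trim name] [none]    ; none when the field was absent
        either city [trim city] [none]
    ]
]

probe extract-fields "<div>Name » Janko<br/> City » Sevnica</div>"
; should print ["Janko" "Sevnica"]
probe extract-fields "<div>City »» Sevnica</div>"
; should print [none "Sevnica"]
```

Like do-parse-all, this expects Name to come before City in the input. If a page listed them the other way round, the first rule would move past City and the second rule would not find it.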

August 26th, 2008 at 6:53 pm
“OPT”
Never came across OPT-ional before, guess I need to start re-reading some of the docs again. I have been using a [ "abc" | "def" | none ] format, which looks like it does the same, but OPT looks much cleaner. I'd guess there are a few more gems hidden in R3, if/when it is released.
August 26th, 2008 at 9:57 pm
I found out about “OPT” while looking at this: http://en.wikibooks.org/wiki/REBOL_Programming/Language_Features/Parse#Multiple_Values_in_a_Block
I am still using R2 for this, so I have to admit I have no idea what parse is like in R3.