Some conditional REBOL parse examples
Sunday, August 24th, 2008
I am still improving my web crawler, so I thought I would post some code to show you more of REBOL's parse functionality. I needed to parse something with this pattern:
<div>Name » Janko<br/> City » Sevnica<br/></div>
I assigned this string to s-ok and at first parsed it with this code:
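NAME: CITY: ""
parse s-ok [
    ; go to the label, skip past the » separator, copy up to the next <br/>
    to "Name" thru "»" copy NAME to "<br/>"
    to "City" thru "»" copy CITY to "<br/>"
]
; parse returns false here because the trailing "<br/></div>" is never consumed,
; but COPY has already set NAME and CITY
print NAME print CITY
; and got printed: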
Janko
Sevnica
It works, but some pages include only the City field, and then the code above fails to get the City out, because TO "Name" finds nothing and the whole rule fails before it ever reaches City:
<div>City » Sevnica<br/></div>
OPT (optional) in the code below takes care of this: if its sub-rule fails, parsing simply continues from the same position. But the HTML is also sloppily written, and I have seen these variations of it:
s-ok: "<div>Name » Janko<br/> City » Sevnica<br/></div>"
s-1: "<div>Name » Janko<br/></div>"
s-2: "<div>City » Sevnica<br/></div>"
s-br: "<div>Name » Janko<br/> City » Sevnica</div>"
s-2brr: "<div>City »» Sevnica</div>"
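To see what OPT changes, here is a minimal sketch (assuming REBOL 2) that runs the first rule on the City-only variant s-2, once as it is and once with only the Name part wrapped in OPT. Without OPT, the failed TO "Name" stops the whole parse and CITY is never set; inside OPT the same failure is tolerated and the City part still runs.

```rebol
REBOL []

s-2: "<div>City » Sevnica<br/></div>"
NAME: CITY: ""

; without OPT: TO "Name" fails, so parse stops before it ever reaches City
parse s-2 [
    to "Name" thru "»" copy NAME to "<br/>"
    to "City" thru "»" copy CITY to "<br/>"
]
print mold CITY    ; CITY is still the empty string

; with OPT: the failed Name part is skipped and parsing continues at the same spot
parse s-2 [
    opt [to "Name" thru "»" copy NAME to "<br/>"]
    to "City" thru "»" copy CITY to "<br/>"
]
print CITY         ; now holds the city name
```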
We modify our code to handle all of these and put it into a function:
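do-parse-all: func [ s /local NAME CITY ] [
    NAME: CITY: ""
    parse s [
        ; OPT: if this part does not match, skip it instead of failing
        OPT [ to "Name" thru "»" copy NAME to "<br/>" ]
        ; SOME [ thru "»" ] moves past every remaining » in the input, so the
        ; doubled »» in s-2brr is skipped too; the value then ends at
        ; <br/> or, when that is missing, at </div>
        OPT [ to "City" SOME [ thru "»" ] copy CITY [ to "<br/>" | to "</div>" ] ]
    ]
    print NAME print CITY
]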
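do-parse-all only prints what it finds, while a crawler usually wants the values back. A minimal sketch of that, assuming REBOL 2 and using a hypothetical function name parse-fields, keeps the same two OPT rules but returns a block [name city] and starts both fields as NONE, so a field that was not found can be told apart from an empty one.

```rebol
REBOL []

; hypothetical variant of do-parse-all: same rules, but it returns
; [name city] instead of printing; a field that is not found stays NONE
parse-fields: func [s /local name city] [
    name: city: none
    parse s [
        opt [to "Name" thru "»" copy name to "<br/>"]
        opt [to "City" some [thru "»"] copy city [to "<br/>" | to "</div>"]]
    ]
    reduce [name city]
]

fields: parse-fields "<div>City »» Sevnica</div>"
print either none? first fields ["no name found"] [first fields]
print second fields
```

Using NONE instead of "" is only one design choice; it lets the caller check a field with none? rather than comparing it against an empty string.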
Now we go to the console and extract the info from all the variants:
>> do-parse-all s-ok
Janko
Sevnica
>> do-parse-all s-1
Janko
>> do-parse-all s-2
Sevnica
>> do-parse-all s-br
Janko
Sevnica
>> do-parse-all s-2brr
Sevnica
>>
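BTW: the original rule handles only s-ok and s-1 correctly. On s-br it gets the name but not the city (no <br/> follows it), and on s-2 and s-2brr it gets nothing at all, because TO "Name" fails first.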
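A quick way to check this is to run the first, OPT-less rule over all five variants. In this sketch (REBOL 2 assumed), original-rule is a hypothetical wrapper that returns [name city], with NONE for anything the rule did not capture.

```rebol
REBOL []

; the five variants from above, in order: s-ok, s-1, s-2, s-br, s-2brr
variants: [
    "<div>Name » Janko<br/> City » Sevnica<br/></div>"
    "<div>Name » Janko<br/></div>"
    "<div>City » Sevnica<br/></div>"
    "<div>Name » Janko<br/> City » Sevnica</div>"
    "<div>City »» Sevnica</div>"
]

; hypothetical wrapper around the first rule, returning [name city]
original-rule: func [s /local name city] [
    name: city: none
    parse s [
        to "Name" thru "»" copy name to "<br/>"
        to "City" thru "»" copy city to "<br/>"
    ]
    reduce [name city]
]

foreach s variants [probe original-rule s]
```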
