REBOL’s actor-net used for real distributed system

July 3rd, 2009

I have been playing with how an actor and message-passing system could look in REBOL. I wrote 3 posts about that; the latest is this one, and you can follow the links back from it.

The whole library, which I called actor-net, is still very small and simple (well, that is a positive) and just an experiment in progress. But I created a real thing with it last week: Site Assistant makes website screenshots now. The server and all its tech run on Linux, but you can’t take a screenshot of Internet Explorer there. I don’t intend to keep a Windows server just for that reason, so I made a distributed network of workers. Workers will run on my computer and on my colleagues’ computers and do their job when they’re available.

Because worker computers are only occasionally online, and there can be times when no workers are online at all, the whole system from the interface down is built so that this is no problem. Also, if a worker drops out in the middle of a job, no job remains undone.

So let’s get to the code. The system consists of one work dispatcher, called Central, and any number of Workers.

[Image: scheme]

Central

my-bookman

Central consists of 3 actors. It runs all the time on our server, like a daemon. Let’s look at my-bookman first:

my-bookman: make actor [
	act-match: [
		[ 'worker-ready group addr ] [ debug [ group " worker is ready at " ] debug addr
			group-actor group addr
		]
	]
]

my-bookman reacts to one message, worker-ready, which is sent by a worker that is ready to do some work. The debug command probes the value when debug mode is set. group-actor group addr just adds the actor at addr to the named actor group.

work-checker

.. is a timer-actor. It wakes up every now and then (every 20 seconds, per the delay value below) and checks whether there is any work to do (urgent or regular). If there is, it sends an order-capture message to the capture-director actor, which lives inside the same node.

work-checker: make timer-actor [
	act-timer: [ 												debug "checking for work"
		*urgent-mode*: either ((random 10) > 2) [ true ] [ false ]
		either not none? (site: get-site-to-do *bot-ident*) [					debug "work found, making order"
			~send my-actor-addr director-id compose ['order-capture (site/domain) (site/user) 5 ]
		] [	debug "Nothing to do." 	]
	]
	delay: 00:00:20
]

capture-director

capture-director: make actor [
	act-match: [
		[ 'order-capture url user tries ] [ 						debug "got order for capture"
			either not zero? group-count "capturer" [ 			debug "sending order to worker"
				~send addr: group-select "capturer" compose [ 'make-capture (url) (user) ]
				group-remove "capturer" addr					debug "removing worker:" probe addr
			] [											debug "no workers.. resending to itself"
				if not zero? tries [
					~send my-actor-addr self/id compose [ 'order-capture (url) (user) (tries - 1) ]
				]
			]
		]
		[ 'capture-ready user url filename ][ 					debug "capture is ready"
			add-to-users-scans  compose [ file (filename) when (now/precise) ] user url *bot-ident*
			set-site-processed *bot-ident* url user
		]
	]
]

capture-director reacts to two messages. The first is order-capture, which is sent by the work-checker we met above. If any workers are waiting for work (added by my-bookman), it sends a random one of them a make-capture message. This message goes to some worker computer over the net. If no workers are available at the moment, it sends the message to itself again, up to 5 times, so it will try to hand out the job a little later. In a way this looks like a sort of tail call.

After a worker creates the capture and a thumbnail and uploads them both to an FTP address, it sends back the capture-ready message. On receiving it, this actor stores the data it got and marks the site as processed by this bot, using the functions that all other bots at Site Assistant use.
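The order-capture retry described above can be sketched in plain REBOL 2, without actor-net. This is only an illustration under assumptions: mailbox, attempts and handle-order are hypothetical names, a plain block stands in for the actor's message queue, and no worker ever becomes available, so the order is re-queued until its tries counter reaches zero.

```rebol
REBOL [Title: "Retry countdown sketch (not actor-net code)"]

mailbox: copy []   ; hypothetical stand-in for the actor's message queue
attempts: 0        ; how many times the order was handled

; Mirrors the order-capture branch: dispatch if a worker is idle,
; otherwise re-queue the same order with one try fewer.
handle-order: func [url tries workers] [
    attempts: attempts + 1
    either empty? workers [
        if not zero? tries [
            append/only mailbox reduce ['order-capture url tries - 1]
        ]
    ] [
        print ["dispatching" url]
    ]
]

append/only mailbox reduce ['order-capture "example.com" 5]
while [not empty? mailbox] [
    msg: first mailbox
    remove mailbox
    handle-order msg/2 msg/3 copy []   ; no workers ever show up
]
print attempts   ; 6: the first attempt plus 5 re-sends
```

When the counter runs out the order is simply dropped. In the real system the site is presumably still unprocessed at that point, so work-checker would find it again on a later tick; that part is an inference from the listings, not something the sketch shows.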

Worker

init

.. is a once-actor that sends the worker-ready message to Central’s bookman. The sending lives in a separate function because the other actor here sends it too.

send-worker-ready: does [ ~send *central* compose [ 'worker-ready "capturer" (my-actor-addr 1) ] ] 

init: make once-actor [ act-once: [ print "sending to central that I am ready" 	send-worker-ready ]	]

capture-worker

capture-worker: make actor [
	act-match: [
		[ 'make-capture url user ] [ print join "got to make a capture:" url
			filename: create-capture url
			either capture-exists? filename [
				either upload-capture filename [
					~send get-actor-addr *central* 2 compose [ 'capture-ready (user) (url) (filename) ]  print "job done."
				] [ print "capture wasn't uploaded" ]
			] [ print "capture wasn't created" ]
			send-worker-ready
		]
	]
]

When capture-worker receives the make-capture message from Central, it does its thing. It makes the capture and the thumbnail, uploads them to FTP, and if all that passes it sends capture-ready to Central. At the end of its procedure it also sends worker-ready to signal that it’s prepared for another job.

Do it like the tuple spaces do

My plan at first was that a worker would send worker-ready just once, when it starts, and Central would then randomly send jobs to the available workers. When a worker drops out, Central would get an error when trying to send the message and remove that worker from its group.

But this means several things. Workers need a queue where they store jobs, because they can get a new job while still working on the previous one. It also means that one worker might have 10 jobs waiting in its queue while other workers sit idle.

Tuple spaces use a dead-simple model of communication flow that offers automatic load balancing by design. I added two lines of code, and now we mimic tuple spaces: we get the same load balancing and need no queue on the worker. If you look at Central’s capture-director / order-capture, you see the line group-remove "capturer" addr. This line removes a worker from the available workers right after it is sent a job. And on the Worker side, as I already mentioned, after the worker does its job or hits an error it sends worker-ready again. So for the cost of one message (worker-ready) we got this automatic load balancing.
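The idea can be sketched in plain REBOL 2, independent of actor-net. The names idle-workers, mark-ready and take-worker are hypothetical, and worker addresses are plain strings here; the real library keeps its groups through group-actor, group-select and group-remove, whose internals are not shown in this post.

```rebol
REBOL [Title: "Idle-worker pool sketch (not actor-net code)"]

idle-workers: copy []   ; addresses of workers that have said they are free

; A worker announces it is free (on startup and after every job).
mark-ready: func [addr] [
    if not find idle-workers addr [append idle-workers addr]
]

; Central hands out a job: pick a random idle worker and take it off the list.
; Returns the chosen address, or none when nobody is idle.
take-worker: func [/local addr] [
    if empty? idle-workers [return none]
    addr: pick idle-workers random length? idle-workers
    remove find idle-workers addr
    addr
]

mark-ready "worker-a"
mark-ready "worker-b"
w: take-worker               ; one of the two, now removed from the list
print length? idle-workers   ; 1: the busy worker cannot receive a second job
mark-ready w                 ; job finished, the worker is back in the pool
print length? idle-workers   ; 2
```

Because a busy worker is never in the list, it can never be handed a second job, which is why no queue is needed on the worker side.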

Video demonstration

Not the most exciting video, but it shows the workflow of the code pieces presented above. The upper-left window is an SSH shell to the remote server where Central runs. On the right you see 2 workers.

What now

This is all a very early version, far from robust; it is more a learning experience that we can build on. The actor-net library still needs to improve a lot. I am at a crossroads: there is a chance I could change it all and represent actors with functions instead of objects as now, but I am not certain. If not, I intend to put the library inside an object and lose some of the ugly parts; actor address handling is one of them.

You can download the source code here: download. You won’t be able to run it as is, because it calls into Site Assistant bot code which is not included. You could provide fake functions in their place.
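As a starting point, fake functions might look like the sketch below. The function names and argument counts are taken from how the listings above call them; everything else (argument names, return values, printed text) is an assumption made for illustration and not the real Site Assistant code.

```rebol
REBOL [Title: "Fake Site Assistant functions (hypothetical stubs)"]

; Central side.
; Always reports the same pending site; return none to simulate "no work".
get-site-to-do: func [bot-ident] [
    make object! [domain: "example.com" user: "demo"]
]
add-to-users-scans: func [scan user url bot-ident] [
    print ["scan stored for" user url mold scan]
]
set-site-processed: func [bot-ident url user] [
    print ["site marked processed:" url user]
]

; Worker side.
; Pretend every capture and upload succeeds.
create-capture: func [url] [join url ".png"]
capture-exists?: func [filename] [true]
upload-capture: func [filename] [true]
```

Making capture-exists? or upload-capture return false is a cheap way to exercise the error branches in capture-worker.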

2 Responses to “REBOL’s actor-net used for real distributed system”

  1. Gregg Irwin Says:

    It just keeps getting better. Good stuff Janko!

  2. janko Says:

    Hi Gregg, thanks .. :) .. and it’s far from finished still.