Wrong Notes: February 2009

2/21/2009

Actionscript and Concurrency (III of III)

In the previous article we covered techniques for breaking up our long running job, but the performance was 40x slower than if we just ran our algorithm straight out. The problem is our algorithm spends very little time doing work, and a lot of time waiting for the next frame. Actionscript's performance is really quite high. We need to increase the time spent running our algorithm and minimize the time we spend doing nothing. We can do that by doing many iterations per frame instead of just one. Using getTimer() we measure how much time we spent looping and back off right before the next frame. Let's look at the code:


public function start() : void {
    Application.application.addEventListener( Event.FRAME_ENTER, onCycle );
}

private function onCycle( event : Event ) : void {
   var cycle : Boolean = true;
   var start : Number = getTimer();
   var milliseconds = 1000 / Application.application.stage.frameRate - DELTA;
   while( cycle && (getTimer() - start) < milliseconds ) {
       cycle = doLongWork();
   }

   if( cycle == false ) {
       Application.application.removeEventListener( Event.FRAME_ENTER, doLongWork );
   }
}
    
public function doLongWork() : Boolean {
    // do some work
    i++;
    return i < total;
}

Now we broken up our algorithm into an extra method. First the start() method which we're already seen. The new method is the onCycle which calculates how long a frame is in milliseconds. The loop continues until either the doLongWork method returns false, or we run out of time. Notice the DELTA constant is some constant that keeps us from eating up the entire frame. We need to give a little breathing room for Flash to drain the queue. Notice how our doLongWork method is just the code pertaining to our job. This makes it's easier to build a general purpose solution that we can reuse.

Green Threads

We can't use true OS threads in Actionscript, but any language can emulate threads. This technique is often called Green Threads. Lots of languages have used this in the past. Threads in Ruby are still green, and early versions of Java were green as well. Now Actionscript can too. I should pause and give credit to Drew Cummins who implemented a version of this for Flash player 10. I've rewritten this to remove the dependency of Flash 10, and changed some of the API so event dispatch is more natural, added easy progress events, and optional progress tracking. Let's see how our Mandelbrot algorithm changes when we use this.

In order to use GreenThreads create a subclass of GreenThread, override run method, and optionally override initialize method to add code that runs at the start. Here is an example:


public class Mandelbrot extends GreenThread {
 private var _bitmap : BitmapData;
 private var _maxIteration : uint = 100;
 private var _realMin : Number = -2.0;
 private var _realMax : Number = 1.0;
 private var _imaginaryMin : Number = -1.0;
 private var _imaginaryMax : Number = 1.0;
 private var _shader : Shader;
 
 private var _realStep : Number;
 private var _imaginaryStep : Number;
 private var screenx : int = 0;
 private var screeny : int = 0;

 override protected function initialize( ) : void {
  _bitmap = new BitmapData( width, height, false, 0x020202 );
  screenx = screeny = 0;
  _realStep = (_realMax - _realMin) / Number(_bitmap.width);
  _imaginaryStep = ( _imaginaryMax - _imaginaryMin ) / Number( _bitmap.height );
 }

 override protected function run():Boolean {
  if( screenx > _bitmap.width ) {
   screenx = 0;
   screeny++;
  }
  if( screeny < _bitmap.height ) {
   var x : Number = screenx * _realStep + _realMin;
   var y : Number = screeny * _imaginaryStep + _imaginaryMin;
   var x0 : Number = x;
   var y0 : Number = y;
   var iteration : int = 0;
   while( x * x + y * y <= (2 * 2) && iteration < _maxIteration ) {
    var xtemp : Number = x * x - y * y + x0;
    y = 2 * x * y + y0;
    x = xtemp;
    iteration = iteration + 1;
   }
   
   if( iteration == _maxIteration ) {
    _bitmap.setPixel( screenx, screeny, 0x000000 );
   } else {
    _bitmap.setPixel( screenx, screeny, shader.lookup( Number(iteration) / Number(maxIteration) ) );
   }
   screenx++;
   return true;
  } else {
   return false;
  }
 }
}

The run() method is the body our of loop. The intialize() method is called once after the user calls the start() method. After that run() method is called repeatedly until it returns false. It's perfectly acceptable to call start() more than once to kick off the thread again after it's finished. That means you can calculate the Mandelbrot set from different zoom levels without needing to recreate new instances. The initialize() method will be called every time start() is called. Check out the results here.

You can also add optional progress tracking by setting maximum, and progress members. This will automatically dispatch ProgressEvents so that your instance can be a source to a ProgressBar. It makes tracking your job easy. GreenThread also subclasses EventDispatcher so you can dispatch events from within the run method.

By in large we've solved the performance problems or we've gotten very close. What's holding us back is the resolution of getTimer(). Since we only have precision of millisecond we really can't run the risk of going smaller than 1 millisecond for our DELTA. That costs us a few iterations on our run() method which can make a difference over 1000 cycles. We could be a full second behind Actionscript that just ran the job straight through. There are a few things we can do to squeeze a little more performance out of GreenThreads.

Frame rate governs everything we do, and by default Flex applications run at 24 frames/s, but really most Flex applications don't do that much animation so if we dropped the frame rate in half to 12 frames/s we would be able to run for longer periods uninterrupted. The fewer interruptions we have, the faster we'll go.

GreenThreads also allows you to configure how much of the frame's time you dedicate to running your thread. By default it's set at 0.99 that roughly leaves 1 ms to update the UI. Under some experimentation this has proven to work quite well without creating lots of timeouts, but if you want to tweak it just provide a new value in the start method like so:


public function go() : void {
   start( 0.5 );
}

If the delta is less than 1 then it means a percentage of the length of a frame. If it's >=1 then it means the number of milliseconds to subtract from the length of a frame. Some more thought needs to go into this so that as you run your application on different machines with different CPUs so the pause is appropriate for the CPU. In the future it might need to be dynamically adjusted as the algorithm runs.

Thread Statistics

GreenThreads supports runtime statistics for tracking your job. To turn on thread statistics pass true to the GreenThread constructor. Thread statistics collects total time the job took, number of timeouts, min and max iteration times, average time a single iteration took, how many cycles it took, etc. There is a fair amount of information that can be gathered to help tune your thread. You can access that information by doing the following:


public class SomeJob extends GreenThread {

   public function SomeJob() {
      super( true );   // turn on debug statisitics

      addEventListener( Event.COMPLETE, function( event : Event ) : void {
         trace( statistics.print() );
      });
   }
}

Conclusion

There are some drawbacks to doing concurrency this way. One is algorithm have to be cooperative, and stop processing in the middle to let Flash do its thing. That means your algorithm normally have to be rewritten to conform with this approach. That can be particularly difficult for recursive algorithms. There needs to be more research done into how you might fix this with the callLater() technique. The biggest draw back is that we cannot take advantage of multi-processors. For all the code you write Flash runs on a single OS thread. This is a serious disadvantage for us going forward because as Actionscript developers we cannot access boosts in hardware performance as cores are added.

It's been a lot of information but hopefully you now understand the theory behind concurrency in Actionscript, and you have a new library that helps you optimize your code. You can access the source code here, and download the GreenThread's library here. I look forward to hearing about what sorts of long running jobs you create.

Full source code of the Mandelbrot set is here.

Download GreenThread's library here.

2/19/2009

Actionscript and Concurrency (II of III)

Now that we understand more about how Flash works internally we can begin to talk about strategies to chop up our large running job into smaller pieces. We'll render the Mandelbrot set. It's a time consuming algorithm, and it's fun when your demos make pretty pictures too. Let's look at a simple implementation.


public function calculate( width : int, height : int ) : void {
 _bitmap = new BitmapData( width, height, false, 0x020202 );

 var realStep : Number = (_realMax - _realMin) / Number(_bitmap.width);
 var imaginaryStep : Number = ( _imaginaryMax - _imaginaryMin ) / Number( _bitmap.height );
 
 for( var screeny : int = 0; screeny < _bitmap.height; screeny++ ) {
  for( var screenx : int = 0; screenx < _bitmap.width; screenx++ ) {
   var x : Number = screenx * realStep + _realMin;
   var y : Number = screeny * imaginaryStep + _imaginaryMin;
   var x0 : Number = x;
   var y0 : Number = y;
   var iteration : int = 0;
   while( x * x + y * y <= (2 * 2) && iteration < _maxIteration ) {
    var xtemp : Number = x * x - y * y + x0;
    y = 2 * x * y + y0;
    x = xtemp;
    iteration = iteration + 1;
   }
   
   if( iteration == _maxIteration ) {
    _bitmap.setPixel( screenx, screeny, 0x000000 );
   } else {
    _bitmap.setPixel( screenx, screeny, shader.lookup( Number(iteration) / Number(maxIteration) ) );
   }
  }
 }
 var evt : Event = new Event( Event.COMPLETE );
 dispatchEvent( evt );
}

Click here to see it in action. Notice how there was a pause before it actually drew the Mandelbrot set. Maybe you got the pinwheel of death or not responding. That is what happens when you hold up the Event Queue.

At a high level this algorithm calculates whether or not a pixel at (screenx,screeny) is inside the Mandelbrot set (i.e. stays below 4). If it stays below 4 after maxIterations of the loop then it colors the pixel black. If not it's color is based on how many times the inner while loop ran before going past 4. It's not as important you understand what the algorithm is doing as so much the parts of the algorithm. The parts that make this a long job are the three loops inside.

Breaking Apart Algorithms

In order to split this job up and run across many frames we'll need to break up those top two loops. Before we jump into that let's talk in a little more general terms. Say we want to bust up a general purpose loop something like:


public function doLongWork( arg1 : String ) : void {
   for( var i : int = 0; i < 1000000; i++ ) {
      // do some work
   }
}

callLater() Technique

We could use the UIComponent.callLater() method to trigger our loop that might look like the following:


public function doLongWork( arg1 : String, i : int, total : int ) : void {
   // do work
   if( i < total ) {
      uicomponent.callLater( doLongWork, [ arg1, i + 1, total ] );
   }
}

This is nice. We've removed the for loop and replaced it with what looks like a recursive call, but it's not. Actually what we're doing is doing a single iteration of the loop, and then scheduling the Event Queue to call us back later to do the next iteration of our loop. We do this until i == total, and that point we stop.

Timer Technique

Another way we could restructure our code is to us a timer. Here is another way to do this:


public function start() : void {
   i = 0; total = 1000000;
   var milliseconds = 1000 / Application.application.stage.frameRate;
   _runner = new Timer( milliseconds );
   _runner.addEventListener( TimerEvent.TIMER, doLongWork );
}
    
public function doLongWork() : void {
   // do some work
   i++;
   if( i >= total ) {
      _runner.stop();
      _runner.removeEventListener( TimerEvent.TIMER, doLongWork );
   }
}

A little more code, but this works too. Now we're scheduling a timer to call us at an interval of a single frame. That's the first line where we calculate in milliseconds the length of a single frame. Then we register our doLongWork method to get called back by the timer. We then remove the listener and stop the timer once i reaches total. Notice that in this method we have to move i and total into instance variables which means we have to initialize those in some sort of start method.

FRAME_ENTER Technique

The final option we could use is FRAME_ENTER event. That looks like this:


public function start() : void {
   i = 0; total = 1000000;
   Application.application.addEventListener( Event.FRAME_ENTER, doLongWork );
}
    
public function doLongWork( event : Event ) : void {
   // do some work
   i++;
   if( i >= total ) {
      Application.application.removeEventListener( Event.FRAME_ENTER, doLongWork );
   }
}

This is nice because we don't have to fiddle with math in order to call us back at the frame rate. We just register a listener, when we're done we unregister our listener. Not as clean as callLater, but this works in both Flash and Flex. What's important to remember is that all of these techniques are equivalent. There is no discernible difference in performance between them.

Restructuring Our Demo

So now we can restructure our Mandelbrot algorithm to match one of these patterns. In our case we'll move screenx, screeny, _realStep, _imaginaryStep, and our BitmapData outside of our calculate() method, and put them inside calculateAsync(). Here's our Mandelbrot algorithm restructured:


public function calculateAsync( width : int, height : int ) : void {
 _bitmap = new BitmapData( width, height, false, 0x020202 );
 screenx = screeny = 0;
 _realStep = (_realMax - _realMin) / Number(_bitmap.width);
 _imaginaryStep = ( _imaginaryMax - _imaginaryMin ) / Number( _bitmap.height );
       Application.application.addEventListener( Event.FRAME_ENTER, calculate );
}

private function calculate( event : Event ) : void {
 if( screenx > _bitmap.width ) {
  screenx = 0;
  screeny++;
 }
 if( screeny < _bitmap.height ) {
  var x : Number = screenx * _realStep + _realMin;
  var y : Number = screeny * _imaginaryStep + _imaginaryMin;
  var x0 : Number = x;
  var y0 : Number = y;
  var iteration : int = 0;
  while( x * x + y * y <= (2 * 2) && iteration < _maxIteration ) {
   var xtemp : Number = x * x - y * y + x0;
   y = 2 * x * y + y0;
   x = xtemp;
   iteration = iteration + 1;
  }
  
  if( iteration == _maxIteration ) {
   _bitmap.setPixel( screenx, screeny, 0x000000 );
  } else {
   _bitmap.setPixel( screenx, screeny, shader.lookup( Number(iteration) / Number(maxIteration) ) );
  }
  screenx++;
 } else {
         Application.application.removeEventListener( Event.FRAME_ENTER, calculate );
        }
}

Now if we ran this version of it you'd see it's 40x slower! Why is that? Well we're executing a single iteration every start of a frame (~40ms). If you timed the running of the calculate method. You'd see that it probably takes less than a millisecond to complete a single iteration. So we've just take a single iteration that ran in <1ms and now we're running it in 40ms! Yikes!

Remember when I said Flash is all about animation? Running jobs at the frame rate is great for animation, but horrible for general purpose concurrency. We need a better way. So our final part we'll discuss optimizing our technique and demonstrate a general purpose solution so that you don't have recreate the wheel when you need to run jobs for long periods of time. See you in the next installment.

Actionscript and Concurrency (Part I of III)

This is a three part series from a talk I gave at the Atlanta Flex and Flash User Group meeting in February. Original slides are here.

Part I Intro to the Event Queue
Part II Techniques for Long running Jobs
Part III Increasing Performance and a General Purpose Solution

I can hear it now Actionscript can't do that. And some of you might say what is concurrency? Concurrency goes by a lot of names, but it simply means doing two things at once. More accurately stated doing two things simultaneously. Now I can hear what you might say "But, Flash already does that? I mean I can animate two objects on the screen at once." While that's true, Flash is optimized for animation not for general purpose concurrency. In fact the design of animation is so foundational it governs everything that happens in Flash.

Timeline

If you're a Flash developer I'm sure you're familiar with the timeline. The timeline is divided into a series of frames, and Flash executes those frames at a particular rate one after the other. For flex applications the rate is 24 frames / second. If you do the math that means each frame lasts a little more than 40ms. The timeline is very natural for designers. But, for developers this concept is a little strange because it's hard to understand where my code is executed?

Event Queue

For developers I think a thought experiment helps clarify the timeline. Say you were hired to implement the timeline concept. What data structure would help you in doing this task? The answer is surprisingly simple and bears a lot of resemblance to most UI toolkits out there. At the heart of the Flash platform there exists a queue of events. The Event Queue, as it's known, is where all code is triggered. Every piece of code in Flex and Flash is related back to some event being triggered. So when you move your mouse, click a button, type on the keyboard, set timers they all go onto the queue. Flash then pops off those events from the queue and executes them one by one. When one event is done processing the next event is processed. There's no way any two events can be processed at the same time. It all happens one at a time.

Now in Flash even those frames from the timeline are modeled as events. The difference between them and other events is that frame events must occur at certain points in time. Unlike mouse or keyboard which can wait till the next frame to be processed. Frame events must be processed every ~40ms (or 1000 ms / 24).

That means if one event takes too long to process Flash can't update the screen, process other events, or doing anything. What happens is Flash locks up and you get a pinwheel of death or a Not Responding next to your application. In fact Flash punishes such acts and stops processing mouse, keyboard, or button clicks till it catches up. Why? Well that because if you block the queue from processing it won't update that nice animation, and Flash prioritizes that over anything else.

Rule number one is that any code you write must execute within the time span of a frame. If not you run the risk of having Flash coming down on you. This leads us to the second rule which is Flash is not concurrent. There is no way for Flash to update the UI and run your job at the same time.

Timeslicing

So what do you do if you have a long running job? Well the answer lies in how animation works in Flash. Animating several objects at once requires a series of many smaller steps. Say you want to move a ball from one side of the screen to the other. That means you need to move it a little, redraw, move it some more, redraw, etc until the ball is at the other side of the screen. In a way animation is chopping up a long running job into several smaller jobs that run once per frame. We can do that too by chopping up our job into many smaller jobs, and executing them a little at time until we're done!

So how might you do this? Well there are several tricks you can use, and they all perform the same. It's more a matter of taste in which one you choose, but remember technically there is no real difference between these. Our next part we'll look at each of these techniques and discuss unique problems when doing concurrent actions in Flash.