For those of you interested in grid computing, I found an older but great post about the scalability of EC2 for grid-based applications. The thing that caught my eye was the final test using Windows HPC and Velocity. The tests were not comparable to each other, but the final test shows how much degradation you suffer when your data is stored away from your computations. In their tests, there was a 31x reduction in performance when the data was stored "out of the cloud". I think this really shows the importance of good, redundant storage at the point of computation.
The good news for GridGain is the near-linear scalability up to 512 nodes in pure CPU tests. That's not as high as the 2000 nodes for Hadoop, but those are the only real numbers I've seen anywhere on it. It does hint that GridGain's network overhead is really pretty light.