Jeff,
> However, we have to keep in mind performance ramifications. It still takes
> a long time to move gigabytes of data across a network. This brings up the
> importance of moving the computation to the data, instead of moving the
> data to the computation. For some data sets and many use cases, remote
> access to data works very well, so things like brokering are tractable.
> However, for *big* data sets (e.g., climate model output) we need to come
> up with richer mechanisms (like the NCO on local data) to bring computation
> to the data.
See Daniel Wang's SWAMP (the Script Workflow Analysis for
MultiProcessing), built on top of NCO:
https://code.google.com/p/swamp/
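
To make the "computation to the data" point concrete, here is a minimal
sketch of the kind of reduction SWAMP automates (this is not SWAMP's own
interface; the paths and file names are made up, and it assumes the NCO
tools are installed on the host that holds the data). It runs the NCO
record averager there, so only the small averaged result has to cross
the network:

  #!/usr/bin/env python3
  # Sketch: run the NCO record averager (ncra) on the machine holding
  # the data, then transfer only the small result.  Paths are hypothetical.
  import subprocess
  from pathlib import Path

  inputs = sorted(Path("/data/ccsm").glob("tas_*.nc"))  # big local files
  output = Path("/data/ccsm/tas_climatology.nc")        # small reduced file

  # ncra averages across the record (time) dimension of all inputs;
  # -O overwrites any existing output file.
  subprocess.run(["ncra", "-O", *map(str, inputs), str(output)], check=True)

  print(f"reduced {len(inputs)} files to {output.name}")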
--Russ