All,
I’ve been working on some big aggregations that are getting prohibitively
expensive to scan. We’ve gone around in circles trying to get the aggregation
caches to stick, but no matter what we do, it seems that they simply will not
get picked up reliably.
In general, I’ve just made sure the data we publish has a fixed time dimension
so that scanning a whole bunch of files is cheap. With this dataset, rewriting
to a fixed time dimension doesn’t seem to be an option.
I’m wondering if someone knows definitively whether writing the time coordinate
variable of a joinExisting aggregation directly into the .ncml will get picked
up, such that THREDDS would not have to scan all the files to find the time stamps?
i.e.:
<variable name="time" shape="time" type="int">
<attribute name="units" type="String" value="days since . . . " />
<values>6 18 etc. . .</values>
</variable>
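For reference, a related mechanism I’ve seen in the NcML aggregation docs is to
put the coordinate values on the explicit <netcdf> elements themselves via
coordValue, which is supposed to let the aggregation skip opening each file.
A sketch of what I mean (file names here are just placeholders, not our actual
layout):

```xml
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <aggregation dimName="time" type="joinExisting">
    <!-- coordValue supplies the time coordinate(s) for each file up front,
         so the server shouldn't need to open the file to read them.
         Locations below are hypothetical. -->
    <netcdf location="day006.nc" coordValue="6" />
    <netcdf location="day018.nc" coordValue="18" />
  </aggregation>
</netcdf>
```

I haven’t been able to confirm whether this behaves any differently from
writing the time variable out with <values> as above.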
The dataset I’m talking about is here:
http://esgdata1.nccs.nasa.gov/thredds/catalog/bypass/NEX-DCP30/bcsd/catalog.html
The top-level joined and unioned aggregations take 3-5 minutes to respond. If
you go down a level, each of the smaller joinExisting aggregations takes about
2-3 seconds to respond. There are something like 93 joinExistings, so if it’s
scanning all the files to build the big union, the time adds up.
The tests I’ve done haven’t given me the answer I want, but I’m not able to get
hold of THAT much of this data for testing since it is so spatially massive.
Thanks for any help you can provide.
- Dave