Hello,
I noticed that the way NcML aggregation cache XML files are created has changed in version 4.6.x. In
previous versions, the cache XML file contained lines similar to:
<netcdf id='/data/roms/espresso/2009_da/avg/espresso_avg_1379.nc' ncoords='1' >
<cache varName='ocean_time' >1.191888E8 </cache>
</netcdf>
from the start. With large datasets, generating this the first time the dataset was accessed took a
while (30+ minutes, sometimes crashing TDS), but subsequent accesses were much faster. The new
approach generates the NcML cache more quickly, but without the cached joinExisting values:
<netcdf id='/data/roms/espresso/2009_da/avg/espresso_avg_1379.nc' ncoords='1' >
</netcdf>
and fills in the "<cache varName='ocean_time' >1.191888E8 </cache>" lines as data from the
corresponding file is requested. A side effect, in my case at least, is that even requests for small
amounts of data are relatively slow. Presumably, this will be the case until all ocean_time cache
values are filled in. Once all values were cached, response times dropped significantly: from 15s to
less than 1s in my very limited tests (~1600 files spanning 19,146 time records).
As a workaround for anyone experiencing the same side effect, you can populate the whole aggregation
cache XML file with the <cache> lines by requesting all records of the joinExisting variable (or
successive chunks for very large datasets).
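For what it's worth, here is a rough sketch of how that warm-up could be scripted. The server URL,
chunk size, and record count are placeholders for my setup, and I'm assuming the requests go through
the aggregation's OPeNDAP ASCII service; adjust to your own dataset:

import urllib.request

BASE_URL = "http://myserver:8080/thredds/dodsC/roms/espresso_agg"  # hypothetical aggregation URL
VAR = "ocean_time"   # the joinExisting coordinate variable
N_RECORDS = 19146    # total records in the aggregation
CHUNK = 1000         # records per request; tune for your server

for start in range(0, N_RECORDS, CHUNK):
    stop = min(start + CHUNK, N_RECORDS) - 1
    url = "%s.ascii?%s[%d:%d]" % (BASE_URL, VAR, start, stop)
    # Each request forces TDS to read ocean_time from the underlying files,
    # which fills in the missing <cache> entries in the aggregation cache XML.
    urllib.request.urlopen(url).read()
    print("warmed records %d-%d" % (start, stop))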
I can certainly see the reasoning behind and the benefits of the new way of caching, but I want to
point out possible side effects and workarounds. Another workaround could be to use a combination of
Python/Perl and NCO to generate the cache file (complete with cached joinExisting values) offline.
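For example, something along these lines could generate the per-file <netcdf> entries. I've used the
Python netCDF4 module here rather than NCO just to keep the sketch self-contained; the file pattern,
variable name, and numeric format are assumptions based on my dataset, and the wrapper element plus
the cache file's name and location would still need to be copied from a cache file TDS has already
written:

import glob
from netCDF4 import Dataset

pattern = "/data/roms/espresso/2009_da/avg/espresso_avg_*.nc"  # assumed file layout
for path in sorted(glob.glob(pattern)):         # assumes lexical order matches time order
    with Dataset(path) as nc:
        times = nc.variables["ocean_time"][:]   # joinExisting coordinate values
    values = " ".join("%G" % v for v in times)  # e.g. 1.191888E+08
    print("<netcdf id='%s' ncoords='%d' >" % (path, len(times)))
    print("<cache varName='ocean_time' >%s </cache>" % values)
    print("</netcdf>")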
Dave