Hi David:
This is a timely email; I'm just now scratching my head trying to understand
why things are different in 4.6. I have a report that the cache is possibly
not being used on subsequent reads, but my tests have not reproduced that,
so this is helpful.
Note that to get everything cached, you just need to make a request for
the aggregation coordinate (usually time) values. You could do it with a DODS
request, or open the file as a Grid (e.g. via WMS, WCS, NCSS, or from ToolsUI,
IDV, etc.), which will automatically request the coordinates. A script to do
so is easy enough, using wget or Python or whatever you like; a rough sketch
follows, and you can email support if you need more help.
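For instance, here is a minimal sketch in Python using the netCDF4 library.
The server URL, chunk size, and the variable name ocean_time (borrowed from
your example) are placeholders; point it at the OPeNDAP (dodsC) URL of your
own aggregation and its joinExisting coordinate.

# Minimal sketch (assumes the netCDF4-python library is installed).
# The URL below is a placeholder -- substitute the dodsC URL of your aggregation.
from netCDF4 import Dataset

url = "http://your.server/thredds/dodsC/your/aggregation"  # hypothetical URL
with Dataset(url) as ds:
    time = ds.variables["ocean_time"]    # the joinExisting coordinate variable
    n = len(time)
    step = 1000                          # request in chunks for very large datasets
    for start in range(0, n, step):
        # each read forces the TDS to open the underlying files and cache their values
        vals = time[start:min(start + step, n)]
        print("requested records %d..%d" % (start, start + len(vals) - 1))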
One might ask why 4.6 doesn't use the previously cached values. It does, but
a change to the default behavior of DiskCache2 may be affecting this. The
4.3 default was to put all cache files into a single directory, but the 4.6
default makes nested directories, because having thousands of files in a
single directory is Considered Harmful. If you need to, you can control
that behavior in threddsConfig.xml (a sketch follows), but the better option
is to pay the price and redo the cache with the default. Email support if
you need more details.
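For reference, the aggregation cache is configured in the AggregationCache
element of threddsConfig.xml. The sketch below is my best recollection of the
relevant options (in particular the cachePathPolicy element and its
OneDirectory/NestedDirectory values), so verify against the threddsConfig
documentation for your release.

<AggregationCache>
  <scour>24 hours</scour>
  <maxAge>90 days</maxAge>
  <!-- 4.6 defaults to nested directories; OneDirectory restores the 4.3 layout -->
  <cachePathPolicy>OneDirectory</cachePathPolicy>
</AggregationCache>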
BTW, if you are installing on top of your old TDS, it might be advisable to
take the opportunity to clear out your caches. Just go to your cache
directory (the default is content/thredds/cache) and delete the entire
directory, or, if you have the inclination, selectively delete things (but
then you have to think hard). Then trigger repopulation as above.
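If you go the wholesale route, something like this does it (a sketch only;
the path assumes the default content root location, so adjust it for your
installation):

# Sketch: remove the TDS disk cache so it gets rebuilt from scratch.
# The path is hypothetical -- use your own content root location.
import shutil
from pathlib import Path

cache_dir = Path("/usr/local/tomcat/content/thredds/cache")
if cache_dir.exists():
    shutil.rmtree(cache_dir)
# Then trigger repopulation by requesting the coordinate values, as above.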
On Thu, May 21, 2015 at 9:32 AM, David Robertson <
robertson@xxxxxxxxxxxxxxxxxx> wrote:
> Hello,
>
> I noticed that the way NcML aggregation cache xml files are created has
> changed in version 4.6.x. In previous versions, the cache xml file
> contained lines similar to:
>
> <netcdf id='/data/roms/espresso/2009_da/avg/espresso_avg_1379.nc'
> ncoords='1' >
> <cache varName='ocean_time' >1.191888E8 </cache>
> </netcdf>
>
> from the start. With large datasets, this took a while (30+ minutes,
> sometimes crashing TDS) to generate the first time the dataset was
> accessed, but subsequent accesses were much faster. The new way more
> quickly generates the NcML cache without the cached joinExisting values:
>
> <netcdf id='/data/roms/espresso/2009_da/avg/espresso_avg_1379.nc'
> ncoords='1' >
> </netcdf>
>
> and fills in the "<cache varName='ocean_time' >1.191888E8 </cache>" lines
> as data from the corresponding file is requested. A side effect, in my case
> at least, is that even requests for small amounts of data are relatively
> slow. Presumably, this will be the case until all ocean_time cache values
> are filled in. Once all values were cached, response times dropped
> significantly: from 15s to less than 1s in my very limited tests (~1600
> files spanning 19,146 time records).
>
> For anyone experiencing the same side effect, you can populate the whole
> aggregation cache xml file with the <cache> lines by requesting all records
> of the joinExisting variable (or successive chunks for very large datasets)
> as a workaround.
>
> I can certainly see the reasoning and benefits to the new way of caching
> but want to point out possible side effects and workarounds. Another
> workaround could be to use a combination of Python/Perl and NCO to generate
> the cache file (complete with cached joinExisting values) offline.
>
> Dave
>
> _______________________________________________
> thredds mailing list
> thredds@xxxxxxxxxxxxxxxx
> For list information or to unsubscribe, visit:
> http://www.unidata.ucar.edu/mailing_lists/
>