Re: [thredds] 4.6.x NcML Aggregation Cache Generation

Hi David:

Thanks for figuring this out.  I have to decide whether or not to upgrade.  
This brings up the obvious point that for a fast virtual aggregation, the work 
of finding out about the dataset has to be done at some point: either up front 
or as you go, but it has to be done. For GRIB aggregations this is what the 
TDM does (though I am still having problems getting that to work correctly): 
it figures out the aggregation “offline”, as it were, and updates that info.

The advantage of the old system was that any request to the dataset would force 
the TDS to build the entire aggregation cache; now a request only caches the 
part of the aggregation it touches.

-Roy.  


> On May 21, 2015, at 8:32 AM, David Robertson <robertson@xxxxxxxxxxxxxxxxxx> 
> wrote:
> 
> Hello,
> 
> I noticed that the way NcML aggregation cache xml files are created has 
> changed in version 4.6.x. In previous versions, the cache xml file contained 
> lines similar to:
> 
>  <netcdf id='/data/roms/espresso/2009_da/avg/espresso_avg_1379.nc' 
> ncoords='1' >
>    <cache varName='ocean_time' >1.191888E8 </cache>
>  </netcdf>
> 
> from the start. With large datasets, this took a while (30 minutes plus and 
> sometimes crashing TDS) to generate the first time the dataset was accessed, 
> but subsequent accesses were much faster. The new version generates the NcML 
> cache more quickly, but without the cached joinExisting values:
> 
>  <netcdf id='/data/roms/espresso/2009_da/avg/espresso_avg_1379.nc' 
> ncoords='1' >
>  </netcdf>
> 
> and fills in the "<cache varName='ocean_time' >1.191888E8 </cache>" lines as 
> data from the corresponding file is requested. A side effect, in my case at 
> least, is that even requests for small amounts of data are relatively slow. 
> Presumably, this will be the case until all ocean_time cache values are 
> filled in. Once all values were cached, response times dropped significantly: 
> from 15s to less than 1s in my very limited tests (~1600 files spanning 
> 19,146 time records).
> 
> For anyone experiencing the same side effect, a workaround is to populate the 
> whole aggregation cache xml file with the <cache> lines by requesting all 
> records of the joinExisting variable (or successive chunks for very large 
> datasets).
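> 
> For example, a rough sketch of such a cache-warming script in Python with 
> netCDF4 (the aggregation URL and chunk size here are placeholders for your 
> own setup) could look like:
> 
>   # Warm the joinExisting cache by reading the aggregation's ocean_time
>   # coordinate in chunks over OPeNDAP.
>   from netCDF4 import Dataset
> 
>   AGG_URL = "http://your-tds-host/thredds/dodsC/your/aggregation"  # placeholder
>   CHUNK = 1000  # records per request; shrink for very large datasets
> 
>   ds = Dataset(AGG_URL)
>   time_var = ds.variables["ocean_time"]
>   n = len(time_var)
>   for start in range(0, n, CHUNK):
>       stop = min(start + CHUNK, n)
>       _ = time_var[start:stop]  # each read forces TDS to cache these records
>       print("requested records %d-%d of %d" % (start, stop - 1, n))
>   ds.close()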
> 
> I can certainly see the reasoning and benefits to the new way of caching but 
> want to point out possible side effects and workarounds. Another workaround 
> could be to use a combination of Python/Perl and NCO to generate the cache 
> file (complete with cached joinExisting values) offline.
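> 
> A rough sketch of that offline approach, using Python with netCDF4 in place 
> of NCO (the glob pattern is a placeholder, and only the per-file entries 
> shown above are generated, not the complete cache file), might be:
> 
>   # Print the per-file <netcdf>/<cache> entries for a joinExisting
>   # aggregation by reading ocean_time from each file offline.
>   import glob
>   from netCDF4 import Dataset
> 
>   FILES = sorted(glob.glob("/data/roms/espresso/2009_da/avg/espresso_avg_*.nc"))
> 
>   for path in FILES:
>       with Dataset(path) as nc:
>           times = nc.variables["ocean_time"][:]
>       values = " ".join("%G" % t for t in times)
>       print("  <netcdf id='%s' ncoords='%d' >" % (path, len(times)))
>       print("    <cache varName='ocean_time' >%s </cache>" % values)
>       print("  </netcdf>")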
> 
> Dave
> 
> _______________________________________________
> thredds mailing list
> thredds@xxxxxxxxxxxxxxxx
> For list information or to unsubscribe,  visit: 
> http://www.unidata.ucar.edu/mailing_lists/ 

**********************
"The contents of this message do not reflect any position of the U.S. 
Government or NOAA."
**********************
Roy Mendelssohn
Supervisory Operations Research Analyst
NOAA/NMFS
Environmental Research Division
Southwest Fisheries Science Center
***Note new address and phone***
110 Shaffer Road
Santa Cruz, CA 95060
Phone: (831)-420-3666
Fax: (831) 420-3980
e-mail: Roy.Mendelssohn@xxxxxxxx www: http://www.pfeg.noaa.gov/

"Old age and treachery will overcome youth and skill."
"From those who have been given much, much will be expected" 
"the arc of the moral universe is long, but it bends toward justice" -MLK Jr.


