Re: [thredds] Aggregation Cache File Naming Generator Change - ID to urlPath

  • To: Michael McDonald <mcdonald@xxxxxxxxxxxxx>
  • Subject: Re: [thredds] Aggregation Cache File Naming Generator Change - ID to urlPath
  • From: Christian Ward-Garrison <cwardgar@xxxxxxxx>
  • Date: Sat, 16 Jan 2016 14:44:30 -0700
Hi Michael,

I've gone ahead and added support for the AggregationCache.cachePathPolicy
element [1]. Just add it to your threddsConfig.xml using a value of
"OneDirectory", e.g.:

<AggregationCache>
  ...
  <cachePathPolicy>OneDirectory</cachePathPolicy>
</AggregationCache>

This feature will appear in the 4.6.4 release, which we hope to have out in
a week or two.
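For anyone weighing the two policies, here is a minimal Java sketch of how they might map a dataset urlPath to a file under cache/agg. CachePathSketch and its Policy enum are hypothetical names, not the TDS API. The rule that OneDirectory replaces each "/" with "-" is an assumption based on the flat v4.3.23 listing discussed further down this thread.

```java
import java.util.List;

// Hypothetical sketch, not the TDS implementation: how a cache path policy
// might turn a dataset urlPath into a path under cache/agg.
final class CachePathSketch {

  enum Policy { ONE_DIRECTORY, NESTED_DIRECTORY }

  static String cachePath(String urlPath, Policy policy) {
    // Drop leading/trailing slashes so the result is relative to the cache root.
    String p = urlPath.replaceAll("^/+", "").replaceAll("/+$", "");
    // Assumption: OneDirectory flattens every '/' to '-', giving one flat
    // directory; NestedDirectory keeps '/' as a directory separator.
    return policy == Policy.ONE_DIRECTORY ? p.replace('/', '-') : p;
  }

  public static void main(String[] args) {
    List<String> urlPaths = List.of(
        "GOMl0.04/expt_32.5",
        "GOMl0.04/expt_32.5/2014",
        "GOMl0.04/expt_32.5/2014/hrly");
    for (String u : urlPaths) {
      System.out.println(cachePath(u, Policy.ONE_DIRECTORY)
          + "   vs   " + cachePath(u, Policy.NESTED_DIRECTORY));
    }
  }
}
```

Under the nested mapping, "GOMl0.04/expt_32.5" has to be a file while "GOMl0.04/expt_32.5/2014" needs that same path to be a directory; the flat mapping sidesteps this. Flattening is not collision-free in general, though: under the assumed rule, "a/b-c" and "a-b/c" both map to "a-b-c".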

Cheers,
Christian

[1] https://github.com/Unidata/thredds/pull/372

On Tue, Jan 5, 2016 at 8:49 PM, Christian Ward-Garrison <cwardgar@xxxxxxxx>
wrote:

> Hi Michael,
>
> It looks like this change was made in v4.5.0, and I'm not really sure why.
> The commit [1] just says "Get index file naming correct, so putting indexes
> in cache works". I'll bring this up in our meeting on Thursday.
>
> For the time being, the only way to work around this issue in v4.6 is to
> ensure that none of the urlPaths of your datasets is a leading path prefix
> of any other. So, for example, you're going to have trouble
> with "GOMl0.04/expt_32.5", because it is a prefix of another urlPath,
> "GOMl0.04/expt_32.5/hrly". Perhaps you could rename the latter to something
> like "GOMl0.04/expt_32.5-hrly"? Not ideal, I know.
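To audit an existing catalog for this problem, a quick check along these lines may help. UrlPathCollisions is a hypothetical helper, not part of TDS. It assumes the nested layout simply mirrors urlPath segments as directories, so the clash arises exactly when one urlPath equals another urlPath followed by "/...".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of TDS: finds urlPaths that would clash
// under a nested-directory cache layout.
final class UrlPathCollisions {

  /** Returns {parent, child} pairs where child starts with parent + "/". */
  static List<String[]> findCollisions(List<String> urlPaths) {
    List<String[]> collisions = new ArrayList<>();
    for (String parent : urlPaths) {
      for (String child : urlPaths) {
        // The parent would be cached as a file, but the child needs that
        // same path to be a directory.
        if (child.startsWith(parent + "/")) {
          collisions.add(new String[] {parent, child});
        }
      }
    }
    return collisions;
  }

  public static void main(String[] args) {
    List<String> urlPaths = List.of(
        "GOMl0.04/expt_32.5",
        "GOMl0.04/expt_32.5/hrly",   // clashes with the entry above
        "GOMl0.04/expt_32.5-hrly");  // the suggested rename: no clash
    for (String[] c : findCollisions(urlPaths)) {
      System.out.println(c[0] + " blocks " + c[1]);
    }
  }
}
```

This prints a single line, "GOMl0.04/expt_32.5 blocks GOMl0.04/expt_32.5/hrly". The renamed "GOMl0.04/expt_32.5-hrly" is not flagged: it contains "GOMl0.04/expt_32.5" as a substring but never needs it to be a directory. The nested loop is O(n²), which is fine for a catalog-sized list.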
>
> Another solution, which would require some new code, is to allow the user
> to specify how the cache files are named in threddsConfig.xml. This is
> actually already possible for GRIB indexes ([2], GribIndex.policy).
> Probably wouldn't be much work to add for aggregations.
>
> With respect to the default nestedDirectory naming policy, it's not clear
> to me how to avoid collisions in a general way. Maybe that's why
> oneDirectory was the default for so long.
>
> Cheers,
> Christian
>
> [1]
> https://github.com/Unidata/thredds/commit/79345f770cf600c774ced0b807ec5eebc37ed9c1
> [2]
> http://www.unidata.ucar.edu/software/thredds/current/tds/reference/ThreddsConfigXMLFile.html#GribIndexWriting
>
> On Mon, Jan 4, 2016 at 9:20 AM, Michael McDonald <mcdonald@xxxxxxxxxxxxx>
> wrote:
>
>> THREDDS Team:
>>
>> Did the XML file naming generator for the aggregation cache files
>> (stored in cache/agg) change/flip from the dataset "ID" value to
>> "urlPath" when going from v4.3.23 to v4.6.x?
>>
>> If so, why was this done? It is preventing us from upgrading to the
>> latest 4.6.3 because of the urlPath structure we currently use, which
>> nicely mimics our FTP listing and which we need to keep unchanged for
>> obvious legacy reasons.
>>
>> e.g., we recently went in and changed all "/" to "-" in our dataset IDs
>> (only) to fix this cache/agg file naming issue on our production
>> v4.3.23 TDS server. What's odd is that there seems to have been a
>> "collision detector" for creating these cache files, as some dirs had
>> files with a "-" replacing the "/" when conflicts occurred - not so in
>> v4.6.x.
>>
>>
>> <dataset ID="GOMl0.04-expt_32.5" urlPath="GOMl0.04/expt_32.5">...
>> <dataset ID="GOMl0.04-expt_32.5-2014"
>> urlPath="GOMl0.04/expt_32.5/2014">...
>> <dataset ID="GOMl0.04-expt_32.5-2014-hrly"
>> urlPath="GOMl0.04/expt_32.5/2014/hrly">...
>>
>> v4.3.23
>> http://tds.hycom.org/thredds (agg cache works fine with no "/" in the
>> dataset IDs - many flat files in the cache/agg with no directories)
>>
>> file naming structure (in v4.3.23) looks to be generated from the dataset
>> "ID"s
>>
>> cache/agg/GOMl0.04-expt_32.5
>> cache/agg/GOMl0.04-expt_32.5-2014
>> cache/agg/GOMl0.04-expt_32.5-2014-hrly
>> cache/agg/GOMl0.04-expt_32.5-2015
>> cache/agg/GOMl0.04-expt_32.5-2015-hrly
>>
>> ::now the problems occur:
>>
>> v4.6.3
>> http://beta.hycom.org/thredds (agg cache seems to be using the dataset
>> "urlPath" for generating the XML filenames in cache/agg and there is
>> no collision avoidance, as we have directories in our cache/agg, even
>> though we changed all "/" to "-" in the dataset IDs in our catalogs)
>>
>> cache/agg/GOMl0.04/expt_32.5
>> cache/agg/GOMl0.04/expt_32.5/hrly
>> cache/agg/GOMl0.04/expt_32.5/2014
>> cache/agg/GOMl0.04/expt_32.5/2015
>> cache/agg/GOMl0.04/expt_32.5/2016
>>
>> so we get a "partial caching" of the datasets (i.e., leaf datasets such as
>> "GOMl0.04/expt_32.5/2015/hrly" are missing because the server cannot
>> write a cache file when there is already a *file*
>> "GOMl0.04/expt_32.5/2015" in cache/agg).
>>
>> e.g., errors on our beta.hycom.org/thredds server running the latest
>> v4.6.3 (identical catalogs to our v4.3.23 server)
>>
>> java.io.FileNotFoundException:
>> /var/lib/tomcat/content/thredds/cache/agg/GOMl0.04/expt_32.5 (Is a
>> directory)
>>
>> java.io.FileNotFoundException:
>> /var/lib/tomcat/content/thredds/cache/agg/GOMl0.04/expt_32.5/2015/hrly
>> (Not a directory)
>>
>> java.io.FileNotFoundException:
>> /var/lib/tomcat/content/thredds/cache/agg/GOMl0.04/expt_32.5/2014/hrly
>> (Not a directory)
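Both errors come down to a plain filesystem conflict: one path cannot be both a regular file and a directory, so whichever dataset is cached first wins. The stand-alone Java sketch below reproduces this in a temporary directory standing in for cache/agg. It uses only java.io and calls no TDS code; CacheClashDemo and tryWrite are illustrative names.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;

// Illustrative reproduction only: plain java.io, no TDS classes involved.
final class CacheClashDemo {

  /** Tries to create an empty file; returns null on success, else the error message. */
  static String tryWrite(File f) {
    // Mimic a nested cache layout: create any missing parent directories first.
    // mkdirs() returns false (it does not throw) if it cannot create them,
    // e.g. when one path component is already a regular file.
    File parent = f.getParentFile();
    if (parent != null) {
      parent.mkdirs();
    }
    try (FileOutputStream out = new FileOutputStream(f)) {
      return null; // opened (and created) successfully
    } catch (IOException e) {
      return e.getMessage();
    }
  }

  public static void main(String[] args) throws IOException {
    // Temporary directory standing in for cache/agg.
    File root = Files.createTempDirectory("agg").toFile();

    // 1. Parent dataset cached first: ".../2015" becomes a regular file.
    System.out.println(tryWrite(new File(root, "GOMl0.04/expt_32.5/2015")));
    // 2. Leaf dataset needs ".../2015" to be a directory, so this fails.
    System.out.println(tryWrite(new File(root, "GOMl0.04/expt_32.5/2015/hrly")));
    // 3. Step 1 made ".../expt_32.5" a directory, so caching it as a file fails.
    System.out.println(tryWrite(new File(root, "GOMl0.04/expt_32.5")));
  }
}
```

On Linux, the failures in steps 2 and 3 should end in "(Not a directory)" and "(Is a directory)", the same messages as in the log above; other platforms may word them differently.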
>>
>>
>> What's the fix for this?
>>
>> --
>> Michael McDonald
>> Florida State University
>>
>
>