Re: [thredds] cache options for large aggregated datasets

Tiago,
We've noticed this same problem with a couple of our datasets.  The
problem is that you are doing a union of several joinExisting
aggregations.  When the aggregation runs, you get a single aggregation
cache file for the union, but if you look inside it, the variables
match only the last joinExisting aggregation in that union (the file
may be named something like $ncml_file#null).  Our fix is to create a
separate ncml file for each joinExisting aggregation, plus a single
ncml file for the union.  If possible, run the individual aggregations
on their own to generate their cache files.  Otherwise, let the union
run, copy the resulting cache file once per joinExisting under the name
that aggregation's cache file should have, and edit the contents of
each copy to match that aggregation.
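
For illustration, here is a minimal sketch of that layout for the two
variables in your example.  The file names air.ncml, hgt.ncml and
union.ncml are made up, and I am assuming all three sit in the same
directory so that the relative locations resolve; adjust to your setup.

air.ncml (hgt.ncml would be the same, listing the hgt.*.nc files):

    <?xml version="1.0" encoding="UTF-8"?>
    <!-- Hypothetical file name: air.ncml -->
    <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
      <aggregation dimName="time" type="joinExisting">
        <netcdf location="/data/REANALISE/NCEP/NCEP-II/air.1979.nc"/>
        <netcdf location="/data/REANALISE/NCEP/NCEP-II/air.1980.nc"/>
        <!-- remaining years listed the same way -->
      </aggregation>
    </netcdf>

union.ncml:

    <?xml version="1.0" encoding="UTF-8"?>
    <!-- Hypothetical file name: union.ncml -->
    <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
      <aggregation type="union">
        <!-- Each entry points at one joinExisting file (assumed names) -->
        <netcdf location="air.ncml"/>
        <netcdf location="hgt.ncml"/>
      </aggregation>
    </netcdf>

With the joinExisting aggregations in their own files, each one can be
opened by itself, which is what lets it get its own cache file.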

This is not an ideal solution, so I welcome suggestions that solve it
better, but it will bridge the gap until this type of aggregation is
better supported.

--
Jordan Walker
Center for Integrated Data Analytics
U.S. Geological Survey
8505 Research Way
Middleton, WI  53562
jiwalker@xxxxxxxx
http://cida.usgs.gov


On 04/25/2011 06:18 PM, tnb@xxxxxxxxxxxxxxxx wrote:
>
>> thredds mailing list
>> thredds@xxxxxxxxxxxxxxxx
>> For list information or to unsubscribe,  visit:
>> http://www.unidata.ucar.edu/mailing_lists/
>>
>> Today's Topics:
>>
>>    1. Re: cache options for large aggregated datasets (John Caron)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Fri, 22 Apr 2011 13:30:31 -0600
>> From: John Caron <caron@xxxxxxxxxxxxxxxx>
>> To: thredds@xxxxxxxxxxxxxxxx
>> Subject: Re: [thredds] cache options for large aggregated datasets
>> Message-ID: <4DB1D757.80700@xxxxxxxxxxxxxxxx>
>> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>>
>> On 4/22/2011 11:21 AM, tnb@xxxxxxxxxxxxxxxx wrote:
>>> Hi everybody!
>>>
>>> I have installed Tomcat6 and Thredds 4.2, and everything is working
>>> fine. I just have some questions about performance when accessing a
>>> large aggregated dataset.
>>>
>>> I am serving some aggregated data (about 38 Gb), and when I try to
>>> access the Dataset Access Form from the catalog, thredds takes too
>>> long to show me the page.
>>
>> can you send the aggregation element?
>
> Hi John!
>
> I will send a small part of it (because it has too many lines), just
> two variables, but the aggregation has a lot more.
>
>     <dataset name="NCEP II - test" ID="ncep2-test"
>              urlPath="reanalise/ncep2.nc">
>     <serviceName>odap</serviceName>
>     <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
>     <aggregation type="union">
>       <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
>       <aggregation dimName="time" type="joinExisting">
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/air.1979.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/air.1980.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/air.1981.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/air.1982.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/air.1983.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/air.1984.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/air.1985.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/air.1986.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/air.1987.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/air.1988.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/air.1989.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/air.1990.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/air.1991.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/air.1992.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/air.1993.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/air.1994.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/air.1995.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/air.1996.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/air.1997.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/air.1998.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/air.1999.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/air.2000.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/air.2001.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/air.2002.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/air.2003.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/air.2004.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/air.2005.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/air.2006.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/air.2007.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/air.2008.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/air.2009.nc"/>
>     </aggregation>
>     </netcdf>
>       <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
>       <aggregation dimName="time" type="joinExisting">
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/hgt.1979.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/hgt.1980.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/hgt.1981.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/hgt.1982.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/hgt.1983.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/hgt.1984.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/hgt.1985.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/hgt.1986.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/hgt.1987.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/hgt.1988.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/hgt.1989.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/hgt.1990.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/hgt.1991.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/hgt.1992.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/hgt.1993.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/hgt.1994.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/hgt.1995.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/hgt.1996.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/hgt.1997.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/hgt.1998.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/hgt.1999.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/hgt.2000.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/hgt.2001.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/hgt.2002.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/hgt.2003.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/hgt.2004.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/hgt.2005.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/hgt.2006.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/hgt.2007.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/hgt.2008.nc"/>
>       <netcdf location="/data/REANALISE/NCEP/NCEP-II/hgt.2009.nc"/>
>     </aggregation>
>     </netcdf>
>      .
>      .
>      .
>
>
>
>>>
>>> I tried enabling all the cache options (NetcdfFileCache,
>>> AggregationCache and NetcdfDatasetCache), but it still takes a long
>>> time (about 3 minutes) to open the Data Access Form and
>>> consequently to access the data.
>>
>> does that happen only the first time? what about the second time ?
>
> Once I open the Data Access Form, the second time is fast, but after
> some time (a few hours), if I try to access the same dataset again, it
> takes as long to open as it did the first time.
>
>
>>
>>>
>>> Is this behavior normal for an aggregated dataset of this size?
>>>
>>> These are my cache options in threddsConfig.xml:
>>>
>>> ...
>>> <AggregationCache>
>>> <scour>-1 hours</scour>
>>
>> why -1 ?
>
> I put -1 because my aggregations never change, so I don't want the
> cache files to be deleted.
>
> From the thredds page
> (http://www.unidata.ucar.edu/projects/THREDDS/tech/tds4.2/reference/ThreddsConfigXMLFile.html):
> "Every scour amount of time, any item that hasn't been changed since
> maxAge time will be deleted. Set scour to -1 to not scour if you have
> aggregations that never change. Otherwise, make maxAge longer than the
> longest time between changes. Basically, you don't want to remove
> active aggregations."
>
>
> Are the cache filenames OK? Does the #null at the end mean anything?
>
>
>>
>>> <maxAge>30 days</maxAge>
>>> </AggregationCache>
>>>
>>> <NetcdfFileCache>
>>> <minFiles>200</minFiles>
>>> <maxFiles>400</maxFiles>
>>> <scour>30 min</scour>
>>> </NetcdfFileCache>
>>>
>>> <NetcdfDatasetCache>
>>> <minFiles>100</minFiles>
>>> <maxFiles>200</maxFiles>
>>> <scour>30 min</scour>
>>> </NetcdfDatasetCache>
>>> ...
>>>
>>> Did I forget some additional setup option?
>>>
>>> Also, I have noticed that in the directory
>>> $TOMCAT/content/thredds/cache/agg, the cache filenames end with
>>> #null (e.g. reanalise-ncep1.nc#null). Did something go wrong during
>>> the creation of the cache?
>>>
>>>
>>> Thanks for your attention!
>>>
>>>
>>> Tiago Bomventi
>>>
>>>
>>
>>
>>
>>
>
>
>
> thanks again!
>
> Tiago Bomventi
>
>
>
