Re: [thredds] Problem between OPeNDAP and TDS when netCDF file is modified

Lansing,

Thanks, but this has always been a part of our TDS configuration.
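
Since the visible symptom is uneven spacing along the time axis, here is a quick way to check an aggregation for gaps, sketched in Python (numpy assumed; in practice you'd fill time_vals by reading the aggregation's "time" variable over OPeNDAP, e.g. with netCDF4-python):

```python
import numpy as np

def find_time_gaps(time_vals):
    """Return the indices i where the step time_vals[i+1] - time_vals[i]
    differs from the first step, i.e. where the series has a gap."""
    steps = np.diff(np.asarray(time_vals, dtype=float))
    return np.where(steps != steps[0])[0].tolist()

# A daily axis with one missing day: the gap shows up after index 3.
print(find_time_gaps([0.0, 1.0, 2.0, 3.0, 5.0, 6.0]))  # -> [3]
```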

-Hoop

On 05/04/12 14:19, William Madry wrote:
> On 05/02/2012 04:28 PM, Hoop wrote:
>> All,
>>
>> This is my latest in a now monthly series of requests for help with
>> doing aggregations with our TDS.  The problem I first reported back
>> on 23 February, wherein aggregations don't notice time steps added
>> to the final file in the time series, is unresolved.  Since I last
>> wrote (4 April), we upgraded to 4.2.10.  There was no effect that we
>> could discern.  Whether we use NcML or FeatureCollection, new time
>> steps in the final file go unnoticed until Tomcat is restarted.
>> Fabulously, if a new file is added without restarting Tomcat, the
>> initial time steps in the new final file are added to the aggregation,
>> leaving a gap where the time steps added to the previous "final" file
>> since the last Tomcat restart should be.  This leads to complaints of
>> the aggregation not being CF-compliant, since it appears to have
>> uneven spacing in time.
>>
>> Interestingly, doing the aggregation in RAMADDA works as we would
>> expect, since it is frequently rebuilding the aggregation.  So, while
>> it is perhaps less efficient than TDS, at least it is reliable.
>>
>> -Hoop
>>
>>> Ethan,
>>>
>>> I've deleted Claude's original post from the cascade below, and neatened up
>>> the Subject: line, which will no doubt screw up threading.  In any case, our
>>> web system administrator tells me that we had such a NetcdfFileCache element
>>> all along, with maxFiles set to 0 (I don't have all of our TDS config files
>>> in front of me, alas).  I also found, in the online documentation at:
>>>
>>> http://www.unidata.ucar.edu/projects/THREDDS/tech/tds4.2/reference/ThreddsConfigXMLFile.html#DiskCache
>>>
>>> the following:
>>>
>>>   <FeatureCollection>
>>>     <dir>/tomcat_home/content/thredds/cache/collection/</dir>
>>>     <maxSize>20 Mb</maxSize>
>>>     <jvmPercent>2</jvmPercent>
>>>   </FeatureCollection>
>>>
>>> Eliminating the <dir> and <jvmPercent> elements, and setting maxSize to
>>> zero (and, hopefully, putting it in the correct TDS config file >SIGH<), we
>>> restarted Tomcat.  The results were initially disheartening, as a timestep
>>> added to the final file an hour before was not included in the
>>> featureCollection aggregation, but was picked up by the NcML aggregation of
>>> the same time series.
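>>>
>>> For the record, what we ended up with (assuming it indeed belongs in
>>> threddsConfig.xml, as the web page says) is just:
>>>
>>>   <FeatureCollection>
>>>     <maxSize>0</maxSize>
>>>   </FeatureCollection>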
>>>
>>> I just checked again (about an hour later), and it's still not part of the
>>> featureCollection aggregation.  So, we still have no solution AFAIK.  Of
>>> course, I don't know which TDS config file our web system administrator
>>> put the maxSize element in.  The web page above says it should have gone in:
>>>
>>> ${tomcat_home}/content/thredds/threddsConfig.xml
>>>
>>> Our web system administrator has gone home for the day, but I've asked him
>>> in e-mail just which config file he put that element in.
>>>
>>> -Hoop
>>>
>>> On 04/04/12 14:51, Hoop wrote:
>>>> Ethan,
>>>>
>>>> Thanks for responding.  I'm dubious that this will be effective.
>>>> Our web system administrator looked around and found a different
>>>> cache directory for collections.  When he cleared this out, the
>>>> new invocation of Tomcat resulted in the missing timesteps finding
>>>> their way into the featureCollection aggregation.  It thus strikes
>>>> me that this is indeed the cache that needs to be cleaned out and/or
>>>> disregarded by the aggregation-making daemon process.
>>>>
>>>> Nonetheless, we'll try it and get back to you.
>>>>
>>>> -Hoop
>>>>
>>>> ---------------------------- Original message -------------------------------
>>>> Re: [thredds] Pb between OpenDap and THREDDS when netcdf file are modifed
>>>>
>>>>      * To: thredds@xxxxxxxxxxxxxxxx
>>>>      * Subject: Re: [thredds] Pb between OpenDap and THREDDS when netcdf file are modifed
>>>>      * From: Ethan Davis <edavis@xxxxxxxxxxxxxxxx>
>>>>      * Date: Wed, 04 Apr 2012 13:55:46 -0600
>>>>
>>>> Hi Hoop,
>>>>
>>>> Try turning off the NetcdfFile caching in your threddsConfig.xml by
>>>> setting NetcdfFileCache/maxFiles to zero:
>>>>
>>>>    <NetcdfFileCache>
>>>>      <maxFiles>0</maxFiles>
>>>>    </NetcdfFileCache>
>>>>
>>>> http://www.unidata.ucar.edu/projects/THREDDS/tech/tds4.2/reference/ThreddsConfigXMLFile.html#FileCache
>>>>
>>>> This will turn off the NetcdfFile cache globally but not the aggregation
>>>> caches. There may be some performance issues in turning this off, but we
>>>> suspect that OS file caching may make them negligible.
>>>>
>>>> Let us know what you see. I'll get back to you on the XML checker stuff
>>>> in another email.
>>>>
>>>> Ethan
>>>>
>>>> On 04/03/12 11:47, Hoop wrote:
>>>>> Ethan,
>>>>>
>>>>> Additional information:  our web system administrator checked the
>>>>> logs, and found that the software daemon that is supposed to check
>>>>> and rebuild the aggregation if need be was indeed running, but
>>>>> finding nothing to do.  Worse, restarting Tomcat, which with an NcML
>>>>> aggregation would pick up the more recent time steps, did not
>>>>> change things.  The time series still ends 2012/03/28, as it did
>>>>> when I first created the featureCollection version of the
>>>>> aggregation, even though the final file has added five time steps.
>>>>> The NcML version of the aggregation did pick up the new time steps
>>>>> when Tomcat was restarted.
>>>>>
>>>>> Hoping for a detailed response,
>>>>> -Hoop
>>>>>
>>>>> On 04/02/12 11:39, Hoop wrote:
>>>>>> Ethan,
>>>>>>
>>>>>> Well, that got me just where NcML aggregation got me: an aggregation
>>>>>> that does not notice new timesteps added to the latest file.  It also
>>>>>> created two new time-like variables (time_offset and time_run) and
>>>>>> threw away most of the metadata I had for the time variable.  My only
>>>>>> reason for using "Latest" instead of letting it default to "Penultimate"
>>>>>> was in the forlorn hope of getting my second value of the attribute
>>>>>> time:actual_range picked up.
>>>>>>
>>>>>> I am still getting the same error messages from the XML checker
>>>>>> that TDS runs on its configuration files.  I wonder if I'm ever
>>>>>> going to hear back about this difference that makes a difference
>>>>>> between the published XSDs and the online-documentation.  Here are
>>>>>> the error messages:
>>>>>>
>>>>>> [2012-03-29T19:16:15GMT]
>>>>>> readCatalog(): full path=/usr/share/tomcat5/content/thredds/catalog.xml;
>>>>>> path=catalog.xml
>>>>>> readCatalog(): valid catalog -- ----Catalog Validation version 1.0.01
>>>>>> *** XML parser error (36:14)= cvc-complex-type.2.4.a: Invalid content
>>>>>> was found starting with element 'filter'. One of
>>>>>> '{"http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0":addLatest,
>>>>>> "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0":addProxies,
>>>>>> "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0":addDatasetSize,
>>>>>> "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0":addTimeCoverage}'
>>>>>> is expected.
>>>>>> *** XML parser error (54:50)= cvc-complex-type.2.4.a: Invalid content
>>>>>> was found starting with element 'update'. One of
>>>>>> '{"http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0":fmrcConfig,
>>>>>> "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0":pointConfig,
>>>>>> "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2":netcdf}' is
>>>>>> expected.
>>>>>>
>>>>>> readCatalog(): full
>>>>>> path=/usr/share/tomcat5/content/thredds/enhancedCatalog.xml;
>>>>>> path=enhancedCatalog.xml
>>>>>> readCatalog(): valid catalog -- ----Catalog Validation version 1.0.01
>>>>>>
>>>>>> -Hoop
>>>>>>
>>>>>> ------ original message --------------
>>>>>> Hi Hoop,
>>>>>>
>>>>>> Try adding the following to your featureCollection element
>>>>>>
>>>>>>    <metadata inherited="true">
>>>>>>      <serviceName>all</serviceName>
>>>>>>    </metadata>
>>>>>>
>>>>>> Also, since your most recent dataset is the one that is changing, you
>>>>>> might want to change protoDataset@choice from "Latest" to "Penultimate"
>>>>>> (which is the default, so you could just drop protoDataset
>>>>>> altogether). Also, since data files in your dataset don't age off, it
>>>>>> probably isn't too important which dataset is used but probably better
>>>>>> to not use the one that gets updated. The protoDataset is used to
>>>>>> populate the metadata in the feature dataset.
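>>>>>>
>>>>>> In other words, you could either drop the element or make the default
>>>>>> explicit:
>>>>>>
>>>>>>    <protoDataset choice="Penultimate"/>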
>>>>>>
>>>>>> Since your datasets are a simple timeseries rather than a full-blown
>>>>>> FMRC, you will probably want to add
>>>>>>
>>>>>>    <fmrcConfig datasetTypes="Best"/>
>>>>>>
>>>>>> The fmrcConfig@datasetTypes value tells the featureCollection which
>>>>>> types of FMRC datasets to create. With the value "Best", the forecast
>>>>>> types are left off and only the "Best Time Series" dataset is created.
>>>>>> Not the best dataset name for a simple time series grid (it's not just
>>>>>> the best time series, it's the only one!) but that's what we have for the
>>>>>> moment. If you want to let people see the underlying files, you could
>>>>>> add "Files" to the fmrcConfig@datasetTypes value.
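>>>>>>
>>>>>> Put together, your featureCollection might look something like this
>>>>>> (a sketch reusing the names and paths from your earlier attempt; the
>>>>>> XSD may be picky about element order):
>>>>>>
>>>>>>    <featureCollection name="SST_NOAA_OISST_V2_HighResFC" featureType="FMRC"
>>>>>>        harvest="true" path="Datasets/aggro/OISSThires.nc">
>>>>>>      <metadata inherited="true">
>>>>>>        <serviceName>all</serviceName>
>>>>>>      </metadata>
>>>>>>      <collection
>>>>>>          spec="/Datasets/noaa.oisst.v2.highres/sst.day.mean.#yyyy#.v2.nc$"
>>>>>>          name="SST_OISST_V2_HighResFC" olderThan="15 min" />
>>>>>>      <update startup="true" rescan="0 0 * * * ? *" />
>>>>>>      <fmrcConfig datasetTypes="Best"/>
>>>>>>    </featureCollection>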
>>>>>>
>>>>>> I'm including the link to the FeatureCollection tutorial [1] which I
>>>>>> forgot to point out in an earlier email when I gave you the link to the
>>>>>> reference docs [2].
>>>>>>
>>>>>> Hope that helps,
>>>>>>
>>>>>> Ethan
>>>>>>
>>>>>> [1]
>>>>>> http://www.unidata.ucar.edu/projects/THREDDS/tech/tds4.2/tutorial/FeatureCollectionsTutorial.html
>>>>>>
>>>>>> [2]
>>>>>> http://www.unidata.ucar.edu/projects/THREDDS/tech/tds4.2/reference/collections/FeatureCollections.html
>>>>>>
>>>>>> On 3/26/2012 11:13 AM, Hoop wrote:
>>>>>>> Ethan,
>>>>>>>
>>>>>>> The catalog is attached.  The filter element is in a datasetScan
>>>>>>> element that we use to generically wrap our NetCDF files, and
>>>>>>> not included within the featureCollection element or any other
>>>>>>> aggregation element.  It is meant to generally apply throughout our
>>>>>>> installation.
>>>>>>>
>>>>>>> Sample files may be obtained from:
>>>>>>>
>>>>>>>     ftp://ftp.cdc.noaa.gov/Datasets/noaa.oisst.v2.highres/
>>>>>>>
>>>>>>> The files for this year are updated on a daily basis, barring
>>>>>>> problems.
>>>>>>>
>>>>>>> Let me know what else I can do to help.
>>>>>>>
>>>>>>> -Hoop
>>>>>>>
>>>>>>> On 03/24/12 23:02, thredds-request@xxxxxxxxxxxxxxxx wrote:
>>>>>>>> thredds mailing list
>>>>>>>> thredds@xxxxxxxxxxxxxxxx
>>>>>>>> For list information or to unsubscribe,  visit:
>>>>>>>> http://www.unidata.ucar.edu/mailing_lists/
>>>>>>>>
>>>>>>>> Today's Topics:
>>>>>>>>     5. Re: Pb between OpenDap and THREDDS when netcdf file are
>>>>>>>>        modifed (Ethan Davis)
>>>>>>>> ----------------------------------------------------------------------
>>>>>>>> Message: 5
>>>>>>>> Date: Sat, 24 Mar 2012 23:02:53 -0600
>>>>>>>> From: Ethan Davis<edavis@xxxxxxxxxxxxxxxx>
>>>>>>>> To: thredds@xxxxxxxxxxxxxxxx
>>>>>>>> Subject: Re: [thredds] Pb between OpenDap and THREDDS when netcdf file
>>>>>>>>    are modifed
>>>>>>>> Message-ID:<4F6EA6FD.8080906@xxxxxxxxxxxxxxxx>
>>>>>>>> Content-Type: text/plain; charset=ISO-8859-1
>>>>>>>>
>>>>>>>> Hi Hoop,
>>>>>>>>
>>>>>>>> Can you send us (or point us to) a few sample files and send us your
>>>>>>>> full catalog?
>>>>>>>>
>>>>>>>> Is the filter you mention below part of your featureCollection element?
>>>>>>>>
>>>>>>>> Ethan
>>>>>>>>
>>>>>>>> On 3/9/2012 1:59 PM, Hoop wrote:
>>>>>>>>> Ethan,
>>>>>>>>>
>>>>>>>>> I don't believe John ever responded as you had requested.
>>>>>>>>> I did my best to try "featureCollection", but I got nowhere.
>>>>>>>>> It doesn't help that the XSDs specify required elements
>>>>>>>>> (for "update" and "filter") that are not mentioned in the
>>>>>>>>> online documentation; the validation process that TDS runs
>>>>>>>>> at start-up informed me of those errors.  I have no clue how
>>>>>>>>> to correct them.  Here is the attempt I made:
>>>>>>>>>
>>>>>>>>> <featureCollection name="SST_NOAA_OISST_V2_HighResFC" featureType="FMRC"
>>>>>>>>>   harvest="true" path="Datasets/aggro/OISSThires.nc">
>>>>>>>>>   <collection
>>>>>>>>>    spec="/Datasets/noaa.oisst.v2.highres/sst.day.mean.#yyyy#.v2.nc$"
>>>>>>>>>    name="SST_OISST_V2_HighResFC" olderThan="15 min" />
>>>>>>>>>   <protoDataset choice="Latest" change="0 0 7 * * ? *" />
>>>>>>>>>   <update startup="true" rescan="0 0 * * * ? *" />
>>>>>>>>> </featureCollection>
>>>>>>>>>
>>>>>>>>> My use of "filter" is as follows:
>>>>>>>>>
>>>>>>>>>       <filter>
>>>>>>>>>          <include wildcard="*.nc"/>
>>>>>>>>>          <exclude wildcard="*.data"/>
>>>>>>>>>          <exclude wildcard="*.f"/>
>>>>>>>>>          <exclude wildcard="*.gbx"/>
>>>>>>>>>          <exclude wildcard="*.txt"/>
>>>>>>>>>          <exclude wildcard="README"/>
>>>>>>>>>       </filter>
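>>>>>>>>>
>>>>>>>>> (My guess from the parser error is that <filter> has to come before
>>>>>>>>> elements like <addDatasetSize> inside the datasetScan, i.e. something
>>>>>>>>> like the following, with the name and paths here purely hypothetical:
>>>>>>>>>
>>>>>>>>>    <datasetScan name="Our NetCDF files" path="Datasets"
>>>>>>>>>        location="/Datasets/">
>>>>>>>>>      <filter>
>>>>>>>>>        <include wildcard="*.nc"/>
>>>>>>>>>      </filter>
>>>>>>>>>      <addDatasetSize/>
>>>>>>>>>    </datasetScan>
>>>>>>>>>
>>>>>>>>> but I haven't verified that.)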
>>>>>>>>>
>>>>>>>>> Someone want to tell me what I did wrong in each case?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> -Hoop
>>>>>>>>>
>>>>>>>>>> -------- Original Message --------
>>>>>>>>>> Subject:        Re: [thredds] Pb between OpenDap and THREDDS when
>>>>>>>>>> netcdf file are modifed
>>>>>>>>>> Date:   Thu, 23 Feb 2012 22:03:38 -0700
>>>>>>>>>> From:   Ethan Davis<edavis@xxxxxxxxxxxxxxxx>
>>>>>>>>>> To:     thredds@xxxxxxxxxxxxxxxx
>>>>>>>>>>
>>>>>>>>>> Hi Hoop,
>>>>>>>>>>
>>>>>>>>>> The dynamic dataset handling in the NcML aggregation code was
>>>>>>>>>> designed to deal with the appearance of new datasets more than data
>>>>>>>>>> being appended to existing datasets. The NcML aggregations are also
>>>>>>>>>> limited to straightforward aggregations based on homogeneity of
>>>>>>>>>> dimensions and coordinate variables; they don't use any coordinate
>>>>>>>>>> system or higher-level feature information that might be available.
>>>>>>>>>> This makes straight NcML aggregation somewhat fragile and hard to
>>>>>>>>>> generalize to more complex situations.
>>>>>>>>>>
>>>>>>>>>> FeatureCollections are designed to use the CDM's understanding of
>>>>>>>>>> coordinate systems and feature types to both simplify configuration
>>>>>>>>>> and make aggregations more robust and general.
>>>>>>>>>>
>>>>>>>>>> While the FMRC collection capability was designed for a time series
>>>>>>>>>> of forecast runs, I believe it should handle a simple time series of
>>>>>>>>>> grids as well. (John, can you add more information on this?)
>>>>>>>>>>
>>>>>>>>>> Ethan
>>>>>>>>>>
>>>>>>>>>> On 2/23/2012 3:21 PM, Hoop wrote:
>>>>>>>>>>> Ethan,
>>>>>>>>>>>
>>>>>>>>>>> This reminds me of an issue we are having, with version 4.2.7.
>>>>>>>>>>> Here is the relevant snippet from our config:
>>>>>>>>>>> <dataset name="SST NOAA OISST V2 HighRes" ID="SST_OISST_V2_HighRes"
>>>>>>>>>>>     urlPath="Datasets/aggro/OISSThires.nc" serviceName="odap"
>>>>>>>>>>>     dataType="grid">
>>>>>>>>>>>   <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
>>>>>>>>>>>     <aggregation dimName="time" type="joinExisting" recheckEvery="15 min">
>>>>>>>>>>>       <scan location="/Projects/Datasets/noaa.oisst.v2.highres/"
>>>>>>>>>>>             regExp="sst\.day\.mean\.....\.v2\.nc$" subdirs="false"/>
>>>>>>>>>>>     </aggregation>
>>>>>>>>>>>   </netcdf>
>>>>>>>>>>> </dataset>
>>>>>>>>>>>
>>>>>>>>>>> The behavior we are getting in our time series, which is based on
>>>>>>>>>>> NetCDF files with a year's worth of time steps (or less), is as
>>>>>>>>>>> follows:
>>>>>>>>>>> In between reboots of Tomcat, new time steps added to the latest
>>>>>>>>>>> file are not added to the aggregation.  However, if the calendar
>>>>>>>>>>> marches along and a new file for a new year is added to our archive
>>>>>>>>>>> without rebooting Tomcat, the timesteps for the new file are added,
>>>>>>>>>>> without the ones that would complete the previous year, resulting
>>>>>>>>>>> in a discontinuity along the time axis.  And someone somewhere may
>>>>>>>>>>> e-mail us complaining that our OPeNDAP object is not CF-compliant
>>>>>>>>>>> because the time steps aren't all of the same size.  %}
>>>>>>>>>>>
>>>>>>>>>>> I looked at the featureCollection documentation link you gave, but
>>>>>>>>>>> since our data are not forecasts, nor point data, nor in GRIB2
>>>>>>>>>>> format, that didn't seem the right fit.  Maybe I'm wrong; I'm
>>>>>>>>>>> severely sleep-deprived right now....
>>>>>>>>>>>
>>>>>>>>>>> We also have some time series in monthly files (to keep the
>>>>>>>>>>> individual file size under 2 Gbytes).  We have not tried
>>>>>>>>>>> aggregating any of those time series.  Could be an interesting
>>>>>>>>>>> challenge.
>>>>>>>>>>>
>>>>>>>>>>> Thanks for any help.
>>>>>>>>>>>
>>>>>>>>>>> -Hoop
> Good Afternoon Hoop,
> 
> After reading through the e-mail thread, I'm guessing that the issue you are
> having relates to a file in the NetcdfFileCache not staying current with your
> updated final file.  Adding a new final file, rather than updating an existing
> file, doesn't show the same issue because the new file is not cached.  I would
> suggest effectively turning off the NetcdfFileCache by setting the maxFiles
> parameter to zero in your threddsConfig.xml file:
> 
>   <NetcdfFileCache>
>     <maxFiles>0</maxFiles>
>   </NetcdfFileCache>
> 
> That way, the aggregation will not grab a stale file out of the cache.
> 
> Regards,
>   Lansing