
[THREDDS #ARH-205811]: Aggregations, dateFormatMark, and filtering/grouping



Hi Greg,

Glad you found a solution. Sorry it wasn't clear in the documentation. There 
are a number of methods for constructing aggregations and a few different 
documents describing those methods. The best one for getting all the detail is 
the "NcML Annotated Schema" document and especially the "aggregation Element" 
section.

http://www.unidata.ucar.edu/software/netcdf/ncml/v2.2/AnnotatedSchema4.html#aggregation
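
For anyone finding this thread later, here is a minimal sketch of how the 'sca' scan from the catalog below might look with regExp added. It assumes that regExp is matched against the full pathname of each file and that it takes the place of suffix; please check both points against the "aggregation Element" section above before relying on it. The pattern itself is only an illustration based on the sca.2010081800.000.grb example name.

```xml
<aggregation dimName="run" type="forecastModelRunCollection"
             timeUnitsChange="true" recheckEvery="15 min">
  <!-- Assumption: regExp is tested against the full path, so the
       leading ".*/" skips the directory part (e.g. .../upd20100818/).
       Only files named sca.<10 digits>.<anything>.grb are kept, so
       gof.* and grand.* files never reach the dateFormatMark step. -->
  <scan location="/export/ftp/pub/model/grib1/"
        regExp=".*/sca\.[0-9]{10}\..*\.grb$"
        dateFormatMark="sca.#yyyyMMddHH" olderThan="1 min" />
</aggregation>
```

The same pattern with 'gof' or 'grand' in place of 'sca' would presumably serve for the other model areas.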

We'll take a look at the FMRC and general aggregation documents and see if we 
can clear things up.

Thanks,

Ethan

On 8/19/2010 8:02 AM, Williams, Greg wrote:
> New Client Reply: Aggregations, dateFormatMark, and filtering/grouping
> 
>  
> Hmm... I seem to have found a solution in the form of the 'regExp'
> element.
> (described in the DTD, but not in clear sight on the aggregations page).
> Thanks.
> 
> ________________________________
> 
> From: Williams, Greg 
> Sent: 19 August 2010 12:22
> To: 'address@hidden'
> Subject: Aggregations, dateFormatMark, and filtering/grouping
> 
> 
>  
> Hi,
>  
> Running the latest stable TDS 4.1.7, I have a problem correctly
> aggregating model runs based on dateFormatMark.
> I've searched the online docs/lists and can't see an answer to this, so
> I'm hoping you can help...
>  
> My setup is as follows:
> 1.  An ftp site exists, where model runs are uploaded to 'dated'
> directories under a top-level 'grib1' directory.
>     (eg. ./grib1/upd20100817, ./grib1/upd20100818, ./grib1/upd20100818,
> etc)
>  
> 2.  Each dated directory contains several model datasets, each with a
> prefix (per model area) and a reference time.
>     (eg. sca.2010081800.000.grb, with yyyyMMddHH as the model
> run/reference time and multiple timesteps per file)
>  
> 3.  Examples of prefixes for model areas are 'sca', 'gof', 'grand',
> 'global', 'fint' (there are 20 or so at the moment).
>  
>  
> What I want is to aggregate all the 'sca' runs into a FMRC, all the
> 'gof' runs into a separate FMRC, etc, etc.  Those sections of my catalog
> are included below for sca, gof, and grand data.
>  
> The problem seems to be that the 'dateFormatMark' option just counts
> characters before the # mark and does not perform a character match.
> Could that be true?
>  
> One effect (in this example) is that I think it's trying to aggregate
> all the 3-character-prefixed sets together (ie. 'sca' and 'gof') and that
> doesn't work due to a clash of reference-times.
>  
> Another effect is that the TDS logs show errors from date-matching
> against other prefixes.
> Eg. Attempts to aggregate the 'sca' set encounter the 'grand.*' files
> and cause:
>  
>     java.lang.RuntimeException: SimpleDateFormat bad = yyyyMMddHH Unparseable date: "nd.201008"
>  
>  
> I have no control over the directory structures or model prefixes, and
> cannot partition the files into separate directories by prefix (for
> example).
> I've tried using a 'filter' section in the catalog (straight after the
> end of the 'metadata' section), but the aggregation 'scan' seems
> unaffected: it still picks up files with other prefixes, whether of
> the same length or a different length, and fails.
>  
> Is there a way to make dateFormatMark do proper matching, or another
> solution to this?
>  
> Thanks.
> Greg.
>  
>  
>  
> ---
>  
>   <datasetFmrc name="sca" collectionType="ForecastModelRuns"
> harvest="true" path="fmrc/sca">
>     <metadata inherited="true">
>       <serviceName>all</serviceName>
>       <dataType>Grid</dataType>
>       <dataFormat>GRIB-1</dataFormat>
>     </metadata>
>     <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
>             enhance="true">
>       <aggregation dimName="run" type="forecastModelRunCollection"
> timeUnitsChange="true" recheckEvery="15 min">
>         <scan location="/export/ftp/pub/model/grib1/" suffix=".grb"
> dateFormatMark="sca.#yyyyMMddHH" olderThan="1 min" />
>       </aggregation>
>     </netcdf>
>   </datasetFmrc>
> 
>   <datasetFmrc name="gof" collectionType="ForecastModelRuns"
> harvest="true" path="fmrc/gof">
>     <metadata inherited="true">
>       <serviceName>all</serviceName>
>       <dataType>Grid</dataType>
>       <dataFormat>GRIB-1</dataFormat>
>     </metadata>
>     <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
>             enhance="true">
>       <aggregation dimName="run" type="forecastModelRunCollection"
> timeUnitsChange="true" recheckEvery="15 min">
>         <scan location="/export/ftp/pub/model/grib1/" suffix=".grb"
> dateFormatMark="gof.#yyyyMMddHH" olderThan="1 min" />
>       </aggregation>
>     </netcdf>
>   </datasetFmrc>
>  
>   <datasetFmrc name="grand" collectionType="ForecastModelRuns"
> harvest="true" path="fmrc/grand">
>     <metadata inherited="true">
>       <serviceName>all</serviceName>
>       <dataType>Grid</dataType>
>       <dataFormat>GRIB-1</dataFormat>
>     </metadata>
>     <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
>             enhance="true">
>       <aggregation dimName="run" type="forecastModelRunCollection"
> timeUnitsChange="true" recheckEvery="15 min">
>         <scan location="/export/ftp/pub/model/grib1/" suffix=".grb"
> dateFormatMark="grand.#yyyyMMddHH" olderThan="1 min" />
>       </aggregation>
>     </netcdf>
>   </datasetFmrc>
>  
> ---

Ticket Details
===================
Ticket ID: ARH-205811
Department: Support THREDDS
Priority: Normal
Status: Closed