The data I attached is a test case for a scenario I am trying to handle.
I have several thousand NetCDF files (some CF, some not). Most of them
belong to logical datasets that have been split along a time or Z axis
into 30-50 files each, which I must aggregate back into a single 'logical'
dataset (I believe this is a fairly common use case). These files are
updated daily, but due to the amount of data involved as well as other
environmental factors, the updates arrive sporadically over a span of
about 24 hours.

So what I am trying to do is this: as the files of an aggregated dataset
are slowly replaced with newer versions of the same file, add those new
versions to the aggregated dataset they belong to, while ensuring that the
new data can be differentiated within the aggregation by its data creation
time (be it a model run time, a production time, or whatever). This is
where joining files along a joinNew dimension comes in (in this example,
'runtime'), as the data creation time does not exist in the datasets as a
coordinate variable, and in some cases is not even indicated in the global
attributes.
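
For concreteness, here is a minimal sketch of the kind of joinNew
aggregation I mean, assuming two versions of the same file produced by runs
24 hours apart. The file names and the variable name 'water_u' are
hypothetical placeholders, not taken from the attached data, and both files
are assumed to have identical coordinate variables:

```xml
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <!-- 'runtime' is a new outer dimension; it does not exist in the files -->
  <aggregation type="joinNew" dimName="runtime">
    <!-- hypothetical variable that should gain the 'runtime' dimension -->
    <variableAgg name="water_u"/>
    <!-- older and newer version of the same file (hypothetical names) -->
    <netcdf coordValue="0" location="run00/example_u_t000.nc"/>
    <netcdf coordValue="24" location="run24/example_u_t000.nc"/>
  </aggregation>
</netcdf>
```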

Ultimately, once all of the files for an aggregated dataset have been
updated, the aggregation contains only files that share the same data
creation or run time, until the next update starts.

You seem to be indicating that I cannot perform a 'joinNew' aggregation
between datasets whose coordinate variables have different sizes? If that
is the case, and I missed it in the documentation somewhere, then what
about aggregating the files with a joinNew first, and then aggregating
those aggregations with 'joinExisting' along the time/Z axis?

There is still the issue, though, of the random behavior (an exception on
some reads, an array of values on others), which indicates a concurrency
problem. If the read worked consistently, instead of only half of the
time, that would still be useful to me, as my code could easily determine
which values in the returned array were valid.

At any rate, thanks for responding so quickly.

On Sat, Nov 14, 2015 at 5:35 PM, John Caron <jcaron1129@xxxxxxxxx> wrote:
> Hi Clifford:
>
> <aggregation type="joinNew" dimName="runtime">
>   <netcdf coordValue="0" location="ncom-relo-mayport_u_miw-t000.nc"/>
>   <netcdf coordValue="24">
>     <aggregation type="joinExisting" dimName="time">
>       <netcdf location="ncom-relo-mayport_26_u_miw-t001.nc"/>
>       <netcdf location="ncom-relo-mayport_26_u_miw-t000.nc"/>
>     </aggregation>
>   </netcdf>
> </aggregation>
>
> ncom-relo-mayport_u_miw-t000.nc only has 1 time coordinate, but the inner
> aggregation has 2, so these are not homogeneous in the sense that NcML
> aggregation requires.
>
> Could you explain more about what you are trying to do?
>
> John
>
>
> On Fri, Nov 13, 2015 at 11:24 PM, Clifford Harms <clifford.harms@xxxxxxxxx> wrote:
>
>> I've posted the report, sample data, sample XML, and sample code on
>> GitHub -> https://github.com/Unidata/thredds/issues/276
>>
>>
>> --
>> Clifford M. Harms
>>
>>
>
>
--
Clifford M. Harms