
Re: NcML aggregation and NCA



On 3/8/2013 11:02 AM, David Hassell wrote:
Dear John,

I have been thinking about getting cf-python
(http://cfpython.bitbucket.org/) to be able to write out NcML
aggregation files and wondered if you had any advice. Is there any
documentation relating to how properties such as flag_values,
ancillary_variables, etc. are dealt with?

On a related note, I have written up a framework for storing datasets
created by the CF aggregation rules
(https://cf-pcmdi.llnl.gov/trac/ticket/78), both in memory and in a
(netCDF) file. I would be most interested in your opinion of this. The
(very short!) abstract and introduction are at:

http://www.met.reading.ac.uk/~david/nca/0.2.2/build/
http://www.met.reading.ac.uk/~david/nca/0.2.2/build/introduction.html

In particular, in the introduction I mention ways in which I think it
is more general than NcML - but I wonder if what I say is correct ...?

Many thanks, and all the best,

David

--
David Hassell
National Centre for Atmospheric Science (NCAS)
Department of Meteorology, University of Reading,
Earley Gate, PO Box 243,
Reading RG6 6BB, U.K.

Tel   : 0118 3785613
Fax   : 0118 3788316
E-mail: address@hidden

Hi David:

NcML aggregation is a syntactic aggregation, with almost no understanding of the meaning of the constructs.

http://www.unidata.ucar.edu/software/netcdf/ncml/v2.2/Aggregation.html

has a paragraph at the end of each section:

A Union dataset is constructed by transferring objects (dimensions, attributes, groups, and variables) from the nested datasets in the order the nested datasets are listed. If an object with the same name already exists, it is skipped. You need to pay close attention to dimensions and coordinate variables, which must match exactly across nested files.
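For illustration, a minimal union aggregation looks like this (the file names here are just placeholders):

```xml
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <aggregation type="union">
    <!-- objects are transferred in listed order; duplicates by name are skipped -->
    <netcdf location="cloud_cover.nc"/>
    <netcdf location="heat_flux.nc"/>
  </aggregation>
</netcdf>
```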

A JoinExisting dataset is constructed by transferring objects (dimensions, attributes, groups, and variables) from the nested datasets in the order the nested datasets are listed. All variables that use the aggregation dimension as their outer dimension are logically concatenated, in the order of the nested datasets. Variables that don't use the aggregation dimension are treated as in a Union dataset, i.e. skipped if one with that name already exists.
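A minimal joinExisting example, again with placeholder file names:

```xml
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <aggregation dimName="time" type="joinExisting">
    <!-- variables with "time" as their outer dimension are concatenated
         in the order the nested datasets are listed -->
    <netcdf location="jan.nc"/>
    <netcdf location="feb.nc"/>
  </aggregation>
</netcdf>
```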

A JoinNew dataset is constructed by transferring objects (dimensions, attributes, groups, and variables) from the nested datasets in the order the nested datasets are listed. All variables that are listed as aggregation variables are logically concatenated along the new dimension, in the order of the nested datasets. A coordinate Variable is created for the new dimension. Non-aggregation variables are treated as in a Union dataset, i.e. skipped if one of that name already exists.
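And a minimal joinNew example (variable and file names are placeholders):

```xml
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <aggregation dimName="run" type="joinNew">
    <!-- only variables listed here are concatenated along the new "run" dimension -->
    <variableAgg name="T"/>
    <!-- coordValue supplies each slice's value for the new coordinate variable -->
    <netcdf location="run1.nc" coordValue="1"/>
    <netcdf location="run2.nc" coordValue="2"/>
  </aggregation>
</netcdf>
```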

"Feature Collections" are intended to be the successor to Aggregation. These are semantically aware collections. There is much ongoing work on this in the CDM. I need to start writing more docs on this, but here's a start:

http://www.unidata.ucar.edu/software/netcdf-java/reference/FeatureDatasets/Overview.html

http://www.unidata.ucar.edu/projects/THREDDS/tech/tds4.3/reference/collections/FeatureCollections.html

-----

For sure, your proposed "CF aggregation" is more general than NcML. I have found it necessary to work at an object model level, especially one that includes general coordinate systems. Your CF data model:

http://www.met.rdg.ac.uk/~jonathan/CF_metadata/cfdm.html

is roughly equivalent to the CDM data model:

http://www.unidata.ucar.edu/software/netcdf-java/CDM/index.html

However, I haven't analyzed it in detail.

The CF data model and your proposal

http://www.met.reading.ac.uk/~david/cf_aggregation_rules.html

are closely tied to the CF encoding. In a way that's good (you get specific about CF meanings), but you also run the risk of getting lost in the details. Your proposal strikes me as midway between the syntactic approach in NcML and the semantic approach in Feature Collections. The syntactic approach is very useful but limited; I think you can't get everything you want with it. I'm not sure, for example, whether it can handle discrete geometry encodings. They can be a bit devilish, especially when performance and large collections of files are in the mix.

The CDM handles many file formats and conventions, so it is necessarily more abstract, which sometimes means vague.

So my general feedback is:

  1) Keep separating the data model from the file encoding.
  2) Implementing is necessary to see which real-world cases get covered and which don't.

Good luck!

John