Re: NCML

To: Rob Weingruber <weingrub@xxxxxxxxxxxx>
Subject: Re: NCML
From: John Caron <caron@xxxxxxxxxxxxxxxx>
Date: Mon, 31 Jul 2006 10:59:20 -0600

Rob Weingruber wrote:

Hi John Again ;-)

John Caron wrote:
Rob Weingruber wrote:
Hi Again ;-)

Looks like the NCML is the way to go.  Thanks for the suggestion ;-)
I created a super simple NCML that aggregates several files into a single virtual dataset, joined by an existing 'Time' variable. Piece of cake. And the GeoGrid API was nice enough to then give me all of the 'Valid Times' for that virtual dataset. This 'Time' variable is the semantic equivalent of the 'valid time' for the file. However, there also is the issue of the 'Generated Time' for a file (i.e. generated at 12:00 Z, but valid for 15:00 Z. This would be used in requests such as 'give me the 15:00 Z forecast gen'ed at 12:00 Z). I see that there might be 2 ways to glue on the generated time information: a) as an attribute in each of the files that make up the data set or b) join on a new gen time variable. Which would be best and performant, in your opinion? Would the latter even be possible, considering that we still would need to join on the existing 'valid time' variable? Or would we just join on the 'valid time', and then attach a new gen-time variable (and value) to each of the files (within the NCML for that virtual dataset)?
I am currently working on a new kind of NcML aggregation called "forecastModelRunCollection", which deals with a 2D time, "valid" and "generated". I hope to have an alpha version in the next week or two. There is some partially completed code in the 2.2.17 snapshot. I will probably make some UML diagrams, and I'll send them along to you for your feedback when I do.
Gladly will take a look at these/this whenever you're ready for me to. This sounds like exactly what we might need...
Make sense?
Also, I recall that we agreed the performance would be fine for, say, 10,000 files within a virtual dataset defined by NCML. Did I misinterpret, or is that reasonable?
I think there will be some optimizations needed to scale up to that size. It will probably work, given enough memory (I forget if JVMs are still restricted to 2 GB heaps). I'd like to measure its memory use, so perhaps you could help me test and debug datasets of this size?
Glad to help here too. I think I have an old JBuilder around, that has an OptimizeIt license too....
One thing I thought of recently, is: does NCML allow datetime coordValue's to be placed *into* the NCML (thereby avoiding a file.open when those coordValues are queried via the API)? I tried the following, to no avail**:
  <aggregation dimName="Time" type="joinExisting">
    <netcdf location="file:/d2/www/data/ncmlTest/DPG/2006070611/wrfout_d01_2006-07-06_080000.DPG_F.nc" coordValue="2006-07-06 08:00:00Z"/>
    ...
The reasoning behind this is that I would like to place the 'valid (and gen) time's into the NCML, where each coordValue would theoretically match the value in the file for that specific netcdf file. If the API could then *use the values directly from the NCML*, then there might be no need to open the file(s) when geoGrid.getTimes() is called. The point being that if we can avoid opening files for valid and gen time information, then we'd better the performance for datasets with lots and lots of files. What do you think?
** "To no avail" - means that it worked, but I tried moving the netcdf files out of the way, to see if they would be opened for a geoGrid.getTimes() call, and exceptions were thrown. It all worked when I left the files where they were supposed to be, but that wasn't the point ;-)

I've just been working on some of that in 2.2.17, see the new section "Defining coordinates on a JoinExisting aggregation" in


 http://www.unidata.ucar.edu/software/netcdf/ncml/v2.2/Aggregation.html.

It looks like what you were doing above is mostly correct (assuming your files have an existing coordinate variable called "Time" with length 1), but the current version is not handling it. I would recommend that you use the form "2006-07-06T08:00:00Z" so that we can use space delimiters when there's more than one coord value.
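
For illustration, a joinExisting aggregation using the "T" form of the date might look like the sketch below. This is only a sketch under assumptions: the first entry reuses the file from the example above, the second file name and its time are hypothetical, and each file is assumed to hold a single "Time" value. Whether the library uses these values without opening the files depends on the version in use.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <!-- Join along the existing "Time" dimension; one time step per file is assumed -->
  <aggregation dimName="Time" type="joinExisting">
    <!-- File from the example above, with the coordinate value in the "T" form -->
    <netcdf location="file:/d2/www/data/ncmlTest/DPG/2006070611/wrfout_d01_2006-07-06_080000.DPG_F.nc"
            coordValue="2006-07-06T08:00:00Z"/>
    <!-- Hypothetical second file and time, for illustration only -->
    <netcdf location="file:/d2/www/data/ncmlTest/DPG/2006070611/wrfout_d01_2006-07-06_090000.DPG_F.nc"
            coordValue="2006-07-06T09:00:00Z"/>
  </aggregation>
</netcdf>
```

With no spaces inside a date, a file holding several time steps could list them in one coordValue separated by spaces.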

Also, the coord values can be cached (you have to enable this, see the last section "Aggregation Caching") if you want to let the library read the values the first time.

This code is so new I'm not sure I have even done a release with it. I'm working at home today, I'll check when I'm in tomorrow...

This refers to the joinExisting aggregations. You probably really want to use the new "Forecast Model Run" Aggregation that I'm working on now. It will be similar, but take into account the 2D time coordinate.
