[netcdf-java] Dataset aggregation by globbing

Dear all (esp. John),

We use NcML a lot for both file aggregation and for "fixing" metadata
problems in underlying NetCDF files - it's a great technology.
However, it can be an inconvenience to create an NcML file simply to
aggregate "well-behaved" files.  In our ncWMS we allow users to
specify a group of files using glob expressions, e.g. "/path/to/*.nc"
or even more complex things like "/path/to/200?/*/foo.nc".  This
simply unions the matching files together along the time axis.  It
allows files to contain different combinations of variables.
Internally, the system builds a hash map, so that when a
user requests a particular variable at a particular time, the
aggregation knows which file, and which time index within that
file, holds the data.
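
To make the lookup idea concrete, here is a minimal sketch of such an index in plain Java. It is not the ncWMS code: the class name AggregationIndex, its methods, and the use of long values for times are all assumptions made for illustration. Each variable maps to a sorted map from time value to the file and time index holding that timestep, so files may carry different sets of variables.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical index: variable name -> (time value -> location in a file). */
class AggregationIndex {

    /** Where one timestep of one variable lives. */
    static final class Location {
        final String file;    // path of the underlying file
        final int timeIndex;  // index along that file's own time axis

        Location(String file, int timeIndex) {
            this.file = file;
            this.timeIndex = timeIndex;
        }
    }

    // TreeMap keeps each variable's timesteps ordered along the joined time axis.
    private final Map<String, TreeMap<Long, Location>> index = new HashMap<>();

    /** Registers one file: the variables it contains and its time-axis values. */
    void addFile(String file, List<String> variables, long[] times) {
        for (String variable : variables) {
            TreeMap<Long, Location> byTime =
                index.computeIfAbsent(variable, k -> new TreeMap<>());
            for (int i = 0; i < times.length; i++) {
                // A later file silently replaces an earlier one at the same time value.
                byTime.put(times[i], new Location(file, i));
            }
        }
    }

    /** Returns the location of (variable, time), or null if the aggregation lacks it. */
    Location lookup(String variable, long time) {
        TreeMap<Long, Location> byTime = index.get(variable);
        return byTime == null ? null : byTime.get(time);
    }
}
```

In a real reader the variable names and time values would come from each matched file's metadata rather than being passed in by hand, and overlapping times would need an explicit policy.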

We have found this to be very useful.  I wonder if it would be a good
idea to integrate this capability into the NetCDF-Java libraries so
that users can open an aggregation by running
NetcdfDataset.openDataset("/path/to/*.nc") or similar?  What do others
think?
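
As a rough illustration of the first step, the sketch below expands a glob into a sorted file list using only the standard java.nio.file API. It makes no claim about how NetcdfDataset.openDataset works or would work; GlobExpander and expand are hypothetical names, and the sketch assumes Unix-style paths and that the caller supplies the directory to search from.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: expands a glob expression into the files it matches. */
class GlobExpander {

    /**
     * Walks the tree under root and returns, in sorted order, every regular
     * file whose full path matches the glob, e.g. "/path/to/200?/*" + "/foo.nc".
     */
    static List<Path> expand(Path root, String glob) throws IOException {
        // In java.nio glob syntax, "*" and "?" do not cross directory boundaries.
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .filter(matcher::matches)
                        .sorted()
                        .collect(Collectors.toList());
        }
    }
}
```

The resulting list could then be handed to whatever builds the aggregation. Sorting by path is only a convenience here, since the time ordering has to come from the files' own time axes.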

Our code is available for stealing, but it might need some work to
satisfy more use cases.  In particular, for a forecast model run
collection (FMRC) our code automatically generates the "best
timeseries" but doesn't allow access to other things like the run
dates.  I could have a go at creating an IOSP, if this is a good way
to begin the integration.

Cheers, Jon

-- 
Dr Jon Blower
Technical Director, Reading e-Science Centre
Environmental Systems Science Centre
University of Reading
Harry Pitt Building, 3 Earley Gate
Reading RG6 6AL. UK
Tel: +44 (0)118 378 5213
Fax: +44 (0)118 378 6413
j.d.blower@xxxxxxxxxxxxx
http://www.nerc-essc.ac.uk/People/Staff/Blower_J.htm

