Hello all, comments are in-line:
Ian Barrodale wrote:
Hi Ted, John, Russ, and John:
Thank you all for taking the time yesterday to both listen to our story
and to further enlighten us about your work. It was much appreciated.
The note below provides a possible implementation route, and some
questions. Please feel free to point out any shortcomings in our
proposed approach, and please provide any answers that come to mind
regarding our questions.
Thanks again,
Ian
======================
Goal
-------
Based on feedback from BCS Grid DataBlade customers and, in particular,
Ted Habermann, we feel that there may be some value in providing
alternative ways of accessing data from a Grid DataBlade (GRD)-powered
database through existing widely-used protocols and methods. Note that
by "accessing", we really mean just the reading part, as we already
provide, through the BCS Gridded Data Loader client, a means of
conveniently ingesting data from many forms into a GRD-powered
database. One method of accessing the data would be to cast it in the
form of the Common Data Model (CDM) supported by the Java netCDF API
from UCAR. The advantages of this are that:
* users would be able to write software using the Java netCDF API
(which is fairly straightforward to use and well documented) for
accessing GRD data, and
* data providers can use a GRD-powered database and provide access
to it through OPeNDAP, WCS, netCDF files, etc. using the Java
netCDF API (see page 53 attachment, modified from the slide on
page 53 of
http://www.unidata.ucar.edu/staff/caron/presentations/CDM.ppt).
Our understanding of a possible implementation
---------------------------------------------------------------------
To handle GRD data from the Java netCDF API, we would have to:
(i) Create a GRD I/O service provider for the Java netCDF API (see page
38 attachment) that can communicate with the GRD database using a
combination of JDBC and the existing Java GRD API. The Java netCDF API
uses a service provider architecture to handle reading multiple
different file formats and casting them in the form of the CDM.
(ii) Create a GRD content manager to handle the georeferencing
information in the GRD.
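To make item (i) a little more concrete: whatever the exact interface, an I/O service provider ends up being asked for rectangular (possibly strided) subsets of a variable, given as an origin and a shape. The sketch below is not the netCDF-Java IOSP interface. GridSubsetReader is a hypothetical class that handles only unit strides and assumes the grid has already been fetched into a flat, row-major float array; a real GRD provider would pull the values through JDBC and the Java GRD API instead.

```java
/**
 * Hypothetical sketch: answer a request for a rectangular, unit-stride subset
 * (origin + shape) of a grid held in a flat, row-major float array.
 * A real GRD provider would fetch the values via JDBC / the Java GRD API.
 */
final class GridSubsetReader {
    private final float[] values; // whole grid, row-major (last dimension varies fastest)
    private final int[] shape;    // e.g. {nLat, nLon}

    GridSubsetReader(float[] values, int[] shape) {
        long n = 1;
        for (int s : shape) n *= s;
        if (n != values.length) {
            throw new IllegalArgumentException("shape does not match data length");
        }
        this.values = values;
        this.shape = shape.clone();
    }

    /** Return the subset starting at origin with extent subShape, in row-major order. */
    float[] read(int[] origin, int[] subShape) {
        int rank = shape.length;
        if (origin.length != rank || subShape.length != rank) {
            throw new IllegalArgumentException("rank mismatch");
        }
        int total = 1;
        for (int d = 0; d < rank; d++) {
            if (origin[d] < 0 || subShape[d] < 0 || origin[d] + subShape[d] > shape[d]) {
                throw new IndexOutOfBoundsException("section out of range in dimension " + d);
            }
            total *= subShape[d];
        }
        // Row-major strides of the full grid.
        int[] stride = new int[rank];
        for (int d = rank - 1, s = 1; d >= 0; d--) {
            stride[d] = s;
            s *= shape[d];
        }
        float[] out = new float[total];
        int[] idx = new int[rank]; // position inside the subset, advanced like an odometer
        for (int k = 0; k < total; k++) {
            int flat = 0;
            for (int d = 0; d < rank; d++) {
                flat += (origin[d] + idx[d]) * stride[d];
            }
            out[k] = values[flat];
            for (int d = rank - 1; d >= 0; d--) { // last dimension advances fastest
                if (++idx[d] < subShape[d]) break;
                idx[d] = 0;
            }
        }
        return out;
    }
}
```

For a 3 x 4 grid holding 0..11, reading origin {1, 1} with shape {2, 2} returns the values 5, 6, 9, 10.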
One possible method for allowing users to access GRD data without a
full THREDDS catalog is to supply some type of unique URL to the database:
grd://user:pass@server/database
and the service provider would construct a CDM instance that contains a
main group of all the grids in the database and allow the user to
access those grids through the API.
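A minimal sketch of pulling the proposed grd:// URL apart with java.net.URI. GrdLocation is a hypothetical name, and the layout it expects (optional credentials, server, optional port, database name as the path) is simply the one proposed above.

```java
import java.net.URI;

/** Hypothetical parsed form of grd://user:pass@server[:port]/database. */
final class GrdLocation {
    final String user;     // null if the URL carries no credentials
    final String password; // null if absent
    final String host;
    final int port;        // -1 if no port was given
    final String database;

    private GrdLocation(String user, String password, String host, int port, String database) {
        this.user = user;
        this.password = password;
        this.host = host;
        this.port = port;
        this.database = database;
    }

    /** Parse a grd: URL; throws IllegalArgumentException if it is malformed or incomplete. */
    static GrdLocation parse(String url) {
        URI uri = URI.create(url); // throws IllegalArgumentException on syntax errors
        if (!"grd".equalsIgnoreCase(uri.getScheme())) {
            throw new IllegalArgumentException("not a grd: URL: " + url);
        }
        if (uri.getHost() == null) { // missing or unparseable server name
            throw new IllegalArgumentException("no server in " + url);
        }
        String path = uri.getPath();
        if (path == null || path.length() <= 1) {
            throw new IllegalArgumentException("no database name in " + url);
        }
        String user = null;
        String password = null;
        String info = uri.getUserInfo(); // decoded "user:pass", or null
        if (info != null) {
            int colon = info.indexOf(':');
            user = colon < 0 ? info : info.substring(0, colon);
            password = colon < 0 ? null : info.substring(colon + 1);
        }
        return new GrdLocation(user, password, uri.getHost(), uri.getPort(), path.substring(1));
    }
}
```

With a placeholder server name, GrdLocation.parse("grd://peter:test123@dbhost.example.com/coastwatch") yields user "peter", host "dbhost.example.com", no port (-1) and database "coastwatch". Note that URI.getHost() returns null for names that are not valid server-based authorities (for example, ones containing underscores), which is why the sketch checks for it.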
For example:
grd://peter:test123@xxxxxxxxxxxxxxxxxx/coastwatch
might be a reference to a GRD database running at Barrodale that
contains gridded NOAA CoastWatch satellite-derived data for some number
of geographic areas and time periods. The resulting netCDF dataset
would contain the grids under a root group, arranged like a
directory structure:
/
/sst/
/sst/northeast/
/sst/northeast/jan01_2007 <---- a grid
/sst/northeast/jan02_2007 <---- another grid
...
/chlorophyll/northeast/jan01_2007 <---- a third grid
/chlorophyll/northeast/jan02_2007 <---- and so on
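One way a service provider could build that directory-like view is from a flat list of grid names. This sketch assumes, hypothetically, that each grid in the database is identified by a slash-separated path like the ones in the listing above; GridTree is an illustrative name, not part of any existing API.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical node in a directory-like view of the grids in one GRD database. */
final class GridTree {
    final String name;
    final Map<String, GridTree> children = new TreeMap<>(); // sorted for stable listings
    boolean isGrid; // true for nodes that name an actual grid

    GridTree(String name) {
        this.name = name;
    }

    /** Build a tree from full grid paths such as "/sst/northeast/jan01_2007". */
    static GridTree build(Iterable<String> gridPaths) {
        GridTree root = new GridTree("");
        for (String path : gridPaths) {
            GridTree node = root;
            for (String part : path.split("/")) {
                if (part.isEmpty()) continue; // skip the leading "/" and any "//"
                node = node.children.computeIfAbsent(part, GridTree::new);
            }
            node.isGrid = true;
        }
        return root;
    }

    /** Count the grids at or below this node. */
    int gridCount() {
        int n = isGrid ? 1 : 0;
        for (GridTree child : children.values()) {
            n += child.gridCount();
        }
        return n;
    }
}
```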
Whether the user would require a more sophisticated catalog with querying
ability, such as THREDDS could supply, depends on the desired complexity of
the grids in the database.
See the last answer below.
BTW, the TDS will soon be able to do proper HTTP-based authentication,
and we are hoping to make that a standard in OPeNDAP clients, which can act
like browsers and pop up a username/password dialog window instead of
embedding user:pass@ in the URL.
Questions
---------------
We have the following questions:
1) Where in the netCDF API would the content manager that handles GRD
georeferencing information sit?
2) How does the I/O SP architecture determine the I/O SP for a given
file:// style URL? How would it know to handle a grd:// URL
differently?
Very perceptive question; let me start here, since the answer covers both questions:
The IOSP architecture is, in fact, file-based (it reads through a RandomAccessFile). Since you will
be URL-based, we have to fit you in at a higher level, namely
NetcdfDataset.openFile(). If you look there, you will see that we check for
OPeNDAP (http: or dods:) and thredds: URLs. It might make sense to generalize
this to allow plugging in external handlers for your protocol, similar to how
java.net.ContentHandler works. Another option is to put your code in the core.
Anyway, NetcdfDataset.openFile() would detect your URL scheme and construct a
NetcdfFile with your IOSP. We will have to add a new constructor for that. (You
could alternatively just subclass NetcdfFile, which is what DODSNetcdfFile does.)
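The pluggable-handler idea, in the spirit of java.net.ContentHandler, can be sketched as a registry keyed by URL scheme that falls back to the existing file-based route for everything else. SchemeRegistry and its methods are hypothetical; this is not how NetcdfDataset.openFile() is actually implemented, only one shape such a generalization might take.

```java
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical registry of openers keyed by URL scheme. Anything without a
 * registered scheme (plain paths, file: URLs, Windows drive letters) goes to
 * the fallback, i.e. the existing RandomAccessFile-based IOSP route.
 */
final class SchemeRegistry<T> {
    private final Map<String, Function<String, T>> openers = new ConcurrentHashMap<>();
    private final Function<String, T> fallback;

    SchemeRegistry(Function<String, T> fallback) {
        this.fallback = fallback;
    }

    /** Register an opener for a scheme such as "grd" (matched case-insensitively). */
    void register(String scheme, Function<String, T> opener) {
        openers.put(scheme.toLowerCase(Locale.ROOT), opener);
    }

    /** Dispatch on the text before the first ':' if an opener is registered for it. */
    T open(String location) {
        int colon = location.indexOf(':');
        if (colon > 0) {
            String scheme = location.substring(0, colon).toLowerCase(Locale.ROOT);
            Function<String, T> opener = openers.get(scheme);
            if (opener != null) {
                return opener.apply(location);
            }
        }
        return fallback.apply(location);
    }
}
```

In such a design the grd opener would build the NetcdfFile around the GRD provider, while dods: and http: would keep going to the existing OPeNDAP code.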
As for the "content manager that handles GRD georeferencing information", it could be a
CoordSysBuilder subclass. However, this is actually unnecessary if you use an existing Convention, and we
highly recommend using the CF Convention for gridded data. Since you are creating the "file",
you can add the attributes and variables needed by that Convention. This makes your data "CF
compliant" automatically, which is a real win.
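As a rough illustration of adding the attributes and variables a Convention needs, here is the kind of metadata a GRD provider might synthesize for a regular lat/lon grid under CF-1.0: a global Conventions attribute, 1-D coordinate variables named after their dimensions (lat, lon) with the CF units degrees_north and degrees_east, and a standard_name on the data variable. The class and method names are hypothetical, and a real grid may need more (a time coordinate, or grid_mapping for projected data).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helpers producing CF-1.0 style metadata for a regular lat/lon grid. */
final class CfGridMetadata {
    /** Global attributes of the synthesized "file". */
    static Map<String, String> globalAttributes(String title) {
        Map<String, String> a = new LinkedHashMap<>();
        a.put("Conventions", "CF-1.0");
        a.put("title", title);
        return a;
    }

    /** Attributes for the coordinate variable lat(lat). */
    static Map<String, String> latitudeAttributes() {
        return variableAttributes("latitude", "degrees_north");
    }

    /** Attributes for the coordinate variable lon(lon). */
    static Map<String, String> longitudeAttributes() {
        return variableAttributes("longitude", "degrees_east");
    }

    /** Attributes for any variable, e.g. sst(lat, lon) with "sea_surface_temperature" and "K". */
    static Map<String, String> variableAttributes(String standardName, String units) {
        Map<String, String> a = new LinkedHashMap<>();
        a.put("standard_name", standardName);
        a.put("units", units);
        return a;
    }

    /** Values of a 1-D coordinate variable for a uniformly spaced axis. */
    static double[] uniformAxis(double start, double spacing, int n) {
        double[] v = new double[n];
        for (int i = 0; i < n; i++) {
            v[i] = start + i * spacing;
        }
        return v;
    }
}
```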
3) Have we interpreted the slide on page 53 correctly -- is there a
server that can serve out data using the CDM (via the Java netCDF API)
as an intermediate step?
Yes, the THREDDS Data Server (TDS).
4) Does a group structure to represent GRD contents map to an OPeNDAP
connection, WCS, or netCDF file or do those types of data
representations only have netCDF variables and no groups?
In principle you could use Groups, but they really won't be fully supported
until we get the netCDF-4 file format finished and tested. I would advise
starting with the simpler case of no groups.
5) Our understanding of the netCDF Java library is that it has, in
particular, the following two entry points:
* NetcdfFile : this is the bare netCDF access to files of various
types. It doesn't understand anything about coordinate systems.
You can add an I/O service provider to handle your favorite file
format via a class method. The variables it returns are instances
of Variable (which of course don't know anything about coordinate
systems).
* NetcdfDataset : this is a layer built above the NetcdfFile layer
and is the usual interface for applications (e.g., a WCS). It
handles converting various attributes into a coordinate system. It
has a number of methods relating to adding or getting coordinate
systems. These methods seem to be applied to the entire file,
rather than to individual variables (or groups).
Coordinate systems are really variable-specific. However, the common case is
that each dataset has a single coordinate system (or a set of closely related
ones).
CoordinateSystem findCoordinateSystem(java.lang.String name)
    Retrieve the CoordinateSystem with the specified name.
    <http://www.unidata.ucar.edu/software/netcdf-java/v2.2.18/javadoc/ucar/nc2/dataset/CoordinateSystem.html>
    <http://www.unidata.ucar.edu/software/netcdf-java/v2.2.18/javadoc/ucar/nc2/dataset/NetcdfDataset.html#findCoordinateSystem%28java.lang.String%29>
java.util.List getCoordinateAxes()
    Get the list of all CoordinateAxis objects used by this dataset.
    <http://www.unidata.ucar.edu/software/netcdf-java/v2.2.18/javadoc/ucar/nc2/dataset/NetcdfDataset.html#getCoordinateAxes%28%29>
java.util.List getCoordinateTransforms()
    Get the list of all CoordinateTransform objects used by this dataset.
    <http://www.unidata.ucar.edu/software/netcdf-java/v2.2.18/javadoc/ucar/nc2/dataset/NetcdfDataset.html#getCoordinateTransforms%28%29>
boolean getCoordSysWereAdded()
    Has Coordinate System metadata been added.
    <http://www.unidata.ucar.edu/software/netcdf-java/v2.2.18/javadoc/ucar/nc2/dataset/NetcdfDataset.html#getCoordSysWereAdded%28%29>
The NetcdfDataset object contains instances of VariableDS. They are like
a wrapper for the Variable objects found in the NetcdfFile object. There
is a method to ask a VariableDS for the list of coordinate systems
associated with it.
Exactly.
If we interpret things correctly, when a NetcdfDataset object is built
from a NetcdfFile object, the NetcdfDataset object is responsible for
figuring out the coordinate system information from attributes in the
NetcdfFile, and composing a VariableDS from the coordinate system
information and each Variable. In theory, by implementing our own
CoordSysBuilder class and registering it, we should be able to add
coordinate system information to each VariableDS individually.
Yes, or, as I mentioned, use an existing Convention and its CoordSysBuilder.
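To illustrate the division of labor described above (a plain variable at the NetcdfFile level, a wrapper that carries its coordinate axes, and a builder that fills them in variable by variable), here is a toy version. RawVariable, EnhancedVariable and SimpleCoordSysBuilder are hypothetical stand-ins, not the real Variable, VariableDS and CoordSysBuilder classes. The two rules it applies are CF ones, simplified: a variable with the same name as a dimension is that dimension's coordinate variable, and further axes may be listed in a blank-separated "coordinates" attribute.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for a NetcdfFile-level variable: no coordinate knowledge. */
final class RawVariable {
    final String name;
    final List<String> dimensions;
    final Map<String, String> attributes;

    RawVariable(String name, List<String> dimensions, Map<String, String> attributes) {
        this.name = name;
        this.dimensions = dimensions;
        this.attributes = attributes;
    }
}

/** Hypothetical stand-in for a VariableDS-like wrapper that carries its own axes. */
final class EnhancedVariable {
    final RawVariable raw;
    final List<String> coordinateAxes;

    EnhancedVariable(RawVariable raw, List<String> coordinateAxes) {
        this.raw = raw;
        this.coordinateAxes = Collections.unmodifiableList(coordinateAxes);
    }
}

/** Hypothetical builder that assigns coordinate axes to each variable individually. */
final class SimpleCoordSysBuilder {
    static List<EnhancedVariable> enhance(List<RawVariable> vars) {
        Set<String> names = new HashSet<>();
        for (RawVariable v : vars) {
            names.add(v.name);
        }
        List<EnhancedVariable> out = new ArrayList<>();
        for (RawVariable v : vars) {
            List<String> axes = new ArrayList<>();
            // Rule 1 (simplified): a variable named like a dimension is taken as its coordinate variable.
            for (String dim : v.dimensions) {
                if (names.contains(dim)) {
                    axes.add(dim);
                }
            }
            // Rule 2: extra axes may be listed in a blank-separated "coordinates" attribute.
            String extra = v.attributes.get("coordinates");
            if (extra != null) {
                for (String a : extra.trim().split("\\s+")) {
                    if (!a.isEmpty() && names.contains(a) && !axes.contains(a)) {
                        axes.add(a);
                    }
                }
            }
            out.add(new EnhancedVariable(v, axes));
        }
        return out;
    }
}
```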
A question then is : do applications like the web coverage server and
OPeNDAP server get their coordinate information from VariableDS objects
or from the NetcdfDataset object?
OPeNDAP is (more or less) at the same level as NetcdfFile, and so just
faithfully transmits Variables, Attributes, and Dimensions across the wire. The
coordinate systems are then added by clients (like the CDM) that understand the
convention. We expect that DAP4, the future OPeNDAP protocol, will add
Groups.
WCS, OTOH, works at the coordinate system level, and so uses the GridDatatype, which is specialized for "coverage" data, and gets its coordinate systems from the NetcdfDataset. The client makes requests in coordinate space, and we know how to translate that into index space. Currently we can send back either GeoTIFF or netCDF/CF files. There are some limitations: the grid spacing must be uniform in WCS 1.0. We expect to move to WCS 1.1 later this year, which removes that limitation. We haven't implemented reprojection/resampling, and I'm not sure that we will.
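The translation from coordinate space to index space mentioned above is straightforward when the spacing is uniform, which is exactly the WCS 1.0 restriction. A sketch of one way to do it, using a hypothetical UniformAxis class that snaps each end of a requested interval to the nearest grid point; a real server might instead select only the points strictly inside the interval.

```java
/**
 * Hypothetical sketch: map coordinate values to indices on a uniformly spaced
 * 1-D axis, where coordinate(i) = start + i * spacing for i in [0, n).
 */
final class UniformAxis {
    final double start;   // coordinate of index 0
    final double spacing; // nonzero; negative for descending axes (e.g. latitude 90 -> -90)
    final int n;          // number of points

    UniformAxis(double start, double spacing, int n) {
        if (spacing == 0 || Double.isNaN(spacing) || n <= 0) {
            throw new IllegalArgumentException("spacing must be nonzero and n positive");
        }
        this.start = start;
        this.spacing = spacing;
        this.n = n;
    }

    /** Index of the grid point nearest to coord, clamped to [0, n-1]. */
    int nearestIndex(double coord) {
        long i = Math.round((coord - start) / spacing);
        return (int) Math.max(0, Math.min(n - 1, i));
    }

    /** Inclusive index range {lo, hi} whose ends are the grid points nearest to c1 and c2. */
    int[] indexRange(double c1, double c2) {
        int a = nearestIndex(c1);
        int b = nearestIndex(c2);
        return new int[] {Math.min(a, b), Math.max(a, b)};
    }
}
```

For a half-degree global longitude axis, new UniformAxis(-180.0, 0.5, 720), the interval from -10.2 to 10.2 maps to indices 340 through 380.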
If it is from the NetcdfDataset
object, then the strategy of grouping all the grids in a database into a
single NetcdfDataset, as outlined above, won't work, and we'd be obliged
to use a THREDDS server. Is this correct?
It would likely be a mistake to put a lot of disparate data into the same
NetcdfDataset. Better to find the right granularity, which is typically
homogeneous data that shares the same discovery metadata. So I would not use
the Group mechanism to break the data into granules; it is better to make separate
datasets. It's possible that such an idiom will develop with netCDF-4, but
it's better to get something working that stays within existing practice, and then
decide if you want to forge ahead. Let me emphasize that it's really important
to find the right dataset granularity.
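Applied to the CoastWatch example earlier in this thread, the right granularity might be one dataset per parameter and region, with the individual grids as time steps of that dataset. A sketch of that partitioning, assuming grids are named by paths like those in the directory listing above (/sst/northeast/jan01_2007); DatasetPartitioner is an illustrative name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch: group grid paths such as /sst/northeast/jan01_2007 into
 * datasets keyed by everything before the last '/', e.g. "sst/northeast".
 */
final class DatasetPartitioner {
    static Map<String, List<String>> partition(List<String> gridPaths) {
        Map<String, List<String>> datasets = new TreeMap<>(); // sorted by dataset key
        for (String path : gridPaths) {
            String p = path.startsWith("/") ? path.substring(1) : path;
            int last = p.lastIndexOf('/');
            if (last <= 0 || last == p.length() - 1) {
                throw new IllegalArgumentException("expected parameter/region/timestep: " + path);
            }
            String key = p.substring(0, last);       // e.g. "sst/northeast"
            String timeStep = p.substring(last + 1); // e.g. "jan01_2007"
            datasets.computeIfAbsent(key, k -> new ArrayList<>()).add(timeStep);
        }
        return datasets;
    }
}
```

Each resulting dataset is homogeneous (one parameter, one region) and so can share one set of discovery metadata.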
This means you want to use THREDDS catalogs to publish the dataset URLs and
associated metadata, and possibly use the TDS to serve your data. Once you have an
IOSP or equivalent for your data, the main work is to develop the catalogs.
These can be pretty minimal, but automatically populating catalogs with
high-quality metadata is a huge win in the long run.
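A minimal catalog can be generated mechanically, with one dataset entry per dataset key (for example sst/northeast). The element and attribute names below (catalog, service with serviceType and base, dataset with ID and urlPath, serviceName) follow the THREDDS InvCatalog 1.0 schema as I understand it and should be checked against the catalog specification; the grd/ urlPath prefix and the service name odap are hypothetical.

```java
import java.util.List;

/** Hypothetical generator for a minimal THREDDS catalog (InvCatalog 1.0 style). */
final class CatalogWriter {
    /** Escape the characters that are unsafe inside XML attribute values and text. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** One OPeNDAP service plus one dataset element per key such as "sst/northeast". */
    static String catalog(String catalogName, List<String> datasetKeys) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<catalog xmlns=\"http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0\"")
          .append(" name=\"").append(escape(catalogName)).append("\">\n");
        sb.append("  <service name=\"odap\" serviceType=\"OpenDAP\" base=\"/thredds/dodsC/\"/>\n");
        for (String key : datasetKeys) {
            sb.append("  <dataset name=\"").append(escape(key)).append('"')
              .append(" ID=\"").append(escape(key.replace('/', '-'))).append('"')
              .append(" urlPath=\"grd/").append(escape(key)).append("\">\n")
              .append("    <serviceName>odap</serviceName>\n")
              .append("  </dataset>\n");
        }
        sb.append("</catalog>\n");
        return sb.toString();
    }
}
```

A real catalog would also carry the discovery metadata (documentation, geospatial and time coverage) that makes the datasets findable; that is where most of the long-term value lies.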
I think that would be a powerful value-added product, but of course I don't know
what your customers really want. As Ted mentioned, it's a good time to help
influence TDS strategy, and it appears to me that your small company, with its
extensive scientific experience, would be a good fit with Unidata.
John