storing all types of grids

Harry Jenter (hjenter@stress.er.usgs.gov)
Wed, 25 Mar 92 16:06:28 EST

gabor@hermes.chpc.utexas.edu writes:
> 
>   ... netCDF said nothing about the rectangularity
> of the grid because it looked obvious and intuitive. That was the
> assumption itself: we did not even mention that the grid can be
> non-rectangular; we just followed common sense and stuck to the
> rectangular grid. When you say:
> 
> example {
> dimensions:
>         x  = 4;
>         y  = 4;
> variables:
>         float   temp( x, y  );
>         int     x(x);
>         int     y(y);
> data:
>         x     = x1,x2,x3,x4;
>         y     = y1,y2,y3,y4;
>         temp  = ... ;
> }
> 
> Variable 'temp' is automatically considered as given over a
> rectangular grid:
> 
>      |     |        |   |
>      |     |        |   |
> y1---|-----|--------|---|-----------------
>      |     |        |   |
>      |     |        |   |
>      |     |        |   |
> y2---|-----|--------|---|-----------------
>      |     |        |   |
> y3---|-----|--------|---|-----------------
> y4---|-----|--------|---|-----------------
>      x1    x2       x3  x4
> 
> I think this is a stone-hard assumption made in the definition of
> netCDF. 

The grid doesn't have to be rectangular. It must merely be
quadrilateral.  I think that this is an important example.  When/if an
official policy is set forth that assigns meaning to variables and
dimensions that have the same name, I hope multi-dimensional coordinate
variables are included.  (I know the proposed netCDF operators are
supposed to assign meaning to dimensions and variables with the same
name, but I think they take the limited perspective defined above.)

Following Gabor's example CDL above, but using the grid drawn below:

 example {
 dimensions:
         x  = 5;
         y  = 10;
 variables:
         float   temp(y,x);
         float     x(y,x);            // <---- Here's the change
         float     y(y,x);            // <---- Here's the change
 data:
         x     = x(1,1), x(1,2), x(1,3), x(1,4), x(1,5),
                 x(2,1), x(2,2), x(2,3), x(2,4), x(2,5),
                                   .
                                   .
                                   .
                 x(10,1), x(10,2), x(10,3), x(10,4), x(10,5);
         y     = y(1,1), y(1,2), y(1,3), y(1,4), y(1,5),
                 y(2,1), y(2,2), y(2,3), y(2,4), y(2,5),
                                   .
                                   .
                                   .
                 y(10,1), y(10,2), y(10,3), y(10,4), y(10,5);
         temp  = ... ;
 }

Y
^
|
|     \----\-----\-----\-----\
|      \----\-----\-----\-----\
|      |----|-----|-----|-----|
|      |----|-----|-----|-----|
|     /----/-----/-----/-----/
|    /----/-----/-----/-----/
|    |----|-----|-----|-----|
|    |----|-----|-----|-----|
|     \----\-----\-----\-----\  
|      \----\-----\-----\-----\ 
|
----------------------------------> X

Each grid cell must have four sides, but it doesn't have to be a
rectangle. x and y are now 2-d coordinate variables.
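
To make the difference concrete, here is a minimal sketch of how a
reader might locate the node that goes with temp(j,i) under each
scheme. It is written in Python purely for illustration; the function
names and the sample numbers are hypothetical and are not part of
netCDF, and plain lists stand in for netCDF variables.

```python
# Illustrative only: plain Python lists stand in for netCDF variables.

def node_position_1d(x, y, j, i):
    """Rectangular grid: x(x) and y(y) are 1-d coordinate variables."""
    return (x[i], y[j])


def node_position_2d(x, y, j, i):
    """Quadrilateral grid: x(y,x) and y(y,x) are 2-d coordinate variables."""
    return (x[j][i], y[j][i])


def expand_to_2d(x1d, y1d):
    """Rewrite a rectangular grid in the 2-d form (it is a special case)."""
    x2d = [[xv for xv in x1d] for _ in y1d]
    y2d = [[yv for _ in x1d] for yv in y1d]
    return x2d, y2d


# Hypothetical sample data: 3 rows of 4 nodes, each row shifted in x,
# so the cells are four-sided but not rectangular.
shift = [0.0, 0.5, 1.0]
x2d = [[float(i) + shift[j] for i in range(4)] for j in range(3)]
y2d = [[float(j) for _ in range(4)] for j in range(3)]
```

With 1-d coordinate variables the position depends on i and j
separately, so the rows cannot be shifted relative to each other. With
2-d coordinate variables every node carries its own (x, y) pair, and a
rectangular grid is just the case where expand_to_2d() would have
produced the same arrays.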

Isn't saying "NetCDF assumes rectangular gridded data" like saying
"Linear algebra has a stone-hard assumption that arrays have rows and
columns," or like saying "FORTRAN and C are biased toward rectangular
grids because their arrays have rows and columns"?  Sure, it's true,
but the assumption is deeply rooted in concepts much more fundamental
than the definition of netCDF. Could Unidata have defined netCDF without
this "convention"?

>   ... At a higher conceptual level the
> reconstruction of the grid needs more information. But I strongly
> believe the issue is not only to reduce the size of the storage but
> to increase its functionality, too. 

Unfortunately, these two goals usually conflict.  However, I do
think that adding functionality that accounts for regularly-spaced
grids would not add many restrictive conventions to netCDF and might
yield a useful savings in grid storage space.  It is less clear
(at least in my mind) whether adding conventions to netCDF
for storing non-quadrilateral grids would make its implementation so
burdensome that netCDF-compliant software could not understand
all possible cases. I think that, if too much "convention"
is added to netCDF, it could lose its greatest asset over HDF and CDF:
ease of use.
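
As a rough illustration of the storage savings at stake, here is a
small sketch in Python. The (start, spacing, count) triple is a
hypothetical encoding of my own for a regularly-spaced axis, not an
existing netCDF convention, and the function names are made up for
this example.

```python
def regular_coordinate(start, spacing, count):
    """Rebuild a regularly spaced 1-d coordinate from three stored numbers."""
    return [start + k * spacing for k in range(count)]


def stored_values(nx, ny):
    """Count the coordinate values stored for an ny-by-nx grid of nodes."""
    return {
        "regular": 3 * 2,             # start, spacing, count for each axis
        "rectangular": nx + ny,       # 1-d x(x) and y(y)
        "quadrilateral": 2 * nx * ny, # 2-d x(y,x) and y(y,x)
    }
```

For the 5-by-10 grid in my example, stored_values(5, 10) gives 6, 15,
and 100 coordinate values for the regular, rectangular, and
quadrilateral cases respectively.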

>   I just propose we straighten out more than the issue of the
> regularly spaced grids. Why don't we want the whole pie: 

A bigger piece than just the regularly-spaced data issue is certainly
more useful, but much more difficult to attain.

> the concept
> for storage of grids of different complexity and mathematical
> generality. I could imagine something like this:
> 
> general grids ----> quadrilateral ---> rectangular ---> regularly spaced
>                |                                        /  |   \
>                |--> triangular                         /   |    \
>                |                                   linear  |   other important
>                |--> other                             logarithmical
> 

This is a nice illustration.  The questions it brings to my mind are:

1. What specific pieces of information are required for each branch?

   We need concrete examples from people who use these types of grids.

2. How much do all the branches have in common? 

   Once enough examples are collected from people answering number 1,
   we may start to see a pattern emerging. I hope so.

I propose we start posting some grid descriptions along with
descriptions of the way we store the grids in netCDF files at present.
-----------
Harry L. Jenter                        hjenter@stress.er.usgs.gov
U.S. Geological Survey                 COM: (703) 648-5916 FTS: 959-5916
Mailstop 430, National Center          "Sometimes you're the bug.
Reston, Virginia 22092                   Sometimes you're the windshield."