[netcdfgroup] IOAPI compliance, was: netCDF tidy?

summary: my VERDI problem was indeed not with netCDF but with IOAPI,
or at least VERDI's compliance with IOAPI. Fixes and lessons-learned
below.

details:

Tom Roche Fri, Mar 2, 2012 at 10:05 PM
>> I have a [netCDF] file [with] which [both] R (up-to-date, with
>> package=ncdf4) and NCO (also up-to-date) [are happy].

To clarify, I took a source netCDF/IOAPI file, and

* removed data variables other than that in which I was interested
  (with NCO), greatly reducing the file size

* changed the data values in each layer (with R, specifically
  package=ncdf4), since they had been inadvertently summed upstream

* appended 2 layers to the datavar of interest (with NCO). Note the
  layers are not themselves spatial; i.e., they do not correspond to
  altitude (more below).

* wrote new data to the appended layers (with R)

>> However, when I try to open it as a dataset with VERDI

http://www.verdi-tool.org/
(version="1.4 2011-06-01")

>> I get [java.lang.NullPointerException]

Thanks for pointers to

http://cf-pcmdi.llnl.gov/conformance/compliance-checker/

(which I'll try when my group's data gets CF-compliant) and to
`nccopy` (which my files passed). The problem turned out to be

John Caron Fri, 02 Mar 2012 20:56:03 -0700
> Verdi uses the netcdf-java library, which knows that the file is an
> IOAPI file,

i.e., IOAPI-compliant (or, more correctly, "I/O API-compliant"), not
merely NetCDF-compliant.

> its likely that NCO / R has manipulated the file in a way that is
> not compliant to the IOAPI metadata spec.

Unfortunately there is not, AFAICS, a written specification for the
IOAPI metadata. Neither is there a compliance checker for IOAPI; at
least, the m3tool `m3stat`

http://www.baronams.com/products/ioapi/M3STAT.html

is more forgiving than VERDI. But thanks to

* John Caron's suggestion

> theres supposed to be a global attribute "VGLVLS" but its missing.

  (more below)

* repeated `diff -uwB  <( ncdump -h $A ) <( ncdump -h $B )`

I noted the following (in no particular order)

1 Removing datavars seems not to bother VERDI, provided one does not
  remove the IOAPI-specific "meta variable" TFLAG. My source.nc
  contained 29 datavars; after NCO, it contained just 2 (mine and
  TFLAG).

2 NCO becomes quite cross if one's datavars do not have an attribute
  named "_FillValue", so I fixed that.

3 I also changed the value of the global attribute NVARS (29 -> 1).

4 source.nc has a global attribute "VAR-LIST" containing a single
  string such that

* the string contains the names of the datavars, in order
* each datavar name has spaces appended to length=16 (i.e.,
  sprintf("%-16s", name))

  I changed target.nc (containing my changes) such that its VAR-LIST
  contained only the name (appropriately formatted) of my datavar of
  interest.
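For anyone rebuilding VAR-LIST by hand, here is a minimal Python sketch of the padding rule described in item 4. It only builds the string; writing it back into the file is left to NCO or R. The function name `make_var_list` and the datavar name `MYVAR` are hypothetical, and the 16-character width is simply what I observed in source.nc, not a documented requirement.

```python
def make_var_list(names, width=16):
    """Concatenate datavar names, each left-justified and space-padded to `width`.

    Mirrors sprintf("%-16s", name) applied to every name, in order.
    The width of 16 is an observation from one file, not a published spec.
    """
    for name in names:
        if len(name) > width:
            raise ValueError(f"name {name!r} is longer than {width} characters")
    return "".join(name.ljust(width) for name in names)


# Hypothetical datavar name, for illustration only.
var_list = make_var_list(["MYVAR"])
print(repr(var_list))
```

The resulting string's length should always be 16 times the number of datavars, which makes a cheap sanity check against NVARS.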

After that (i.e., removing datavars and changing global attrs), but
before adding layers, I was able to load the resulting target.nc in
VERDI, despite not having altered datavar=TFLAG, which contains one
date-time pair per datavar (other than itself) and so still lists
pairs for all 29 of the original datavars. I.e., for both source and
target `ncdump -v TFLAG` produces

>         int TFLAG(TSTEP, VAR, DATE-TIME) ;
>                 TFLAG:units = "<YYYYDDD,HHMMSS>" ;
>                 TFLAG:long_name = "TFLAG           " ;
>                 TFLAG:var_desc = "Timestep-valid flags:  (1) YYYYDDD or (2) HHMMSS                                " ;
...
>   2002365, 0,
>   2002365, 0,
>   2002365, 0,
>   2002365, 0,
>   2002365, 0,
>   2002365, 0,
>   2002365, 0,
>   2002365, 0,
>   2002365, 0,
>   2002365, 0,
>   2002365, 0,
>   2002365, 0,
>   2002365, 0,
>   2002365, 0,
>   2002365, 0,
>   2002365, 0,
>   2002365, 0,
>   2002365, 0,
>   2002365, 0,
>   2002365, 0,
>   2002365, 0,
>   2002365, 0,
>   2002365, 0,
>   2002365, 0,
>   2002365, 0,
>   2002365, 0,
>   2002365, 0,
>   2002365, 0,
>   2002365, 0 ;
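To read those pairs as dates, here is a small Python sketch that decodes one `<YYYYDDD,HHMMSS>` pair, going only by the units string shown above. The function name `decode_tflag` is hypothetical, and I assume DDD is a 1-based day of the year.

```python
from datetime import datetime, timedelta


def decode_tflag(yyyyddd, hhmmss):
    """Turn one TFLAG pair into a datetime.

    Assumes YYYYDDD is year * 1000 + day-of-year (1-based) and HHMMSS is
    hours * 10000 + minutes * 100 + seconds, per the units attribute.
    """
    year, day_of_year = divmod(yyyyddd, 1000)
    hours, rest = divmod(hhmmss, 10000)
    minutes, seconds = divmod(rest, 100)
    start_of_year = datetime(year, 1, 1)
    return start_of_year + timedelta(
        days=day_of_year - 1, hours=hours, minutes=minutes, seconds=seconds
    )


print(decode_tflag(2002365, 0))  # the pair repeated in the dump above
```

Under that assumption, every pair in the dump decodes to midnight on the last day of 2002.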

I then had to append layers. source.nc contained 42 layers
corresponding to 42 types of crops modeled by EPIC

http://epicapex.brc.tamus.edu/

such that each layer of the datavar (dimensions=(TSTEP, LAY, ROW,
COL)) contained an emission from that crop on a particular gridcell
(ROW, COL) at a particular TSTEP. Integrating those emissions required
knowing the proportion of the gridcell covered by that crop at that
TSTEP (obtained from BELD

http://www.epa.gov/ttnchie1/emch/biogenic/

) and then doing the appropriate sum of products. For explanatory and
debugging purposes I added a layer (43) to show the sum of the BELD
proportions on each gridcell, as well as a layer (44) to show the
integration of the emissions (i.e., the total estimated emissions due
to the modeled crops). I appended the layers using NCO (which is, in
my experience, more of a cleaver), then calculated their values using
R (more of a scalpel).
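For the record, the arithmetic behind layers 43 and 44 for a single gridcell and TSTEP can be sketched in Python as follows. The function name `integrate_emissions` and the sample numbers are hypothetical; I am also assuming here that the BELD values are plain fractions of the gridcell and need no further area or unit scaling.

```python
def integrate_emissions(emissions, proportions):
    """Return (coverage, total) for one gridcell at one timestep.

    emissions[k]   : emission from crop layer k
    proportions[k] : fraction of the gridcell covered by crop k (assumed unitless)
    coverage       : sum of the proportions (what went into layer 43)
    total          : sum of emission * proportion (what went into layer 44)
    """
    if len(emissions) != len(proportions):
        raise ValueError("need one proportion per crop layer")
    coverage = sum(proportions)
    total = sum(e * p for e, p in zip(emissions, proportions))
    return coverage, total


# Made-up values for two crops, for illustration only.
coverage, total = integrate_emissions([2.0, 4.0], [0.25, 0.5])
print(coverage, total)
```

A coverage value above 1 on any gridcell would suggest a problem in the proportions, which is why the extra layer is handy for debugging.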

However, appending layers (i.e., adding layers 43-44) to my datavar
broke its VERDI-compatibility, until I ...

5 changed the global attr=NLAYS appropriately (42 -> 44)

... which was not, alas, sufficient for VERDI. The fix was to also ...

6 change the global attr=VGLVLS "appropriately." (I suspect that this
  datum, a vector of floats, is intended to record the height of
  vertical layers, not applicable here.) I noted that, in source.nc,

* length(VGLVLS) == |layers|+1 (== 43 in source.nc)
* the first element == 1.f
* all subsequent elements 0 < e < 1

  So I appended two more elements 0 < e < 1 to VGLVLS (with R) ...

... restoring VERDI-compatibility! Note that I have not yet dealt with
TFLAG, though I probably will, just to prevent "getting bit" later on
(using these emissions as input to other IOAPI-using tools).
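Since there is no IOAPI compliance checker, here is a rough Python sketch of a check for the NLAYS/VGLVLS pattern noted in items 5 and 6. The function name `check_vglvls` is hypothetical, and the three rules are only what I observed in source.nc; they are not taken from any written specification.

```python
def check_vglvls(vglvls, nlays):
    """Return a list of problems (empty if none) with a VGLVLS vector.

    Rules are empirical, from one file that VERDI accepts:
      * len(vglvls) == nlays + 1
      * the first element is 1.0
      * every later element e satisfies 0 < e < 1
    """
    problems = []
    if len(vglvls) != nlays + 1:
        problems.append(f"expected {nlays + 1} elements, found {len(vglvls)}")
    if vglvls and vglvls[0] != 1.0:
        problems.append(f"first element is {vglvls[0]}, expected 1.0")
    for i, e in enumerate(vglvls[1:], start=1):
        if not 0.0 < e < 1.0:
            problems.append(f"element {i} is {e}, expected 0 < e < 1")
    return problems


# Made-up vector for a 2-layer file, for illustration only.
print(check_vglvls([1.0, 0.5, 0.25], nlays=2))
```

Running something like this after changing NLAYS would have caught my mismatch before VERDI did.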

I also (at some unspecified future TSTEP :-) intend to find a
repository at which to provide IOAPI-specific tools, for the benefit
of the next poor bastard that goes down this road. If you know of such
tools already available (other than

* the m3tools (above). These use the Fortran API (about which the
  maintainer is adamant), and are the "officially supported" tools,
  but provide (IMHO) a tiny slice of desired functionality. (I.e., I
  don't see how I could have done what I did above only with m3tools.)
  Unfortunately, working around the API (as I did) is probably
  unacceptable to the m3tools maintainer, and I currently lack the
  Fortran chops to write R which "drives" the Fortran API (though, One
  Day, I hope to be that good :-)

* the Python ioapiTools, which, IIUC, are downlevel

) or have suggestions regarding where to put such tools, please reply
off-thread.

HTH, and thanks again, Tom Roche <Tom_Roche@xxxxxxxxx>


