
[Fwd: valid_min, valid_max, scaled, and missing values]




-------- Original Message --------
Subject: valid_min, valid_max, scaled, and missing values
Date: Fri, 23 Feb 2001 14:24:51 -0700
From: Russ Rew <address@hidden>
Organization: UCAR Unidata Program
To: address@hidden

John,

First, the GDT conventions at 

 http://www-pcmdi.llnl.gov/drach/GDT_convention.html 

say:

  In cases where the data variable is packed via the scale_factor and
  add_offset attributes (section 32), the missing_value attribute
  matches the type of and should be compared with the data after
  unpacking.

Whereas the CDC conventions at

 http://www.cdc.noaa.gov/cdc/conventions/cdc_netcdf_standard.shtml

say:

 ... missing_value has the (possibly packed) data value data type.
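
To make the difference concrete, here is a minimal sketch of the two
readings in C, assuming the usual unpacking rule unpacked = packed *
scale_factor + add_offset.  The variable names and numbers are
invented for illustration and come from neither convention document:

  #include <stdio.h>

  int main(void)
  {
      short packed = 32766;          /* value as stored (packed) in the file */
      float scale_factor = 0.01f;
      float add_offset = 300.0f;

      /* CDC reading: missing_value has the packed (short) type and is
         compared before unpacking. */
      short missing_packed = 32766;
      int is_missing_cdc = (packed == missing_packed);

      /* GDT reading: missing_value has the unpacked (float) type and is
         compared after unpacking.  (A stored float attribute and a
         recomputed unpacked value can disagree in the last bit, a worry
         that comes up again below.) */
      float missing_unpacked = 32766 * scale_factor + add_offset;
      float unpacked = packed * scale_factor + add_offset;
      int is_missing_gdt = (unpacked == missing_unpacked);

      printf("CDC test: %d, GDT test: %d\n", is_missing_cdc, is_missing_gdt);
      return 0;
  }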

Here's what Harvey had to say to netcdfgroup about valid_min and
valid_max (or valid_range) applying to the external packed values
rather than the internal unpacked values:

 http://www.unidata.ucar.edu/glimpse/netcdfgroup-list/1174

which implies that the missing_value and _FillValue attributes should
likewise be in the units of the packed rather than the unpacked data.

And Harvey said (in
http://www.unidata.ucar.edu/glimpse/netcdfgroup-list/1095)

  Yet I have encountered far too many netCDF files which contravene
  Section 8.1 in some way.  For example, we are currently processing
  the NCEP data set from NCAR.  An extract follows. It is obvious that
  a great deal of effort has gone into preparing this data with lots
  of metadata and (standard and non-standard) attributes, etc.  But it
  is also obvious that there cannot be any valid data because the
  valid minimum (87000) is greater than the maximum short (32767)!
  And Section 8.1 states that the type of valid_range should match
  that of the parent variable i.e. should be a short not a float.
  Obviously the values given are unscaled external data values rather
  than internal scaled values.

            short slp(time, lat, lon) ;
                  slp:long_name = "4xDaily Sea Level Pressure" ;
                  slp:valid_range = 87000.f, 115000.f ;
                  slp:actual_range = 92860.f, 111360.f ;
                  slp:units = "Pascals" ;
                  slp:add_offset = 119765.f ;
                  slp:scale_factor = 1.f ;
                  slp:missing_value = 32766s ;
                  slp:precision = 0s ;

  It would be useful to have a utility which checked netCDF files for
  conformance to these conventions.  It could also provide other data
  for checking validity such as counting the number of valid and
  invalid data elements.

  I guess I have to take some of the blame.  I was one of the authors
  of NUGC and I was largely responsible for rewriting Section 8.1 last
  year while I was working at Unidata.  I tried to make it clearer and
  simpler.  In particular, I tried to simplify the relationship
  between valid_range, valid_min, valid_max, _FillValue and
  missing_value.  But it seems that we have failed to make the current
  conventions sufficiently clear and simple.
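
To spell out the arithmetic behind Harvey's point: inverting
unpacked = packed * scale_factor + add_offset gives the packed shorts
that the stated valid_range would correspond to.  A small check (the
names here are mine, not from the file):

  #include <stdio.h>

  int main(void)
  {
      float add_offset = 119765.0f, scale_factor = 1.0f;
      float valid_min = 87000.0f, valid_max = 115000.0f;

      /* packed = (unpacked - add_offset) / scale_factor */
      float packed_min = (valid_min - add_offset) / scale_factor;
      float packed_max = (valid_max - add_offset) / scale_factor;

      /* Prints -32765 .. -4765: both fit in a short, so the file's
         87000/115000 must be unpacked physical values written with
         the wrong (float) type, exactly as Harvey observes. */
      printf("packed valid_range would be %.0f .. %.0f\n",
             packed_min, packed_max);
      return 0;
  }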

In

 http://www.unidata.ucar.edu/glimpse/netcdfgroup-list/1079

here's what John Sheldon of GFDL had to say about whether the missing
value should be in units of the packed or unpacked data:

  - Section 32: Missing values in a data variable

    I think that the data should be checked against the
    "missing_value" *before* unpacking.  First, I think there is
    already a pretty strong convention that "missing_value" be of the
    same type as the data.  Second, some packages simply display the
    packed values, and they wouldn't be able to detect missing values.
    Third, I've been burned and confused often enough by varying
    machine precision to be quite shy of comparing computed values.

    However, handling missing values when unpacking packed data does
    present a real problem!  Imagine a subroutine which unpacks, say,
    SHORT values into a FLOAT array.  This routine will be able to
    reliably detect missing values, but what value is it to put in the
    FLOAT array?  We solve this by storing a global FLOAT attribute
    which specifies this number.  If a file has no such attribute, we
    stuff a default value in it.  In any case, we inform the user of
    what was used.

but Jonathan Gregory replied

    > Section 32: Missing values in a data variable
    >
    > > I think that the data should be checked against the
    > > "missing_value" *before* unpacking. [JS]
    >
    > Yes, you may well be correct. Thanks.

    The problem then becomes: what will you put in the array of
    unpacked data if you find a missing value in the packed data?  We
    store a global attribute to hold this value (say, -1.E30).  In the
    absence of this global attribute, we simply stuff in a fill-value,
    which is OK, but you lose the distinction between intentionally
    and unintentionally missing data.  In any case, we tell the
    calling routine what float values we used in both cases.
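
A minimal sketch, in C, of the routine John and Jonathan both
describe: the packed value is tested first, and a designated float
(taken from a global attribute, or a default such as -1.E30 when that
attribute is absent) is substituted in the output.  The names and the
default here are illustrative only:

  #include <stdio.h>
  #include <stddef.h>

  /* Default used when the file carries no global attribute naming the
     unpacked missing value (Jonathan's -1.E30 example). */
  #define DEFAULT_MISSING_FLOAT (-1.0e30f)

  /* Unpack shorts to floats, testing missing_value against the packed
     data first and substituting missing_unpacked in the output. */
  static void unpack_shorts(const short *packed, float *unpacked,
                            size_t n, short missing_packed,
                            float missing_unpacked,
                            float scale_factor, float add_offset)
  {
      for (size_t i = 0; i < n; i++) {
          if (packed[i] == missing_packed)
              unpacked[i] = missing_unpacked;
          else
              unpacked[i] = packed[i] * scale_factor + add_offset;
      }
  }

  int main(void)
  {
      /* Sample values reuse the slp numbers from Harvey's extract;
         32766 is the packed missing_value. */
      short packed[4] = { -32765, -20000, 32766, -4765 };
      float unpacked[4];

      unpack_shorts(packed, unpacked, 4, 32766, DEFAULT_MISSING_FLOAT,
                    1.0f, 119765.0f);

      for (size_t i = 0; i < 4; i++)
          printf("%6d -> %g\n", packed[i], unpacked[i]);
      return 0;
  }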

So there evidently was no consensus on this issue, just differing
opinions.  Since we have to pick one, I favor having the missing
value be in the packed units.

--Russ