-------- Original Message --------
Subject: valid_min, valid_max, scaled, and missing values
Date: Fri, 23 Feb 2001 14:24:51 -0700
From: Russ Rew <russ@xxxxxxxxxxxxxxxx>
Organization: UCAR Unidata Program
To: caron@xxxxxxxx
John,
First, the GDT conventions at
http://www-pcmdi.llnl.gov/drach/GDT_convention.html
say:
In cases where the data variable is packed via the scale_factor and
add_offset attributes (section 32), the missing_value attribute
matches the type of and should be compared with the data after
unpacking.
Whereas the CDC conventions at
http://www.cdc.noaa.gov/cdc/conventions/cdc_netcdf_standard.shtml
say:
... missing_value has the (possibly packed) data value data type.
Here's what Harvey had to say to netcdfgroup about valid_min and
valid_max (or valid_range) applying to the external packed values
rather than the internal unpacked values, implying that the
missing_value and _FillValue attributes should likewise be in the
units of the packed rather than the unpacked data:
http://www.unidata.ucar.edu/glimpse/netcdfgroup-list/1174
And Harvey said (in
http://www.unidata.ucar.edu/glimpse/netcdfgroup-list/1095):
Yet I have encountered far too many netCDF files which contravene
Section 8.1 in some way. For example, we are currently processing
the NCEP data set from NCAR. An extract follows. It is obvious that
a great deal of effort has gone into preparing this data with lots
of metadata and (standard and non-standard) attributes, etc. But it
is also obvious that there cannot be any valid data because the
valid minimum (87000) is greater than the maximum short (32767)!
And Section 8.1 states that the type of valid_range should match
that of the parent variable i.e. should be a short not a float.
Obviously the values given are unscaled external data values rather
than internal scaled values.
    short slp(time, lat, lon) ;
            slp:long_name = "4xDaily Sea Level Pressure" ;
            slp:valid_range = 87000.f, 115000.f ;
            slp:actual_range = 92860.f, 111360.f ;
            slp:units = "Pascals" ;
            slp:add_offset = 119765.f ;
            slp:scale_factor = 1.f ;
            slp:missing_value = 32766s ;
            slp:precision = 0s ;
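Harvey's point can be checked with a line of arithmetic (my own
working, using the slp attributes above): since unpacked = packed *
scale_factor + add_offset, the packed value corresponding to an
unpacked value v is (v - add_offset) / scale_factor.

```python
# Attributes from the slp variable in the NCEP extract above.
SCALE_FACTOR = 1.0
ADD_OFFSET = 119765.0

def pack(unpacked):
    # Invert the netCDF unpacking rule to recover the stored value.
    return (unpacked - ADD_OFFSET) / SCALE_FACTOR

# valid_range = 87000, 115000 cannot be packed shorts (87000 > 32767),
# but their packed equivalents fit comfortably in a short:
packed_min = pack(87000.0)   # -32765.0
packed_max = pack(115000.0)  # -4765.0
```

So the stated valid_range is clearly in unscaled external units, just
as Harvey says, even though Section 8.1 would have it be a short in
packed units.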
It would be useful to have a utility which checked netCDF files for
conformance to these conventions. It could also provide other data
for checking validity such as counting the number of valid and
invalid data elements.
I guess I have to take some of the blame. I was one of the authors
of NUGC and I was largely responsible for rewriting Section 8.1 last
year while I was working at Unidata. I tried to make it clearer and
simpler. In particular, I tried to simplify the relationship
between valid_range, valid_min, valid_max, _FillValue and
missing_value. But it seems that we have failed to make the current
conventions sufficiently clear and simple.
Here's what John Sheldon of GFDL had to say (in
http://www.unidata.ucar.edu/glimpse/netcdfgroup-list/1079)
about whether the missing value should be in units of the packed or
unpacked data:
- Section 32: Missing values in a data variable
I think that the
data should be checked against the "missing_value" *before*
unpacking. First, I think there is already a pretty strong
convention that "missing_value" be of the same type as the data.
Second, some packages simply display the packed values, and they
wouldn't be able to detect missing values. Third, I've been burned
and confused often enough by varying machine precision to be quite
shy of comparing computed values.
However, handling missing values when unpacking packed data does
present a real problem! Imagine a subroutine which unpacks, say,
SHORT values into a FLOAT array. This routine will be able to
reliably detect missing values, but what value is it to put in the
FLOAT array? We solve this by storing a global FLOAT attribute
which specifies this number. If a file has no such attribute, we
stuff a default value in it. In any case, we inform the user of
what was used.
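A minimal sketch of the scheme Sheldon describes (the names and the
default sentinel are mine; -1.E30 is the example value he mentions
later): test each packed short against missing_value *before*
unpacking, and substitute a float sentinel in the unpacked output.

```python
MISSING_PACKED = 32766        # short-typed missing_value attribute
MISSING_UNPACKED = -1.0e30    # float sentinel, e.g. a global attribute

def unpack_shorts(packed, scale_factor, add_offset):
    """Unpack short values to floats, flagging missing data."""
    out = []
    for p in packed:
        if p == MISSING_PACKED:
            # Exact integer comparison on the raw stored value,
            # before any floating-point arithmetic.
            out.append(MISSING_UNPACKED)
        else:
            out.append(p * scale_factor + add_offset)
    return out
```

The exact integer comparison sidesteps the machine-precision worries
Sheldon raises about comparing computed floating-point values.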
but Jonathan Gregory replied
> Section 32: Missing values in a data variable
>
> > I think that the data should be checked against the
> > "missing_value" *before* unpacking. [JS]
>
> Yes, you may well be correct. Thanks.
The problem then becomes: what will you put in the array of
unpacked data if you find a missing value in the packed data? We
store a global attribute to hold this value (say, -1.E30). In the
absence of this global attribute, we simply stuff in a fill-value,
which is OK, but you lose the distinction between intentionally
and unintentionally missing data. In any case, we tell the
calling routine what float values we used in both cases.
So there evidently was no consensus on this issue, just differing
opinions. Since we have to pick one, I think I favor having the
missing value be in the packed units.
--Russ