Re: How to represent packed variable statistics (e.g., valid_range, mean)?

To: Mark A Ohrenschall <Mark.A.Ohrenschall@xxxxxxxx>
Subject: Re: How to represent packed variable statistics (e.g., valid_range, mean)?
From: John Caron <caron@xxxxxxxxxxxxxxxx>
Date: Wed, 02 Oct 2002 10:50:52 -0600

Hi Mark:

When I dealt with this issue while implementing ucar.nc2.VariableStandardized.java (see http://www.unidata.ucar.edu/packages/netcdf-java/javadoc/index.html), I came up with these rules:


"Implementation rules for missing data with scale/offset

1. _FillValue and missing_value values are always in the units of the external (packed) data.

2. If valid_range is the same type as scale_factor (actually the wider of scale_factor and add_offset) and this is wider than the external data, then it will be interpreted as being in the units of the internal (unpacked) data. Otherwise it is in the units of the external (packed) data."
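
As a rough illustration of rule 2, here is a minimal Python sketch. It is not the actual VariableStandardized code; the type-width ranking table and the function names are assumptions made up for this example.

```python
# Hypothetical ranking of netCDF types from narrowest to widest. This
# ordering is an assumption for the sketch, not taken from netcdf-java.
TYPE_RANK = {"byte": 1, "short": 2, "int": 3, "float": 4, "double": 5}


def wider(type_a, type_b):
    """Return whichever of the two type names ranks wider."""
    return type_a if TYPE_RANK[type_a] >= TYPE_RANK[type_b] else type_b


def valid_range_is_unpacked(data_type, scale_type, offset_type, range_type):
    """Apply rule 2: True if valid_range is in unpacked (internal) units."""
    # The conversion type is the wider of scale_factor and add_offset.
    convert_type = wider(scale_type, offset_type)
    # Unpacked units only if valid_range has the conversion type AND that
    # type is wider than the external (packed) data type.
    return (range_type == convert_type
            and TYPE_RANK[convert_type] > TYPE_RANK[data_type])
```

For example, short data with a float scale_factor and a float valid_range would be treated as unpacked units, while the same variable with a short valid_range would be treated as packed units.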

Basically, it is necessary to keep the missing data values in the packed form so that you can efficiently detect them. Some datasets had valid_range using unpacked units, so I allowed that, but I think it's preferable to use packed units.
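
A small Python sketch of why the packed form makes detection cheap: the fill value is compared exactly against the stored integers before any floating-point arithmetic happens. The function name, the fill value of -32768, and the add_offset of 0.0 are assumptions for illustration; the scale_factor of 0.1 and the sample values follow the temperature example in the quoted question.

```python
def unpack(packed_values, scale_factor, add_offset, fill_value):
    """Unpack stored values, mapping packed fill values to None."""
    result = []
    for p in packed_values:
        if p == fill_value:
            # Exact integer comparison on the packed data: no rounding issues.
            result.append(None)
        else:
            result.append(p * scale_factor + add_offset)
    return result


# Packed shorts; -32768 is a hypothetical _FillValue in packed units.
packed = [-16, 116, -32768, 314]
unpacked = unpack(packed, scale_factor=0.1, add_offset=0.0, fill_value=-32768)
```

If the fill value were stored in unpacked units instead, each comparison would have to be made against a computed floating-point number, which is both slower and prone to rounding mismatches.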

In general, I would say it's not necessary to require related variables (e.g., statistical variables like mean, std. dev.) to have the same packing, but they certainly had better have consistent unpacked units. The idea is that you are going to unpack all of your variables and then start dealing with them. You might be able to pack std. dev. in 2 bytes but need 4 bytes for the variable itself, for example. After unpacking, it shouldn't matter.
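
To make that concrete, here is a short Python sketch in which a value and its standard deviation use different packings but agree once unpacked. All of the numbers, scale factors, and names here are hypothetical.

```python
def unpack_value(packed, scale_factor, add_offset):
    """Convert one packed value to unpacked units."""
    return packed * scale_factor + add_offset


# The variable itself, imagined as stored in 4 bytes with a fine scale.
temperature = unpack_value(1160, scale_factor=0.01, add_offset=0.0)

# The standard deviation, imagined as stored in 2 bytes with a coarser scale.
std_dev = unpack_value(87, scale_factor=0.1, add_offset=0.0)

# Both are now in the same unpacked units, so they can be combined directly.
upper_bound = temperature + std_dev
```

The two packings never need to be reconciled with each other; only the unpacked units have to match.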

Of course, a convention like CF-1 might decide to require the same packing for related variables, although I don't think it currently does.

Mark A Ohrenschall wrote:

Hello,

In the case of a packed variable (in which scale_factor and add_offset
are used) both the COARDS and CF conventions indicate that missing_value
and _FillValue should be likewise packed:

COARDS: "In cases where the data variable is packed via the scale_value
attribute this implies that the missing_value flag is likewise packed."
CF: "The missing values of a variable with scale_factor and/or
add_offset attributes (see section 8.1) are interpreted relative to the
variable's external values, i.e., the values stored in the netCDF file."

I'm assuming that for the sake of consistency, this means that all
statistical variable attributes should be packed as well, e.g.,
valid_range and actual_range, as well as mean and standard_deviation. Is
this true?

So for example, if I have real world data values for temperature between
-1.6 and 31.4 and I'm applying a scale_factor of 0.1 then I would say
the valid_range is -16, 314 and the mean is 116 (not 11.6)?

Thanks,

Mark

