Re: How to represent packed variable statistics (e.g., valid_range, mean)?

Hi Mark:

When I dealt with this issue while implementing ucar.nc2.VariableStandardized.java (see http://www.unidata.ucar.edu/packages/netcdf-java/javadoc/index.html), I came up with these rules:

"Implementation rules for missing data with scale/offset

1. _FillValue and missing_value values are always in the units of the external (packed) data.

2. If valid_range is the same type as scale_factor (actually the wider of scale_factor and add_offset) and this is wider than the external data, then it will be interpreted as being in the units of the internal (unpacked) data. Otherwise it is in the units of the external (packed) data."
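As a rough illustration, rule 2 might be sketched in Python as below. This is not the netcdf-java code. The CDL type names, the byte-width table and the function name valid_range_is_unpacked are assumptions made for the example.

```python
# Byte widths of classic netCDF numeric types (CDL names); assumed for this sketch.
TYPE_WIDTH = {"byte": 1, "short": 2, "int": 4, "float": 4, "double": 8}

def valid_range_is_unpacked(data_type, scale_type, offset_type, range_type):
    """Hypothetical helper applying rule 2.

    Returns True if valid_range should be read in unpacked (internal) units,
    False if it is in packed (external) units. scale_type / offset_type are
    None when the scale_factor / add_offset attribute is absent.
    """
    present = [t for t in (scale_type, offset_type) if t is not None]
    if not present:
        # No scale/offset attributes: there is no packing to undo.
        return False
    # "The wider of scale_factor and add_offset" (int vs float ties are
    # not distinguished by this simple width table).
    packing_type = max(present, key=lambda t: TYPE_WIDTH[t])
    return (range_type == packing_type
            and TYPE_WIDTH[packing_type] > TYPE_WIDTH[data_type])
```

For example, short data with float scale_factor/add_offset and a float valid_range gives True (unpacked units), while a short valid_range on the same variable gives False (packed units).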

Basically, it is necessary to keep the missing data values in packed form so that you can detect them efficiently. Some datasets had valid_range in unpacked units, so I allowed that, but I think it's preferable to use packed units.
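A minimal sketch of why packed missing values are convenient: the raw stored values can be tested against _FillValue, missing_value and valid_range before any scale/offset arithmetic, so the test is a direct comparison on the stored values. The function unpack and its parameter names are hypothetical, and all three missing-data attributes are assumed to be in packed units.

```python
import math

def unpack(raw_values, scale_factor=1.0, add_offset=0.0, fill_value=None,
           missing_value=None, valid_range=None):
    """Hypothetical unpacker: missing raw values become NaN.

    fill_value, missing_value and valid_range are assumed to be in packed
    (external) units, so they are compared against the raw values directly.
    """
    out = []
    for raw in raw_values:
        is_missing = (raw == fill_value or raw == missing_value
                      or (valid_range is not None
                          and not valid_range[0] <= raw <= valid_range[1]))
        # Only valid values go through unpacked = packed * scale + offset.
        out.append(math.nan if is_missing else raw * scale_factor + add_offset)
    return out
```

For example, unpack([-16, 116, -32767, 314], scale_factor=0.1, fill_value=-32767, valid_range=(-16, 314)) yields approximately -1.6, 11.6, NaN and 31.4.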

In general, I would say it's not necessary to require related variables (e.g., statistical variables like mean and std.dev) to have the same packing, but they had better have consistent unpacked units. The idea is that you are going to unpack all of your variables and then start working with them. For example, you might be able to pack std.dev in 2 bytes but need 4 bytes for the variable itself. After unpacking, it shouldn't matter.
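To make that concrete, here is a small sketch with made-up numbers: a temperature packed as a 4-byte int with scale_factor 0.001, and its std.dev packed as a 2-byte short with scale_factor 0.01. The names and values are hypothetical; the only point is that once both are unpacked they share units (degrees C here) and can be combined directly.

```python
def unpack_value(raw, scale_factor=1.0, add_offset=0.0):
    # unpacked = packed * scale_factor + add_offset
    return raw * scale_factor + add_offset

# Hypothetical packed values with different packing for each variable.
temp_packed = 11600   # stored as int,   scale_factor = 0.001
std_packed = 512      # stored as short, scale_factor = 0.01

temp = unpack_value(temp_packed, 0.001)   # about 11.6 degC
std = unpack_value(std_packed, 0.01)      # about 5.12 degC

# Meaningful only because both unpack to the same units.
upper = temp + std
```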

Of course, a convention like CF-1 might decide to require the same packing for related variables, although I don't think it currently does.





Mark A Ohrenschall wrote:

Hello,

In the case of a packed variable (in which scale_factor and add_offset
are used) both the COARDS and CF conventions indicate that missing_value
and _FillValue should be likewise packed:

COARDS: "In cases where the data variable is packed via the scale_value
attribute this implies that the missing_value flag is likewise packed."
CF: "The missing values of a variable with scale_factor and/or
add_offset attributes (see section 8.1) are interpreted relative to the
variable's external values, i.e., the values stored in the netCDF file."

I'm assuming that for the sake of consistency, this means that all
statistical variable attributes should be packed as well, e.g.,
valid_range and actual_range, as well as mean and standard_deviation. Is
this true?

So for example, if I have real world data values for temperature between
-1.6 and 31.4 and I'm applying a scale_factor of 0.1 then I would say
the valid_range is -16, 314 and the mean is 116 (not 11.6)?
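For reference, the conversion in this example inverts the unpacking formula: packed = round((unpacked - add_offset) / scale_factor). A minimal Python sketch, assuming add_offset is 0 as in the example; the helper name pack is hypothetical. The rounding step matters because a quotient like 31.4 / 0.1 may not come out as an exact integer in floating point.

```python
def pack(value, scale_factor, add_offset=0.0):
    """Hypothetical helper: convert an unpacked value to packed units."""
    return round((value - add_offset) / scale_factor)

scale_factor = 0.1
valid_range = [pack(v, scale_factor) for v in (-1.6, 31.4)]
mean = pack(11.6, scale_factor)
print(valid_range, mean)  # [-16, 314] 116
```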

Thanks,

Mark




