
Re: netcdf conventions for EOL profiler data



Hi Gary, et al. A few comments on the CDLs, and more inline.

Generally these look quite good.

in consensus.cdl:

1. Will there always be 50 heights for every time? That is, is this really a 
max, and do you expect a lot of missing values?


2. use of valid_range:

        float latitude ;
                latitude:long_name = "Site Latitude" ;
                latitude:standard_name = "latitude" ;
                latitude:units = "degrees_north" ;
                latitude:_CoordinateAxisType = "Lat" ;
                latitude:axis = "Y" ;
                latitude:valid_range = -90.f, 90.f ;


A minor point, but presumably latitude will always have a valid value. IMO, 
valid_range should be used only when there is a possibility that the value 
might be out of range and should be considered missing. It matters (not 
really here, but for larger data variables) because software has to spend time 
checking whether values are out of range. For example:

        float w_classicL_conf(time, height) ;
                w_classicL_conf:long_name = "W wind (classic/linear) confidence" ;
                w_classicL_conf:valid_range = 0.f, 1.f ;
                w_classicL_conf:_FillValue = -9999.f ;
                w_classicL_conf:coordinates = "height longitude latitude" ;

The _FillValue is probably sufficient. If you want the software to also check 
whether values are within the valid range, then leave valid_range in; otherwise 
take it out for efficiency. I assume you just want to document the expected 
range, so I would suggest using a different attribute name.
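
For example, a minimal sketch (expected_range here is just a placeholder name 
for a documentation attribute, not a standard attribute):

        float w_classicL_conf(time, height) ;
                w_classicL_conf:long_name = "W wind (classic/linear) confidence" ;
                w_classicL_conf:_FillValue = -9999.f ;
                w_classicL_conf:expected_range = 0.f, 1.f ;   // placeholder name, not a standard attribute
                w_classicL_conf:coordinates = "height longitude latitude" ;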


Gary Granger wrote:
Hi Cory, Don, and Bill:

Don, here's some background info: Cory is working on generating netcdf
files from NIMA for many of the kinds of profiler data we have.  So I'm
thinking this is a good time to straighten out our netcdf conventions as
best we can.  Cory has much of the CDL already specified, and below I
suggest some changes and ask some questions.  We've already incorporated
the changes you suggested to me back at T-REX.  So I invite your feedback
so that we can make our data as compatible as possible with Unidata tools
like IDV and with existing conventions like CF.  The CDL files most
relevant to Unidata are 'consensus.cdl' and 'rass.cdl', since they contain
the derived measurements for winds and virtual temperature.  However, it
might also be useful if we can display some of the intermediate data in
IDV, so we'll try to make the conventions among the files as consistent as
possible. I've cc'd Ethan and John since you included them in the T-REX email. The more eyes the better. I've attached that email for reference.

I've attached the CDL files from Cory with many of the changes I suggest
below already made.  I'm hoping we can come to a quick consensus on the
final changes we need to make so that Cory can finish her implementation.
Then I will (eventually) also be fixing all of our other profiler software
to start following the new conventions.

I've been using the CF conventions as my main reference, so I've included links to the relevant parts of that document. I've also compiled all the CDL files into netcdf and passed them through the CF-checker here:

  http://titania.badc.rl.ac.uk/cgi-bin/cf-checker.pl

This gave some useful feedback which motivates many of my suggestions
here.  The most common warning is not recognizing units of 'dB' and
'meters^(2/3)/second', but there's nothing we can do about that because
those are due to limitations in udunits.  [It also warns about MHz, but
that's bogus because udunits does recognize those units.]

Don, do you have any suggestions for units like dB and meters^(2/3)/second
(for eddy dissipation rate)?

I had thought to change our 'heights(time,height)' variable to
'height(time,height)' so that height would look like a coordinate variable.
However, CF seems to discourage this because multi-dimensional coordinate
variables could break COARDS-compliant and NUG-compliant applications.
Should we keep 'heights' or use 'height'?  I've changed the cdl's to use
'height', but if everyone thinks 'heights' is better we can keep that.
Either way we can add the 'coordinates' attribute to variables and set it
to 'height[s] longitude latitude'.

 http://tinyurl.com/326u67


It doesn't matter to IDV/CDM, but there's no reason not to follow the CF advice 
to not use the dimension name for multidimensional coordinates. So heights is better.

A bigger problem is that heights _is_ a vertical coordinate, as is altitude. 
How to indicate this?
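
One possibility (just a sketch; whether these attributes are enough for every 
piece of software to recognize a multidimensional vertical coordinate is part 
of the question, and the long_name wording is only a placeholder):

        float heights(time, height) ;
                heights:long_name = "Height above ground of center of gate" ;   // wording is a guess
                heights:standard_name = "height" ;
                heights:units = "m" ;
                heights:positive = "up" ;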


What units should the unitless confidence variables have?  Should we
explicitly specify '1' just for clarity?  CF-COARDS does not require it,
since lack of 'units' implies unitless, but I'd suggest using '1' for
completeness.


For unitless quantities like confidence, it's better to use an empty string, units = "", 
although "1" is OK also. Definitely don't omit the units attribute.
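
A quick sketch of what that looks like for one of the confidence variables:

        float w_classicL_conf(time, height) ;
                w_classicL_conf:units = "1" ;   // or units = ""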


I changed wdir in consensus.cdl to use units of 'degrees'.

Bill, should the consensus files include radar parameters like ncoh and
nspec and all the other parameters in spc.cdl?  That is, is that kind of
configuration info useful enough to be carried forward into the data files
that we will likely release to PIs?

I changed time long_name to "Time". I'd prefer to store the time values as offsets from the
first time in the file, and change units to "seconds since <first time in
file>".  This makes it easy to ncdump a file and see the ascii rep for the
first time in the file, and the time values are humanly interpretable, but
any software following the udunits convention will be able to parse the
units and compute the times.  I can see the value in not having to parse
the units string in things like IDL scripts, but really my preference would
be to make IDL accommodate the best file conventions possible rather than
the other way around.  As a compromise we can add an "optional" attribute
to the time variable like 'units_string_in_unix_seconds', and store the
unix seconds there.

Another option is to add another variable with the ISO date/time string. I'm 
planning on making a change request to CF to allow this as a valid time 
coordinate.
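
A rough sketch of the two together (the units value is only an example of the 
first time in the file; the time_iso variable and iso_len dimension are 
made-up names, not an existing convention):

dimensions:
        time = UNLIMITED ;
        iso_len = 20 ;
variables:
        double time(time) ;
                time:long_name = "Time" ;
                time:units = "seconds since 2006-03-25 00:00:00 0:00" ;   // example value: the first time in the file
        char time_iso(time, iso_len) ;   // hypothetical variable, not (yet) a valid CF time coordinate
                time_iso:long_name = "ISO 8601 date/time string" ;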


For sample_start_time and sample_end_time, I presume these are the periods
of time over which the consensus is computed.  It turns out that CF has a
convention for this, which is to use a second dimension of length two to
store the beginning and end of the coordinate interval.  For example,
define a single variable sample_times(time, 2), and then store the sample
start time in sample_times[time, 0] and the end time in sample_times[time,
1].  Then there is a 'bounds' attribute to 'time' which names sample_times
as the variable holding the interval boundaries.  So for the sake of
following existing practice, I suggest replacing sample_start_time and
sample_end_time with sample_times.  That means sample_times should not have
a _FillValue attribute, because it should never be empty.  The same should
go for 'time', even though the cf-checker didn't complain about that one.
consensus.cdl has an example of this change; rass.cdl and wind.cdl would
need the same change.

Yes, bounds is good. We are actually parsing it, and the info is available in 
the CoordinateAxis object.


 http://tinyurl.com/38htef
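
Roughly, the bounds mechanism looks like this (the nv dimension name and the 
long_name are just placeholders; sample_times would carry the same units as 
time and, as noted, no _FillValue):

dimensions:
        time = UNLIMITED ;
        nv = 2 ;   // placeholder name for the bounds dimension
variables:
        double time(time) ;
                time:long_name = "Time" ;
                time:bounds = "sample_times" ;
        double sample_times(time, nv) ;
                sample_times:long_name = "Start and end of consensus averaging interval" ;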

Global 'author' attribute: I remember Cory that you'd allow using an
environment variable to set the value.  In case that variable is not set,
then I think the author attribute should be left off instead of having a
default, unless the default is empty.  I'm a little concerned about
unexpected values being set for author because the user's not aware of
what's already set in the environment, but I guess I can live with it.  As
an alternative, or in addition to 'author', the CF convention also mentions
'institution'.  Maybe that one can be really general but still helpful,
like 'NCAR'.

For global attributes you might want to have a look at:

 
http://www.unidata.ucar.edu/software/netcdf-java/formats/DataDiscoveryAttConvention.html


I added standard_name to a few wind variables: wspd, wvert, wdir.

I think we should remove the Data_start_time, Data_end_time, and 'date'
global attributes.  They are redundant and not completely intuitive; for
example, what if we store profiler data across a date boundary?  And is the
date in local or GMT?

Speaking of which, maybe it's worthwhile to store the local timezone in a
global attribute, if available.  Or maybe we should just use longitude to
do things like plot times relative to the local solar noon.  Bill?

spectraDbs:_FillValue needs to be outside valid_range, so I changed it
to -99999.

As noted before, I don't really like valid_range, but I know it's in wide use.



All the 'height' variables should have a standard_name attribute of 'height'.
However, I have a question about the interpretation of height.  The 'height'
standard_name implies height above the surface, i.e., the ground, which should be
good enough for us.  (Does anyone ever include the height of the antenna
above the ground in the calculation of gate heights?)  And long_name should
be more precise if possible, such as 'Height of center of gate' or 'Height
above ground to bottom of gate' or whatever it is; I'm not actually sure
myself.

Of course, 'height' should only apply to true vertical variables, like at
least the derived winds and virtual temperatures.  For radial measurements
(moments, snr, doppler, ...), do we usually store the "height" as the
distance along the beam, or is it stored as the actual height above the
ground?  If the latter, then I guess we can keep the height convention for
those variables too, otherwise we need something different.  I just wasn't
sure.  Bill or Cory, can you clarify this for me?

Should we store an alternate coordinate variable for the gate altitudes in
meters above MSL?  Or assume that software can be smart enough to add
height to altitude?  For example, we could add a 'gate_alt' variable with
the gate altitudes pre-computed:

variables:
    float wspd(time, height);
          wspd:coordinates = "lat lon gate_alt time";

    float gate_alt(time, height);
          gate_alt:standard_name = "altitude";
          gate_alt:axis = "Z";
          gate_alt:units = "meters";
          gate_alt:long_name = "Altitude to Center of Gate";

I assume this would make it easier for more generic software to integrate
profiler data at the correct relative heights, but maybe it's excessive.

This is the issue that needs some more thought.


Should our lat/lon/alt variables have a single dimension of size 1, to make
them COARDS-compliant coordinate variables?  CF allows scalar coordinate
variables, and they can be associated with a variable using the
'coordinates' attribute, but using the 1-dimensional option could be more
universal.

 http://tinyurl.com/2lrjkt

My own opinion is that it's a mistake to extrapolate COARDS, which is about grids, to observational data. It works OK as long as you are storing a single profile in the file, but is incorrect when storing multiple profiles in a file. For that case, the correct generalization is a "profile" dimension, and then lat(profile), lon(profile), etc.
So I would advise against it.
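
A bare-bones sketch of that generalization (the dimension and variable names 
below are only illustrative):

dimensions:
        profile = UNLIMITED ;
        height = 50 ;
variables:
        double time(profile) ;
        float latitude(profile) ;
        float longitude(profile) ;
        float heights(profile, height) ;
        float wspd(profile, height) ;
                wspd:coordinates = "time heights latitude longitude" ;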



Should we store the boundaries for the gate heights, eg
height_bounds(time,height,2), so that it's obvious where the gate is and
where the height coordinate falls relative to the gate?  Or is there an
attribute we can specify to indicate that the height coordinates are always
at the center of the gate (assuming that's where they are)?

 http://tinyurl.com/38htef

We will assume that the coordinate is a midpoint, and edges are half-way in 
between. Use bounds if that's not the case, or if you need to convince some other 
piece of software of the correct interpretation.
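
As a sketch of that option (nv is just a placeholder name for the length-2 
bounds dimension):

dimensions:
        nv = 2 ;   // placeholder bounds dimension
variables:
        float heights(time, height) ;
                heights:bounds = "height_bounds" ;
        float height_bounds(time, height, nv) ;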


Should we identify the Conventions as 'CF-1.0' (assuming we can make
everything compliant), or is it still better to have a separate convention
'EOL Profiler Convention 1.0'?

 http://tinyurl.com/2mfobc

If you can make it CF, you can do :Conventions = "EOL Profiler Convention 1.0, CF-1.0", indicating 
that it satisfies both. What you have then is that "EOL Profiler Convention 1.0" extends 
"CF-1.0" in some sense.


Thank you!
gary


------------------------------------------------------------------------

Subject:
Start of a conversation on profiler data
From:
Don Murray <address@hidden>
Date:
Fri, 27 Jan 2006 15:19:18 -0700
To:
Bill Brown <address@hidden>
CC:
Gary Granger <address@hidden>, address@hidden, Ethan Davis <address@hidden>



Hi Bill-

As I mentioned earlier this week, we'd like to work with EOL to come up with
a convention for the profiler data that you produce so it will be easier to
ingest into IDV and other netCDF programs.

So far, the two types of vertical profiler files that we have are the EOL
MAPR/RASS/DBS formats and the FSL WPDN format.  The main difference between
the structure of the files is that your formats are one station for multiple
times and the FSL are multiple stations at one time.  If you know of others,
please let us know (and pointers to samples would be good).

I took one of the MAPR files at:

http://www.atd.ucar.edu/rtf/projects/srp2004/iss/realtime/data/iss-mapr

(mapr040320.windsnc_05) and modified it slightly to make it a little easier
to read.  The modified file is attached.

Basically, I changed the time_offset(time) variable to time(time) and fixed
some of the units to be udunits compatible.  I also added a few global
attributes that add some information which would be useful down the road:

                :Conventions = "EOL Profiler Convention 1.0";
                :latitude_coordinate = "lat";
                :longitude_coordinate = "lon";
                :zaxis_coordinate = "alt";
                :time_coordinate = "time";

The Conventions tag would be a way of describing the format, and having a
version number would help as it evolves (e.g., corrections, support for new
sensors/parameters).

The *_coordinate attributes allow one to name variables whatever they want,
but define them in canonical terms.  This was taken from the Unidata
Observation Dataset Convention:

http://www.unidata.ucar.edu/software/netcdf-java/formats/UnidataObsConvention.html

(under the "Identifying the Coordinate Variables" section).

For T-REX, I understand you not wanting to make changes that could create
problems, but if you could fix the units, I could create a reader for the
IDV pretty easily.  If you could change time_offset to time, that would be
even better.  That change would be more in line with how the RASS files
look, so maybe that change could be supported for T-REX.  I don't need the
global attributes for T-REX, but it's something to consider for the future.
Another help for the MAPR files would be to put a .nc extension on them, but
that's not critical.

I've cc'd John Caron and Ethan Davis who are the netCDF experts and Gary
Granger who's working with you on the EOL end.

I'll be out of town next week, but I wanted to get this conversation
started.  If you have any questions about the changes, let me know.

Don

*************************************************************
Don Murray                               UCAR Unidata Program
address@hidden                        P.O. Box 3000
(303) 497-8628                              Boulder, CO 80307
http://www.unidata.ucar.edu/staff/donm
        "Time makes everyone interesting, even YOU!"
*************************************************************