Re: [netcdf-java] Data precision while aggregating data

Hi Jon:

I do support ISO strings in the CDM, but they are not CF compliant. Once we have investigated this, if there is a compelling argument for them, we could bring it up with CF again.

We will still need to support udunits time coordinates, so the precision issues will remain. It's possible we could replace the current arithmetic with arbitrary-precision arithmetic.

It's also likely that we can minimize precision loss in the current implementation, but I'm not yet sure where it's coming from. Bob Simon's guess seems likely.

I haven't heard of Joda, but I will check it out.




Jon Blower wrote:
This is interesting.  I think a move to ISO strings would be a good
one - do you think it's worth bringing this up again with CF?  I'd
support this, FWIW.

Am I correct in thinking that the problem is caused because a human
means "calendar days" or "calendar months" but udunits means a
specific, fixed number of milliseconds?
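The distinction can be sketched with java.time (a modern JDK calendar API, used here purely for illustration; a calendar-aware library like Joda-time behaves similarly). Calendar arithmetic adjusts to the actual calendar, while fixed-duration arithmetic adds a constant number of seconds; the 2629744-second "month" below is my own assumption, roughly 1/12 of the tropical year udunits uses, not a value taken from udunits itself:

```java
import java.time.Duration;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

public class CalendarVsFixed {
    public static void main(String[] args) {
        ZonedDateTime start = ZonedDateTime.of(2008, 1, 31, 0, 0, 0, 0, ZoneOffset.UTC);

        // Calendar arithmetic: "one month" clamps to the last valid day of February
        System.out.println(start.plusMonths(1));               // 2008-02-29T00:00Z

        // Fixed-duration arithmetic: a "month" as a fixed number of seconds
        // (assumed here to be ~1/12 of a tropical year) lands mid-day in March
        System.out.println(start.plus(Duration.ofSeconds(2629744)));
    }
}
```

The two answers differ by more than a day, which is exactly the kind of discrepancy a human thinking in "calendar months" would not expect.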

Can this be fixed in NetCDF-Java without change to CF, perhaps by
using Joda-time (a proper calendaring library) instead of udunits for
time handling?

Cheers, Jon

On Thu, May 15, 2008 at 2:08 AM, John Caron <caron@xxxxxxxxxxxxxxxx> wrote:
I'm not quite sure where the inaccuracy comes in; likely converting between
Date and the udunits representation. I'll have to see what I can do.

A few comments:

1) A double has 53 bits of mantissa, giving slightly under 16 decimal digits
of accuracy. Thus:

 public void testDoublePrecision() {
   double dval = 47865.7916666665110000;
   System.out.println(" dval= "+dval);
 }

prints:

 dval= 47865.79166666651
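The same limit applies when parsing the value from an NcML string: a double parse cannot retain digits past roughly the 16th, while a BigDecimal parse keeps them all. A small sketch (not CDM code) showing both:

```java
import java.math.BigDecimal;

public class ParseDemo {
    public static void main(String[] args) {
        String s = "47865.7916666665110000";
        // double: rounds to the nearest representable value (~16 digits)
        System.out.println(Double.parseDouble(s));  // 47865.79166666651
        // BigDecimal: keeps every digit of the decimal string
        System.out.println(new BigDecimal(s));      // 47865.7916666665110000
    }
}
```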

2) Preserving the lowest bits of accuracy is tricky and requires care, which I
promise has not (yet) happened in the CDM units handling. In general,
relying on the lowest bits being preserved is dicey.

3) What is the definition of a "day", and how accurate do you need it to be?
All I could find was this note in the units package:

        * Interval between 2 successive passages of sun through vernal equinox
        * (365.242198781 days -- see
        * http://www.ast.cam.ac.uk/pubinfo/leaflets/,
        * http://aa.usno.navy.mil/AA/
        * and http://adswww.colorado.edu/adswww/astro_coord.html):

You may agree, but what if someone uses a different meaning of "day"?

4) IMHO, using udunits for calendar dates is a mistake. It's a units package,
not a calendar package.

5) "47865.7916666665110000 days since 1858-11-17 00:00:00 UTC" is, um,
unreadable to humans.

6) I earlier proposed to CF that we allow ISO date strings: more readable,
unambiguous, and without a precision problem. Various CF authorities
thought it wasn't needed because it was redundant with the udunits
representation.



Rich Signell wrote:
Jon,

The precision of the time vector with "units since XXXX" must
definitely be considered carefully, but we did think about this.

We want to store all our oceanographic time series data with the same
time convention to facilitate aggregation and minimize mods to
existing software.

Choosing time as double precision with units of "days since 1858-11-17
00:00"  should give us a precision of:
 - Better than 3.0e-5 milliseconds until August 31, 2132 and
 - Better than 3.0e-4 milliseconds until October 12, 4596!

(This is actually the definition of "Modified Julian Day", which is
one of the few internationally recognized time conventions that starts
at midnight. See http://tycho.usno.navy.mil/mjd.html for more info.
It also has the advantage of being a date by which nearly all the
world had finally switched to a Gregorian calendar, and early enough
so that most of the data we want to represent will have positive time
values.)
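One way to sanity-check the order of magnitude (the exact figure depends on how you count, but the spacing between representable values stays far below a millisecond either way) is Math.ulp, which gives the gap between adjacent doubles at a value. The day counts below are my own illustrative choices, not exact epochs:

```java
public class UlpCheck {
    public static void main(String[] args) {
        // Spacing between adjacent doubles, converted to milliseconds,
        // at two day counts since 1858-11-17
        double now = 47865.0;     // late 1989
        double far = 100000.0;    // roughly the year 2132
        System.out.println(Math.ulp(now) * 86400000.0 + " ms");
        System.out.println(Math.ulp(far) * 86400000.0 + " ms");
    }
}
```

Both come out comfortably sub-millisecond, so double precision is not the limiting factor here.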

The bug Sachin reported is a big deal for us, since we want to use NcML
with the THREDDS Data Server to serve our hundreds of oceanographic
time series files as CF-compliant, without changing any of the original
files. The original files are NetCDF, but with a non-standard convention
for time: an integer array with the Julian day, and a second integer
array with milliseconds since midnight. This allows integer math with
time, which gives results with no round-off problems.
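That two-integer convention can indeed be converted exactly with long arithmetic. A sketch (assuming, for illustration, that the integer day is a Modified Julian Day; 40587 is the MJD of 1970-01-01, the Unix epoch):

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class TwoIntTime {
    static final long MJD_OF_UNIX_EPOCH = 40587L; // MJD of 1970-01-01

    // Exact conversion: no floating point anywhere
    static long toUnixMillis(int mjd, int msecSinceMidnight) {
        return (mjd - MJD_OF_UNIX_EPOCH) * 86400000L + msecSinceMidnight;
    }

    public static void main(String[] args) {
        SimpleDateFormat fmt = new SimpleDateFormat("dd-MMM-yyyy HH:mm:ss");
        fmt.setTimeZone(TimeZone.getTimeZone("GMT"));
        // MJD 47865 at 68,400,000 ms past midnight = 05-Dec-1989 19:00:00 UTC
        System.out.println(fmt.format(new Date(toUnixMillis(47865, 68400000))));
    }
}
```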

We have a Matlab script (which uses double-precision math) to take our
two-integer format for time and create NcML for a CF-compliant time
array using start and increment. That script produces NcML like this:

<variable name="time" shape="time" type="double">
 <attribute name="units" value="days since 1858-11-17 00:00:00 UTC"/>
 <attribute name="long_name" value="Modified Julian Day"/>
 <values start="47865.7916666665110000" increment="0.0416666666666667"/>
</variable>

As Sachin mentioned, the start time for this file is  "05-Dec-1989
19:00:00", and as proof that we have sufficient precision, when we
simply load the time vector in NetCDF-java and do the double precision
math in Matlab, we get the right start time:

datestr(datenum([1858 11 17 0 0 0]) + 47865.791666666511)

ans =  05-Dec-1989 19:00:00

but when we use the NetCDF-Java time routines to convert to Gregorian, we
get

05-Dec-1989 18:59:59 GMT
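The symptom is consistent with the milliseconds being truncated rather than rounded somewhere in the conversion. A sketch (not the actual CDM code) that reproduces both behaviors, assuming 40587 is the number of days from 1858-11-17 to the Unix epoch:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class RoundTrip {
    public static void main(String[] args) {
        double dval = Double.parseDouble("47865.7916666665110000");
        double ms = (dval - 40587) * 86400000.0;  // ~628887599999.986

        SimpleDateFormat fmt = new SimpleDateFormat("dd-MMM-yyyy HH:mm:ss");
        fmt.setTimeZone(TimeZone.getTimeZone("GMT"));

        // Truncating drops the fractional millisecond: 18:59:59
        System.out.println(fmt.format(new Date((long) ms)));
        // Rounding to the nearest millisecond recovers 19:00:00
        System.out.println(fmt.format(new Date(Math.round(ms))));
    }
}
```

The parsed start value falls a hair short of the exact instant, so any conversion that truncates lands one millisecond early and then prints as 18:59:59.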

Clearly our users will not accept this. I hope this can get resolved soon!

-Rich

On Tue, May 13, 2008 at 2:52 AM, Jon Blower <jdb@xxxxxxxxxxxxxxxxxxxx>
wrote:
Hi,

 I have seen similar issues (time values being out by a second or two).
 I was wondering whether it's something to do with udunits and
 calculating dates on the basis of "units since XXXXXX".  I seem to
 remember an earlier conversation on this list (or maybe on the CF
 list) concerning how udunits defines the length of certain time-spans
 (e.g. a month) and wondered whether this might be the issue?  Jonathan
 Gregory recommended against using "months since" and "years since" and
 sticking to seconds or days to avoid ambiguities in the length of a
 month/year.  But maybe this is a red herring.

 Whatever the issue is I'd be very keen to understand it as it's
 affecting me too!

 Cheers, Jon


 On Mon, May 12, 2008 at 9:31 PM, Sachin Kumar Bhate
 <skbhate@xxxxxxxxxxxxxxx> wrote:


 > John,
 >
 > The NcML file shown below attempts to aggregate time series files,
 > overriding the time values for each 'time' variable.
 >
 > The aggregation works great and I can access the time values as well,
 > but I see that there is a loss of precision in the new time values
 > when I access values for a coordinate data variable.
 >
 > For example:
 >
 > <<<<
 >   URI = 'http://www.gri.msstate.edu/rsearch_data/nopp/test_agg_precision.ncml';
 >   String var = "T_20";
 >
 >   GridDataset gid = GridDataset.open(URI);
 >   GeoGrid Grid = gid.findGridByName(var);
 >   GridCoordSys GridCoordS = (GridCoordSys) Grid.getCoordinateSystem();
 >
 >   java.util.Date d[] = GridCoordS.getTimeDates();
 >
 >   System.out.println("DateString: " + d[0].toGMTString());
 > >>>>
 >
 > The output from the above code for the 1st time value in the java Date
 > array:
 >
 > DateString: 5 Dec 1989 18:59:59 GMT
 >
 > But the correct value should be
 >
 > DateString: 5 Dec 1989 19:00:00 GMT
 >
 > Just out of curiosity, I tried to print the 1st time value being read
 > from the NcML by 'ucar.nc2.ncml.NcmlReader.readValues()'. I get
 >
 > Start = 47865.79166666651;   (parsed as double)
 >
 > but the 1st start value specified in the NcML is
 > '47865.7916666665110000'.
 >
 > I don't care about the trailing '0's, but the digit '1' in the 12th
 > decimal place is being dropped and may be causing this problem.
 >
 > Parsing it as a 'BigDecimal' does read in the correct value, though:
 >
 > Start-BigDecimal: 47865.7916666665110000
 >
 > I am just guessing here; I am not sure if this is what is causing the
 > precision problem.
 >
 > Will appreciate your help.
 >
 > thanks..
 >
 > Sachin
 >
 > --
 > Sachin Kumar Bhate, Research Associate
 > MSU-High Performance Computing Collaboratory, NGI
 > John C. Stennis Space Center, MS 39529
 > http://www.northerngulfinstitute.org/
 >
 > _______________________________________________
 > netcdf-java mailing list
 > netcdf-java@xxxxxxxxxxxxxxxx
 > For list information or to unsubscribe, visit:
 > http://www.unidata.ucar.edu/mailing_lists/



 --
 --------------------------------------------------------------
 Dr Jon Blower Tel: +44 118 378 5213 (direct line)
 Technical Director Tel: +44 118 378 8741 (ESSC)
 Reading e-Science Centre Fax: +44 118 378 6413
 ESSC Email: jdb@xxxxxxxxxxxxxxxxxxxx
 University of Reading
 3 Earley Gate
 Reading RG6 6AL, UK
 --------------------------------------------------------------









