Re: [netcdf-java] Data precision while aggregating data

This is interesting.  I think a move to ISO strings would be a good
one - do you think it's worth bringing this up again with CF?  I'd
support this, FWIW.

Am I correct in thinking that the problem is caused because a human
means "calendar days" or "calendar months" but udunits means a
specific, fixed number of milliseconds?

Can this be fixed in NetCDF-Java without change to CF, perhaps by
using Joda-time (a proper calendaring library) instead of udunits for
time handling?
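To make the distinction concrete, here's a rough sketch using java.time as a stand-in for a calendaring library (the class and constant names are mine, and the fixed month length is taken as 1/12 of the tropical year quoted from the units package further down the thread):

```java
import java.time.Duration;
import java.time.LocalDateTime;

public class CalendarVsFixed {
    // udunits-style fixed month: 1/12 of the tropical year
    // (365.242198781 days, per the note in the units package)
    static final double FIXED_MONTH_SECONDS = 365.242198781 * 86400.0 / 12.0;

    // "one month later" as a fixed number of seconds (what udunits computes)
    static LocalDateTime plusFixedMonth(LocalDateTime t) {
        return t.plus(Duration.ofMillis(Math.round(FIXED_MONTH_SECONDS * 1000.0)));
    }

    // "one month later" on the calendar (what a human usually means)
    static LocalDateTime plusCalendarMonth(LocalDateTime t) {
        return t.plusMonths(1);
    }

    public static void main(String[] args) {
        LocalDateTime start = LocalDateTime.of(2008, 1, 1, 0, 0);
        System.out.println("fixed month:    " + plusFixedMonth(start));  // lands on Jan 31, ~10:29
        System.out.println("calendar month: " + plusCalendarMonth(start));  // lands on Feb 1, 00:00
    }
}
```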

Cheers, Jon

On Thu, May 15, 2008 at 2:08 AM, John Caron <caron@xxxxxxxxxxxxxxxx> wrote:
> I'm not quite sure where the inaccuracy comes in; it is likely in converting
> between Date and udunits representations. I'll have to see what I can do.
>
> A few comments:
>
> 1) A double has 53 bits of precision, giving slightly under 16 decimal digits
> of accuracy. Thus:
>
>  public void testDoublePrecision() {
>    double dval = 47865.7916666665110000;
>    System.out.println(" dval= "+dval);
>  }
>
> prints:
>
>  dval= 47865.79166666651
>
> 2) Preserving the lowest bits of accuracy is tricky and requires care, which I
> promise has not (yet) happened in the CDM units handling. In general, relying
> on the lowest bits being preserved is dicey.
>
> 3) What is the definition of a "day", and how accurate do you need it to be?
> All I could find was this note in the units package:
>
>         * Interval between 2 successive passages of sun through vernal
> equinox
>         * (365.242198781 days -- see
>         * http://www.ast.cam.ac.uk/pubinfo/leaflets/,
>         * http://aa.usno.navy.mil/AA/
>         * and http://adswww.colorado.edu/adswww/astro_coord.html):
>
> You may agree, but what if someone uses a different meaning for "day"?
>
> 4) IMHO, using udunits for calendar dates is a mistake. It's a units package,
> not a calendar package.
>
> 5) "47865.7916666665110000 days since 1858-11-17 00:00:00 UTC" is, um,
> unreadable to humans.
>
> 6) I earlier proposed to CF that we allow ISO date strings, which are more
> readable, unambiguous, and don't have a precision problem. Various CF
> authorities thought it wasn't needed because it was redundant with the udunits
> representation.
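As a sketch of what the ISO route buys (java.time used here as a stand-in for a calendaring library; the class name is illustrative), parsing an ISO 8601 string recovers the exact instant with no units arithmetic at all:

```java
import java.time.Instant;

public class IsoRoundTrip {
    // parse an ISO 8601 string to an exact instant
    static Instant parseIso(String s) {
        return Instant.parse(s);
    }

    public static void main(String[] args) {
        // the same instant as "47865.7916666665110000 days since 1858-11-17 00:00:00 UTC"
        System.out.println(parseIso("1989-12-05T19:00:00Z"));
    }
}
```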
>
>
>
> Rich Signell wrote:
>>
>> Jon,
>>
>> The precision of a time vector with "units since XXXX" certainly needs
>> to be considered carefully, and we did think about this.
>>
>> We want to store all our oceanographic time series data with the same
>> time convention to facilitate aggregation and minimize mods to
>> existing software.
>>
>> Choosing time as double precision with units of "days since 1858-11-17
>> 00:00" should give us a precision of:
>>  - better than 1.3e-3 milliseconds until August 31, 2132 and
>>  - better than 1.1e-2 milliseconds until October 12, 4596!
>>
>> (This is actually the definition of "Modified Julian Day", which is
>> one of the few internationally recognized time conventions that starts
>> at midnight. See http://tycho.usno.navy.mil/mjd.html for more info.
>> It also has the advantage of being a date by which nearly all the
>> world had finally switched to a Gregorian calendar, and early enough
>> so that most of the data we want to represent will have positive time
>> values.)
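One way to check those figures is Math.ulp, which gives the spacing between adjacent doubles at a given magnitude; the worst-case resolution in milliseconds is then just ulp(mjd) × 86,400,000 (a quick sketch, with approximate dates in the comments):

```java
public class MjdResolution {
    static final double MS_PER_DAY = 86400000.0;

    // worst-case representable resolution, in milliseconds, of a time stored
    // as a double in units of "days since 1858-11-17" (Modified Julian Day)
    static double resolutionMs(double mjd) {
        return Math.ulp(mjd) * MS_PER_DAY;
    }

    public static void main(String[] args) {
        System.out.printf("MJD %9.0f (~1989): %.1e ms%n",   47866.0, resolutionMs(47866.0));
        System.out.printf("MJD %9.0f (~2132): %.1e ms%n",  100000.0, resolutionMs(100000.0));
        System.out.printf("MJD %9.0f (~4596): %.1e ms%n", 1000000.0, resolutionMs(1000000.0));
    }
}
```

Either way, the resolution stays microseconds or finer, comfortably below the millisecond level the data needs.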
>>
>> The bug Sachin reported is a big deal for us, since we want to use NcML
>> with the THREDDS Data Server to serve our hundreds of oceanographic time
>> series files as CF-compliant, without changing any of the original files.
>> The original files are NetCDF, but with a non-standard convention for
>> time: an integer array with the Julian day, and a second integer array
>> with milliseconds since midnight. This allows integer math on time to
>> give results with no round-off problems.
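A sketch of the conversion from that two-integer convention to a double MJD, assuming the integer day is a Julian Day number whose day is taken to begin at the preceding midnight (the constant and method names here are illustrative guesses at the files' convention):

```java
public class TwoIntTime {
    // JD 2400000.5 corresponds to MJD 0.0; with an integer "Julian day"
    // whose day begins at midnight, the offset to MJD is 2400001
    // (an assumption about the files' convention -- adjust if it differs)
    static final int MJD_OFFSET = 2400001;

    // convert (integer Julian day, integer ms since midnight) to a double MJD
    static double toMjd(int julianDay, int msSinceMidnight) {
        return (julianDay - MJD_OFFSET) + msSinceMidnight / 86400000.0;
    }

    public static void main(String[] args) {
        // 05-Dec-1989 19:00:00 -> MJD 47865.791666...
        System.out.println(toMjd(2447866, 19 * 3600 * 1000));
    }
}
```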
>>
>> We have a Matlab script (which uses double-precision math) to take our
>> two-integer format for time and create NcML for a CF-compliant time
>> array using a start value and an increment. That script produces NcML
>> like this:
>>
>> <variable name="time" shape="time" type="double">
>>  <attribute name="units" value="days since 1858-11-17 00:00:00 UTC"/>
>>  <attribute name="long_name" value="Modified Julian Day"/>
>>  <values start="47865.7916666665110000" increment="0.0416666666666667"/>
>> </variable>
>>
>> As Sachin mentioned, the start time for this file is "05-Dec-1989
>> 19:00:00", and as proof that we have sufficient precision: when we
>> simply load the time vector in NetCDF-Java and do the double-precision
>> math in Matlab, we get the right start time:
>>
>> datestr(datenum([1858 11 17 0 0 0]) + 47865.791666666511)
>>
>> ans =  05-Dec-1989 19:00:00
>>
>> but when we use the NetCDF-Java time routines to convert to Gregorian, we
>> get
>>
>> 05-Dec-1989 18:59:59 GMT
>>
>> Clearly our users will not accept this.   I hope this can get resolved
>> soon!!!!
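For comparison, rounding to the nearest millisecond when converting the double back to a date does land on the exact second; a sketch with java.time (illustrative, not CDM code):

```java
import java.time.Duration;
import java.time.Instant;

public class MjdToDate {
    // midnight on 1858-11-17 UTC (MJD 0) as the epoch
    static final Instant MJD_EPOCH = Instant.parse("1858-11-17T00:00:00Z");

    // convert an MJD double to an Instant, rounding to the nearest millisecond
    // rather than truncating -- this recovers 19:00:00 instead of 18:59:59
    static Instant fromMjd(double mjd) {
        return MJD_EPOCH.plus(Duration.ofMillis(Math.round(mjd * 86400000.0)));
    }

    public static void main(String[] args) {
        System.out.println(fromMjd(47865.79166666651));  // 1989-12-05T19:00:00Z
    }
}
```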
>>
>> -Rich
>>
>> On Tue, May 13, 2008 at 2:52 AM, Jon Blower <jdb@xxxxxxxxxxxxxxxxxxxx>
>> wrote:
>>>
>>> Hi,
>>>
>>>  I have seen similar issues (time values being out by a second or two).
>>>  I was wondering whether it's something to do with udunits and
>>>  calculating dates on the basis of "units since XXXXXX".  I seem to
>>>  remember an earlier conversation on this list (or maybe on the CF
>>>  list) concerning how udunits defines the length of certain time-spans
>>>  (e.g. a month) and wondered whether this might be the issue?  Jonathan
>>>  Gregory recommended against using "months since" and "years since" and
>>>  sticking to seconds or days to avoid ambiguities in the length of a
>>>  month/year.  But maybe this is a red herring.
>>>
>>>  Whatever the issue is I'd be very keen to understand it as it's
>>>  affecting me too!
>>>
>>>  Cheers, Jon
>>>
>>>
>>>  On Mon, May 12, 2008 at 9:31 PM, Sachin Kumar Bhate
>>>  <skbhate@xxxxxxxxxxxxxxx> wrote:
>>>
>>>
>>>  > John,
>>>  >
>>>  >  The NcML  file shown below attempts to aggregate time series files,
>>>  >  overriding
>>>  >  the time values for each 'time' variable.
>>>  >
>>>  >  The aggregation works great and I can access the time values as well,
>>>  >  but I see that there is loss of precision in the new time values,
>>> when I
>>>  >  access
>>>  >  values for a coordinate data variable.
>>>  >
>>>  >  For example:
>>>  >
>>>  >  <<<<
>>>  >    String URI =
>>>  >      "http://www.gri.msstate.edu/rsearch_data/nopp/test_agg_precision.ncml";
>>>  >    String var = "T_20";
>>>  >
>>>  >    GridDataset gid = GridDataset.open(URI);
>>>  >    GeoGrid grid = gid.findGridByName(var);
>>>  >    GridCoordSys gcs = (GridCoordSys) grid.getCoordinateSystem();
>>>  >
>>>  >    java.util.Date[] d = gcs.getTimeDates();
>>>  >
>>>  >    System.out.println("DateString: " + d[0].toGMTString());
>>>  >  >>>>
>>>  >
>>>  >  The output from the above code for the 1st time value in the
>>>  >  java.util.Date array:
>>>  >
>>>  >  DateString: 5 Dec 1989 18:59:59 GMT
>>>  >
>>>  >  But, the correct value should be
>>>  >
>>>  >  DateString: 5 Dec 1989 19:00:00 GMT
>>>  >
>>>  >
>>>  >  Just out of curiosity I tried to print the 1st time value being read
>>>  >  from the NcML,
>>>  >  by 'ucar.nc2.ncml.NcmlReader.readValues()'. I get,
>>>  >
>>>  >  Start = 47865.79166666651;   (Parsed as double)
>>>  >
>>>  >  but,  the 1st start value specified in NcML is
>>>  '47865.7916666665110000'.
>>>  >
>>>  >  Don't care about the trailing '0's, but the digit '1' in the 12th
>>>  >  decimal place is being dropped, which may be causing this problem.
>>>  >
>>>  >  Parsing it as a BigDecimal, though, does read in the correct value.
>>>  >
>>>  >  Start-BigDecimal: 47865.7916666665110000
>>>  >
>>>  >
>>>  >  I am just guessing here; I am not sure if this is what's causing the
>>>  >  precision problem.
>>>  >
>>>  >  Will appreciate your help.
>>>  >
>>>  >  thanks..
>>>  >
>>>  >  Sachin
>>>  >
>>>  >  --
>>>  >  Sachin Kumar Bhate, Research Associate
>>>  >  MSU-High Performance Computing Collaboratory, NGI
>>>  >  John C. Stennis Space Center, MS 39529
>>>  >  http://www.northerngulfinstitute.org/
>>>  >
>>>  >
>>>  >
>>>  >  _______________________________________________
>>>  >  netcdf-java mailing list
>>>  >  netcdf-java@xxxxxxxxxxxxxxxx
>>>  >  For list information or to unsubscribe, visit:
>>> http://www.unidata.ucar.edu/mailing_lists/
>>>  >
>>>
>>>
>>>
>>>
>>
>>
>>
>



-- 
--------------------------------------------------------------
Dr Jon Blower Tel: +44 118 378 5213 (direct line)
Technical Director Tel: +44 118 378 8741 (ESSC)
Reading e-Science Centre Fax: +44 118 378 6413
ESSC Email: jdb@xxxxxxxxxxxxxxxxxxxx
University of Reading
3 Earley Gate
Reading RG6 6AL, UK
--------------------------------------------------------------

