Re: [netcdf-java] Data precision while aggregating data

John,

Six replies to your six comments:   ;-)

On Wed, May 14, 2008 at 9:08 PM, John Caron <caron@xxxxxxxxxxxxxxxx> wrote:
> I'm not quite sure where the inaccuracy comes in, likely converting between
> Date and udunits representation. I'll have to see what I can do.
>
> A few comments:
>
> 1) double has 53 bits of accuracy giving slightly under 16 decimal digits of
> accuracy. thus:
>
>  public void testDoublePrecision() {
>    double dval = 47865.7916666665110000;
>    System.out.println(" dval= "+dval);
>  }
>
> prints:
>
>  dval= 47865.79166666651
>

Okay, you lost the lowest bit, but you should still be fine.   You
still have 11 places after the decimal point.    In Matlab, which uses
double precision arithmetic, I don't get a problem converting to
gregorian until we drop to 8 places after the decimal point:

datestr(datenum([1858 11 17 0 0 0]) + 47865.791666666511) =>
05-Dec-1989 19:00:00
datestr(datenum([1858 11 17 0 0 0]) + 47865.79166666651)   =>
05-Dec-1989 19:00:00
datestr(datenum([1858 11 17 0 0 0]) + 47865.7916666665)    =>
05-Dec-1989 19:00:00
datestr(datenum([1858 11 17 0 0 0]) + 47865.791666666)      =>
05-Dec-1989 19:00:00
datestr(datenum([1858 11 17 0 0 0]) + 47865.79166666)        =>
05-Dec-1989 18:59:59
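
The same point can be checked from the Java side. This is only a
sketch: the class and method names are made up for illustration and
are not part of NetCDF-Java. It parses the start value both as written
in the NcML and in the shorter form Java prints, compares the two
doubles, and converts the spacing between adjacent doubles at this
magnitude from days to seconds.

```java
import java.math.BigDecimal;

// Hypothetical helper for illustration; not part of NetCDF-Java.
final class StartValuePrecision {
    // Start value as written in the NcML, and the shorter form Java prints.
    static final String AS_WRITTEN = "47865.7916666665110000";
    static final String AS_PRINTED = "47865.79166666651";

    /** Spacing between adjacent doubles near 'days', converted to seconds. */
    static double ulpSeconds(double days) {
        return Math.ulp(days) * 86400.0;
    }

    public static void main(String[] args) {
        double a = Double.parseDouble(AS_WRITTEN);
        double b = Double.parseDouble(AS_PRINTED);
        // Double.toString emits just enough digits to round-trip, so a
        // shorter printed form does not by itself mean bits were lost.
        System.out.println("same double?   " + (a == b));
        // Exact decimal expansion of the binary value actually stored.
        System.out.println("exact stored   " + new BigDecimal(a));
        System.out.println("ulp in seconds " + ulpSeconds(a));
    }
}
```

For values between 32768 and 65536 days the spacing between doubles is
2^-37 days, which is well under a microsecond, so a one-second error
has to come from somewhere other than the double itself.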

> 2) preserving the lowest bits of accuracy is tricky, and requires care, which I
> promise has not (yet) happened in the CDM units handling. in general,
> relying on the lowest bits being preserved is dicey.

That's okay -- we don't need to preserve that lowest bit.
>
> 3) what is the definition of a "day". how accurate do you need that? All I
> could find was this note in the units package:
>
>         * Interval between 2 successive passages of sun through vernal
> equinox
>         * (365.242198781 days -- see
>         * http://www.ast.cam.ac.uk/pubinfo/leaflets/,
>         * http://aa.usno.navy.mil/AA/
>         * and http://adswww.colorado.edu/adswww/astro_coord.html):
>
> you may agree, but what if someone uses a different meaning for "day" ??

Take a look at udunits.dat:
http://www.unidata.ucar.edu/software/udunits/udunits-1/udunits.txt

A "day" is precisely defined as 86400 seconds.
A "sidereal day" is a different unit.

>
> 4) IMHO, using udunits for calendar dates is a mistake. it's a units package,
> not a calendar package.

Maybe, but I think to solve the current problem, we could just find
out where the computations are dropping the double precision.
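
One place such a one-second error could arise is the final conversion
from fractional days to whole milliseconds. This is a guess, not a
diagnosis of the CDM code. The sketch below uses java.time and made-up
names (MjdToInstant, truncated, rounded) to contrast truncating the
millisecond count with rounding it. The start value sits about 0.013
milliseconds short of 19:00:00, so truncation lands on 18:59:59.999,
which shows as 18:59:59 when only whole seconds are printed.

```java
import java.time.Instant;

// Hypothetical helper for illustration; not the NetCDF-Java implementation.
final class MjdToInstant {
    static final Instant MJD_EPOCH = Instant.parse("1858-11-17T00:00:00Z");
    static final double MS_PER_DAY = 86_400_000.0; // a udunits "day" is 86400 s

    /** Truncating conversion: any fractional millisecond is dropped. */
    static Instant truncated(double mjdDays) {
        return MJD_EPOCH.plusMillis((long) (mjdDays * MS_PER_DAY));
    }

    /** Rounding conversion: nearest whole millisecond. */
    static Instant rounded(double mjdDays) {
        return MJD_EPOCH.plusMillis(Math.round(mjdDays * MS_PER_DAY));
    }

    public static void main(String[] args) {
        double start = 47865.791666666511;
        System.out.println("truncated: " + truncated(start)); // 1989-12-05T18:59:59.999Z
        System.out.println("rounded:   " + rounded(start));   // 1989-12-05T19:00:00Z
    }
}
```

If the real code path truncates somewhere like this, rounding to the
nearest millisecond before building the Date would be enough; no extra
precision in the double is needed.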

>
> 5) "47865.7916666665110000 days since 1858-11-17 00:00:00 UTC" is, um,
> unreadable to humans.

What is unreadable about that?   Yes, it's a big number with a lot
of precision, and an older date, but I think it's perfectly readable
and unambiguous.    And as I mentioned, it's an internationally
recognized convention called "Modified Julian Date".

>
> 6) I earlier proposed to CF that we allow ISO date strings, more readable,
> not ambiguous, and doesn't have a precision problem. Various CF authorities
> thought it wasn't needed because it was redundant with the udunits
> representation.

I think allowing ISO date strings in CF would be a good idea, and I
also think allowing a two integer representation in CF would be a good
idea (we use Julian day, and milliseconds since midnight as our two
integer vectors).   But that idea was also not too popular.   Several
people thought it would be a good idea, including Balaji, but there
was concern about the need to modify all existing CF applications to
handle these new time conventions.     But if this was just handled in
UDUNITS, I don't think this would be much of a problem, as I would think
that most CF-compliant apps have used the UDUNITS library to do their
math.
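
To make the two integer idea concrete, here is a minimal sketch of the
conversion done entirely in integer math. The class name and the day
numbering are assumptions for illustration: it takes the Julian day to
be the civil-day number that starts at midnight UTC, with 1970-01-01
as day 2440588. Files that number days differently would need a
different offset.

```java
import java.time.Instant;

// Hypothetical helper for illustration; the day numbering is an assumption.
final class TwoIntegerTime {
    // ASSUMPTION: civil-day Julian Day Number, day starts at midnight UTC,
    // and 1970-01-01 is day 2440588.
    static final long JULIAN_DAY_OF_UNIX_EPOCH = 2_440_588L;
    static final long MS_PER_DAY = 86_400_000L;

    /** Combines the two integers with long arithmetic only, so nothing rounds. */
    static Instant toInstant(int julianDay, int msSinceMidnight) {
        long epochMs = (julianDay - JULIAN_DAY_OF_UNIX_EPOCH) * MS_PER_DAY
                + msSinceMidnight;
        return Instant.ofEpochMilli(epochMs);
    }

    public static void main(String[] args) {
        // Under the assumption above, 1989-12-05 is day 2447866;
        // 19:00:00 is 68,400,000 ms after midnight.
        System.out.println(toInstant(2_447_866, 68_400_000)); // 1989-12-05T19:00:00Z
    }
}
```

Because every step is exact, the result does not depend on how a
fractional day happens to round.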

-Rich

>
>
> Rich Signell wrote:
>>
>> Jon,
>>
>> The precision of the time vector with "units since XXXX" must
>> definitely be considered carefully, but we did think about this.
>>
>> We want to store all our oceanographic time series data with the same
>> time convention to facilitate aggregation and minimize mods to
>> existing software.
>>
>> Choosing time as double precision with units of "days since 1858-11-17
>> 00:00"  should give us a precision of:
>>  - Better than 3.0e-5 milliseconds until August 31, 2132 and
>>  - Better than 3.0e-4 milliseconds until October 12, 4596!
>>
>> (This is actually the definition of "Modified Julian Day", which is
>> one of the few internationally recognized time conventions that starts
>> at midnight. See http://tycho.usno.navy.mil/mjd.html for more info.
>> It also has the advantage of being a date by which nearly all the
>> world had finally switched to a Gregorian calendar, and early enough
>> so that most of the data we want to represent will have positive time
>> values.)
>>
>> The bug Sachin reported is a big deal for us, since we want to use
>> NcML and THREDDS as a way of serving our hundreds of oceanographic
>> time series files as CF compliant using NcML with the THREDDS data
>> server without changing any of the original files.    The original
>> files are NetCDF, but with a non-standard convention for time:  an
>> integer array with julian day, and a second integer array with
>> milliseconds since midnight.    This allows integer math with time to
>> give results with no round off problems.
>>
>> We have a script in Matlab (that uses double precision math) to take
>> our two integer format for time and create NcML for a CF-compliant
>> time array using start and increment.   That script produces NcML like
>> this:
>>
>> <variable name="time" shape="time" type="double">
>>  <attribute name="units" value="days since 1858-11-17 00:00:00 UTC"/>
>>  <attribute name="long_name" value="Modified Julian Day"/>
>>  <values start="47865.7916666665110000" increment="0.0416666666666667"/>
>> </variable>
>>
>> As Sachin mentioned, the start time for this file is  "05-Dec-1989
>> 19:00:00", and as proof that we have sufficient precision, when we
>> simply load the time vector in NetCDF-java and do the double precision
>> math in Matlab, we get the right start time:
>>
>> datestr(datenum([1858 11 17 0 0 0]) + 47865.791666666511)
>>
>> ans =  05-Dec-1989 19:00:00
>>
>> but when we use the NetCDF-Java time routines to convert to Gregorian, we
>> get
>>
>> 05-Dec-1989 18:59:59 GMT
>>
>> Clearly our users will not accept this.   I hope this can get resolved
>> soon!!!!
>>
>> -Rich
>>
>> On Tue, May 13, 2008 at 2:52 AM, Jon Blower <jdb@xxxxxxxxxxxxxxxxxxxx>
>> wrote:
>>>
>>> Hi,
>>>
>>>  I have seen similar issues (time values being out by a second or two).
>>>  I was wondering whether it's something to do with udunits and
>>>  calculating dates on the basis of "units since XXXXXX".  I seem to
>>>  remember an earlier conversation on this list (or maybe on the CF
>>>  list) concerning how udunits defines the length of certain time-spans
>>>  (e.g. a month) and wondered whether this might be the issue?  Jonathan
>>>  Gregory recommended against using "months since" and "years since" and
>>>  sticking to seconds or days to avoid ambiguities in the length of a
>>>  month/year.  But maybe this is a red herring.
>>>
>>>  Whatever the issue is I'd be very keen to understand it as it's
>>>  affecting me too!
>>>
>>>  Cheers, Jon
>>>
>>>
>>>  On Mon, May 12, 2008 at 9:31 PM, Sachin Kumar Bhate
>>>  <skbhate@xxxxxxxxxxxxxxx> wrote:
>>>
>>>
>>>> John,
>>>
>>>  >
>>>  >  The NcML  file shown below attempts to aggregate time series files,
>>>  >  overriding
>>>  >  the time values for each 'time' variable.
>>>  >
>>>  >  The aggregation works great and I can access the time values as well,
>>>  >  but I see that there is loss of precision in the new time values
>>>  >  when I access values for a coordinate data variable.
>>>  >
>>>  >  For example:
>>>  >
>>>  >  <<<<
>>>  >    String URI =
>>>  >        "http://www.gri.msstate.edu/rsearch_data/nopp/test_agg_precision.ncml";
>>>  >    String var = "T_20";
>>>  >
>>>  >    // Open the aggregation and look up the grid by variable name.
>>>  >    GridDataset gid = GridDataset.open(URI);
>>>  >    GeoGrid Grid = gid.findGridByName(var);
>>>  >    GridCoordSys GridCoordS = (GridCoordSys) Grid.getCoordinateSystem();
>>>  >
>>>  >    // Time coordinate values converted to java.util.Date.
>>>  >    java.util.Date d[] = GridCoordS.getTimeDates();
>>>  >
>>>  >    System.out.println("DateString: "+d[0].toGMTString());
>>>  >   >>>>>
>>>  >
>>>  >  The output from the above code for the 1st time value in the java
>>>  >  Date array:
>>>  >
>>>  >  DateString: 5 Dec 1989 18:59:59 GMT
>>>  >
>>>  >  But, the correct value should be
>>>  >
>>>  >  DateString: 5 Dec 1989 19:00:00 GMT
>>>  >
>>>  >
>>>  >  Just out of curiosity I tried to print the 1st time value being read
>>>  >  from the NcML,
>>>  >  by 'ucar.nc2.ncml.NcmlReader.readValues()'. I get,
>>>  >
>>>  >  Start = 47865.79166666651;   (Parsed as double)
>>>  >
>>>  >  but the 1st start value specified in NcML is
>>>  >  '47865.7916666665110000'.
>>>  >
>>>  >  Don't care about the trailing '0s', but the digit '1' in the 12th
>>>  >  decimal place is being dropped and may be causing this problem.
>>>  >
>>>  >  Although, parsing it as a 'BigDecimal' does read in the correct
>>>  >  value.
>>>  >
>>>  >  Start-BigDecimal: 47865.7916666665110000
>>>  >
>>>  >
>>>  >  I am just guessing here, I am not sure if this is what is causing the
>>>  >  precision problem.
>>>  >
>>>  >  Will appreciate your help.
>>>  >
>>>  >  thanks..
>>>  >
>>>  >  Sachin
>>>  >
>>>  >  --
>>>  >  Sachin Kumar Bhate, Research Associate
>>>  >  MSU-High Performance Computing Collaboratory, NGI
>>>  >  John C. Stennis Space Center, MS 39529
>>>  >  http://www.northerngulfinstitute.org/
>>>  >
>>>  >
>>>  >
>>>  >  _______________________________________________
>>>  >  netcdf-java mailing list
>>>  >  netcdf-java@xxxxxxxxxxxxxxxx
>>>  >  For list information or to unsubscribe, visit:
>>> http://www.unidata.ucar.edu/mailing_lists/
>>>  >
>>>
>>>
>>>
>>>  --
>>>  --------------------------------------------------------------
>>>  Dr Jon Blower Tel: +44 118 378 5213 (direct line)
>>>  Technical Director Tel: +44 118 378 8741 (ESSC)
>>>  Reading e-Science Centre Fax: +44 118 378 6413
>>>  ESSC Email: jdb@xxxxxxxxxxxxxxxxxxxx
>>>  University of Reading
>>>  3 Earley Gate
>>>  Reading RG6 6AL, UK
>>>  --------------------------------------------------------------
>>>
>>>
>>>
>>
>>
>>
>



-- 
Dr. Richard P. Signell (508) 457-2229
USGS, 384 Woods Hole Rd.
Woods Hole, MA 02543-1598

