Re: something startling I just noticed...

NOTE: The netcdf-hdf mailing list is no longer active. The list archives are made available for historical reasons.

Quincey,

> >There are some advantages of sequence numbers over times:
> >  - you don't have to worry about clock resolution and the possibility
> >    that creation times of two objects are equal
>     Hmm, we use the gettimeofday() routine, which returns values in
> microseconds, so this probably would not be too much of an issue, but I admit
> it certainly is possible.

We ran into just this problem on a skiplist implementation (for LDM
not netCDF) that required a total ordering.  Time stamps worked most
of the time, but if two events happened to get assigned the same
microsecond clock tick, we lost track of one of the corresponding
objects.  On old machines, we never saw the problem, but it bit us
when we tried running on faster hardware.  We ended up adding what was
essentially a sequence number to the timestamp to disambiguate
matching microsecond clock times.
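
A minimal sketch of that fix, in Python rather than the LDM's own
code: the key is a (microsecond timestamp, sequence number) pair, so
two events on the same clock tick still compare as different.  The
names OrderedKeyGenerator and next_key are hypothetical, and the
clock is injectable only so that a tie can be forced for
demonstration.

```python
import itertools
import time


class OrderedKeyGenerator:
    """Issue (microsecond_timestamp, sequence) keys that strictly increase."""

    def __init__(self, clock=time.time):
        self._clock = clock              # hypothetical hook: lets a test freeze the clock
        self._last_usec = 0
        self._seq = itertools.count()    # never reset, so keys can never repeat

    def next_key(self):
        usec = int(self._clock() * 1_000_000)
        # Clamp so a clock that steps backwards cannot reorder keys.
        self._last_usec = max(usec, self._last_usec)
        return (self._last_usec, next(self._seq))


# Simulate two events landing on the same microsecond tick.
gen = OrderedKeyGenerator(clock=lambda: 1.0)
first = gen.next_key()
second = gen.next_key()

# Keyed by timestamp alone, the second object overwrites the first.
by_time = {first[0]: "object A", second[0]: "object B"}
# Keyed by the full pair, both objects survive and stay in creation order.
by_pair = {first: "object A", second: "object B"}
```

With the timestamp alone as the key, by_time ends up holding one
entry; with the pair as the key, by_pair holds both.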

> >  - adding 1 is cheaper than the system call necessary to access the
> >    system clock
>     True, but both are minor compared to the cost of disk I/O involved, I 
> think.

You're right, this is not worth worrying about.  Premature
optimization on my part ...

> Hmm, I think there may be some issues with a creation sequence number also:
>     - The "last number issued" will need to be stored in the file (unlike
>         creation times).
>     - Should it be local to the group, or global to the file? There are
>         pros and cons to both:
>             Global:
>                 - Pro: One number to track for file
>                 - Con: May have contention for updating this number in a
>                     parallel environment.
>                 - Con: Faster to roll over than a sequence number per group.
>                 - Con: Sequence numbers in one group will have gaps, if
>                     objects are created in other groups, which does not
>                     imply objects were deleted in the group.
> 
>             Local:
>                 - Pro: More consistent numbering within one group than a
>                     sequence number per file.
>                 - Con: May have contention for updating this number in a
>                     parallel environment.
>                 - Con: A new piece of metadata to update with every object
>                     created in a group.
> 
> I guess I would tend toward a local (i.e. per group) sequence number.
> How's that sit with people?

Good analysis of sequence number problems.  I agree with you: local
seems adequate, unless we choose to ignore Group semantics in the
netCDF-4 interface and just treat the Group name as part of a global
name for a netCDF-4 object.  In that case, local would be a problem,
because two netCDF-4 objects in different Groups that we wanted to
iterate over in creation order could get the same sequence number.
Maybe this is an argument not to treat Groups as just part of the name.
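
To illustrate the collision, here is a small sketch in Python, not
tied to any HDF5 or netCDF-4 API.  GroupSequencer and next_seq are
hypothetical names, and the group paths are made up for the example.

```python
import itertools
from collections import defaultdict


class GroupSequencer:
    """Keep one independent creation counter per group (the "local" scheme)."""

    def __init__(self):
        # Each new group lazily gets its own counter starting at 0.
        self._counters = defaultdict(itertools.count)

    def next_seq(self, group):
        return next(self._counters[group])


seq = GroupSequencer()
objects = {}
# Creation order: /a/x first, then /b/y, then /a/z.
for group, name in [("/a", "x"), ("/b", "y"), ("/a", "z")]:
    # Flatten the group into the object name, as a global-name view would.
    objects[f"{group}/{name}"] = seq.next_seq(group)

# Names whose sequence number is shared with an object in another group.
ties = [n for n, s in objects.items() if list(objects.values()).count(s) > 1]
```

Within /a the numbers still order x before z, but in the flattened
view /a/x and /b/y carry the same number, so their relative creation
order cannot be recovered.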

For us, a different kind of local would also work: a set of sequence
numbers for Datasets, for each Dataset's Attributes, and for shared
dimension Scales.  But if you have other uses for time stamps or
sequence numbers, our use shouldn't dictate the requirements, since
anything that allows us to determine the creation order of netCDF
variables, dimensions, and attributes would work.

--Russ
