NOTE: The netcdf-hdf mailing list is no longer active. The list archives are made available for historical reasons.
Hi Russ,

> > >There are some advantages of sequence numbers over times:
> > >  - you don't have to worry about clock resolution and the possibility
> > >    that creation times of two objects are equal
> >     Hmm, we use the gettimeofday() routine, which returns values in
> > microseconds, so this probably would not be too much of an issue, but I
> > admit it certainly is possible.
>
> We ran into just this problem on a skiplist implementation (for LDM
> not netCDF) that required a total ordering.  Time stamps worked most
> of the time, but if two events happened to get assigned the same
> microsecond clock tick, we lost track of one of the corresponding
> objects.  On old machines, we never saw the problem, but it bit us
> when we tried running on faster hardware.  We ended up adding what was
> essentially a sequence number to the timestamp to disambiguate
> matching microsecond clock times.

    Well, I hope that we can create objects in the file fast enough that
having only a microsecond resolution is a problem for HDF5 also... :-)

> >     Hmm, I think there may be some issues with a creation sequence number also:
> >     - The "last number issued" will need to be stored in the file (unlike
> >         creation times).
> >     - Should it be local to the group, or global to the file?  There are
> >         pros and cons to both:
> >         Global:
> >             - Pro: One number to track for file
> >             - Con: May have contention for updating this number in a
> >                 parallel environment.
> >             - Con: Faster to roll over than a sequence number per group.
> >             - Con: Sequence numbers in one group will have gaps, if
> >                 objects are created in other groups, which does not
> >                 imply objects were deleted in the group.
> >
> >         Local:
> >             - Pro: More consistent numbering within one group than a
> >                 sequence number per file.
> >             - Con: May have contention for updating this number in a
> >                 parallel environment.
> >             - Con: A new piece of metadata to update with every object
> >                 created in a group.
> >
> >     I guess I would tend toward a local (i.e. per group) sequence number.
> > How's that sit with people?
>
> Good analysis of sequence number problems.  I agree with you, local
> seems to be adequate unless we chose to ignore Group semantics for the
> netCDF-4 interface and just treated the Group name as part of a global
> name for a netCDF-4 object.  In that case, local would be a problem,
> because two netCDF-4 objects that we wanted to iterate over in order
> could get the same sequence number.  Maybe this is an argument not to
> treat Groups as just part of the name.

    Yes, local sequence numbers cut both ways sometimes...  Since most (all?)
current netCDF users should be used to a 'flat' file, putting all the objects
in the root group of the file and using the creation order in that group seems
like a reasonable default.  Then, you could change the definition of the way
the creation order information is used for netCDF 4 users so that the group
structure was accounted for.
    BTW, I was looking through the netCDF 3 API for functions that take or
return an 'index' in the file and I can't find one.  Which function(s) applies
to this situation?

> For us, a different kind of local would also work: a set of sequence
> numbers for Datasets, for each Dataset's Attributes, and for shared
> dimension Scales.  But if you have other uses for time stamps or
> sequence numbers, our use shouldn't dictate the requirements, since
> anything that allows us to determine the creation order of netCDF
> variables, dimensions, and attributes would work.

    This is along lines that we've thought about for a long time: adding a
"live" index capability to HDF5 files, where every change to the file's
metadata (object creation, modification, deletion and attribute creation &
deletion) could update an index in the file in some way.  I think this is a
great idea, but I think it would be too much work at the current time. :-(

    Quincey
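
For illustration only, the per-group ("local") sequence number idea discussed above might be sketched roughly as follows. This is not HDF5 or netCDF code: the `CreationOrder` class, its methods, and the in-memory `entries` list are all hypothetical names made up for this sketch, and a real file format would also have to persist the "last number issued" for each group.

```python
import itertools
import time
from collections import defaultdict


class CreationOrder:
    """Hypothetical in-memory tracker; not part of any HDF5/netCDF API."""

    def __init__(self):
        # One independent counter per group path ("local" numbering).
        self._next = defaultdict(itertools.count)
        # Each entry is (group, seq, timestamp_us, name).
        self.entries = []

    def create(self, group, name, now_us=None):
        """Record a new object and return its sequence number in `group`."""
        if now_us is None:
            now_us = time.time_ns() // 1000  # microsecond timestamp
        seq = next(self._next[group])
        self.entries.append((group, seq, now_us, name))
        return seq

    def in_creation_order(self, group):
        """Names in `group`, ordered by sequence number, not by timestamp."""
        return [name for g, _, _, name in sorted(self.entries) if g == group]


tracker = CreationOrder()
# Two objects that happen to share a microsecond tick still get distinct,
# ordered sequence numbers within the same group.
tracker.create("/", "temp", now_us=100)
tracker.create("/", "pressure", now_us=100)
# A different group starts again at 0, so sequence numbers alone cannot
# order objects across groups if group names are flattened into one namespace.
tracker.create("/g1", "temp", now_us=100)
```

In this sketch "/temp" and "/g1/temp" both receive sequence number 0, which is the ambiguity that arises if Groups are treated as just part of a global name.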