NOTE: The netcdf-hdf mailing list is no longer active. The list archives are made available for historical reasons.
Hi Ed,

> > > Unfortunately there aren't generic instructions for this sort of
> > > thing, it's very application-I/O-pattern dependent.  A general
> > > heuristic is to pick lower and upper bounds on the size of a chunk
> > > (in bytes) and try to make the chunks "squarish" (in n-D).  One
> > > thing to keep in mind is that the default chunk cache in HDF5 is
> > > 1MB, so it's probably worthwhile to keep chunks under half of
> > > that.  A reasonable lower limit is a small multiple of the block
> > > size of a disk (usually 4KB).
>
> 1 MB seems low for scientific applications.  Even cheap consumer PCs
> come with about half a gig of RAM.  Scientific machines much more so.
> Wouldn't it be helpful to have 100 MB, for example?

Yes, we've kicked that around, we should bump it up to something more
reasonable in a future release.

> > > Generally, you are trying to avoid the situation below:
> > >
> > > Dataset with 10 chunks (dimension sizes don't really matter):
> > > +----+----+----+----+----+
> > > |    |    |    |    |    |
> > > |    |    |    |    |    |
> > > | A  | B  | C  | D  | E  |
> > > +----+----+----+----+----+
> > > |    |    |    |    |    |
> > > |    |    |    |    |    |
> > > | F  | G  | H  | I  | J  |
> > > +----+----+----+----+----+
> > >
> > > If you are writing hyperslabs to part of each chunk like this:
> > > (hyperslab 1 is in chunk A, hyperslab 2 is in chunk B, etc.)
> > > +----+----+----+----+----+
> > > |1111|2222|3333|4444|5555|
> > > |6666|7777|8888|9999|0000|
> > > | A  | B  | C  | D  | E  |
> > > +----+----+----+----+----+
> > > |    |    |    |    |    |
> > > |    |    |    |    |    |
> > > | F  | G  | H  | I  | J  |
> > > +----+----+----+----+----+
> > >
> > > If the chunk cache is only large enough to hold 4 chunks, then
> > > chunk A will be preempted from the cache for chunk E (when
> > > hyperslab 5 is written), but will immediately be re-loaded to
> > > write hyperslab 6 out.
>
> OK, great.  Let me see if I can start to come up with the rules by
> which I can select chunk sizes:
>
> 1 - Min chunk size should be 4 KB.
> 2 - Max chunk size should allow n chunks to fit in the chunk cache,
>     where n is around the max number of chunks the user will access
>     at once in a hyperslab.

Generally, yes.

> > > Unfortunately, our general purpose software can't predict the I/O
> > > pattern that users will access the data in, so it is a tough
> > > problem.  On the one hand, you want to keep the chunks small
> > > enough that they will stick around in the cache until they are
> > > finished being written/read, but you want the chunks to be larger
> > > so that the I/O on them is more efficient.  :-/
>
> I think we can make some reasonable guesses for netcdf-3.x access
> patterns, so that we can at least ensure the common tasks are working
> fast enough.

Cool.

> Obviously any user can flummox our optimizations by doing some odd
> things we don't expect.  As my old engineering professors told me:
> you can make it foolproof, but you can't make it damn-foolproof.  :-)

Quincey
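
A minimal C sketch of the two knobs discussed above, assuming the HDF5
1.8+ API (this thread predates it): H5Pset_cache() raises the raw-data
chunk cache above its 1 MB default, and H5Pset_chunk() picks a
"squarish" 2-D chunk.  The 64 MB cache, the file and dataset names,
and the 256x256 chunk shape are illustrative values, not
recommendations from the thread.

    #include <hdf5.h>

    int main(void)
    {
        /* File access property list: raise the raw-data chunk cache
         * from its 1 MB default to 64 MB (illustrative).  Arguments:
         * metadata cache elements (ignored in 1.8+), number of hash
         * slots, cache size in bytes, preemption policy. */
        hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
        H5Pset_cache(fapl, 0, 521, 64 * 1024 * 1024, 0.75);

        hid_t file = H5Fcreate("tuned.h5", H5F_ACC_TRUNC, H5P_DEFAULT,
                               fapl);

        /* A "squarish" 2-D chunk: with 4-byte floats, 256 x 256 x 4
         * bytes = 256 KB per chunk -- well under half the cache, and
         * well over the 4 KB disk-block lower bound. */
        hsize_t dims[2]  = {4096, 4096};
        hsize_t chunk[2] = {256, 256};
        hid_t space = H5Screate_simple(2, dims, NULL);
        hid_t dcpl  = H5Pcreate(H5P_DATASET_CREATE);
        H5Pset_chunk(dcpl, 2, chunk);

        hid_t dset = H5Dcreate(file, "var", H5T_NATIVE_FLOAT, space,
                               H5P_DEFAULT, dcpl, H5P_DEFAULT);

        H5Dclose(dset);
        H5Pclose(dcpl);
        H5Sclose(space);
        H5Fclose(file);
        H5Pclose(fapl);
        return 0;
    }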
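
The preemption scenario in the diagram can be reproduced with
partial-chunk hyperslab writes.  A sketch, again assuming the 1.8+
API, with made-up file and dataset names: an 8 x 20 dataset chunked
4 x 4 gives the 2 x 5 grid of chunks A..J, and each write covers one
1 x 4 strip of one chunk.

    #include <hdf5.h>

    int main(void)
    {
        /* An 8 x 20 dataset chunked 4 x 4 is a 2 x 5 grid of chunks
         * (A..J).  Each H5Dwrite() below covers a 1 x 4 strip -- one
         * row of one chunk in the top chunk row, matching hyperslabs
         * 1..0 in the picture. */
        hsize_t dims[2]  = {8, 20};
        hsize_t chunk[2] = {4, 4};
        hsize_t count[2] = {1, 4};
        float   buf[4]   = {0};

        hid_t file = H5Fcreate("strips.h5", H5F_ACC_TRUNC, H5P_DEFAULT,
                               H5P_DEFAULT);
        hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
        H5Pset_chunk(dcpl, 2, chunk);
        hid_t fspace = H5Screate_simple(2, dims, NULL);
        hid_t mspace = H5Screate_simple(2, count, NULL);
        hid_t dset   = H5Dcreate(file, "var", H5T_NATIVE_FLOAT, fspace,
                                 H5P_DEFAULT, dcpl, H5P_DEFAULT);

        for (int strip = 0; strip < 10; strip++) {
            hsize_t start[2] = {strip / 5, (strip % 5) * 4};
            H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, NULL,
                                count, NULL);
            H5Dwrite(dset, H5T_NATIVE_FLOAT, mspace, fspace,
                     H5P_DEFAULT, buf);
            /* With room for only 4 chunks in the cache, chunk A is
             * evicted when chunk E is loaded (strip 4) and must be
             * re-read for strip 5 -- the churn described above. */
        }

        H5Dclose(dset);
        H5Sclose(mspace);
        H5Sclose(fspace);
        H5Pclose(dcpl);
        H5Fclose(file);
        return 0;
    }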
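
Ed's two rules reduce to a little arithmetic.  A hypothetical helper
(not from the thread; the function name and parameters are invented
for illustration) that returns the edge of a square 2-D chunk:

    #include <math.h>
    #include <stddef.h>

    /* Hypothetical helper (illustration only): given the chunk-cache
     * size, the number of chunks n that a typical hyperslab access
     * touches at once, and the element size, return the edge of a
     * square 2-D chunk satisfying the two rules above. */
    static size_t square_chunk_edge(size_t cache_bytes, size_t n_chunks,
                                    size_t elem_size)
    {
        const size_t MIN_CHUNK_BYTES = 4096;        /* rule 1: >= 4 KB */
        size_t max_bytes = cache_bytes / n_chunks;  /* rule 2: n chunks
                                                     * fit in cache */
        if (max_bytes < MIN_CHUNK_BYTES)            /* rule 1 wins if
                                                     * rules conflict */
            max_bytes = MIN_CHUNK_BYTES;

        /* Largest square chunk whose bytes fit within the budget. */
        return (size_t)sqrt((double)(max_bytes / elem_size));
    }

    /* Example: with the default 1 MB cache, hyperslabs touching ~5
     * chunks, and 4-byte floats, this yields an edge of 228, i.e. a
     * 228 x 228 chunk of about 203 KB. */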