Greetings! Forgive me for raising a difficulty that has been encountered before, but I'm interested in whether anybody has made headway on it.

We want to store a variety of numerical data output by a software simulation of a physical system. The data are physical quantities that depend on spatial coordinates and time. The output must be read by another program, possibly running on a different platform, so we would like the data file to be platform independent. The quantity of output is large enough that we practically need a binary file. netCDF interests us as a way to get an easily read, platform-independent binary data file, and we would also enjoy being able to use the tools that have been developed for examining netCDF files.

Unfortunately, our data are not necessarily regularly patterned, and they may not fit well within a standard netCDF file. Our natural inclination would be to use the time value as the unlimited dimension in the file and then define the coordinates of our spatial grid points using three additional dimensions. The problems:

1. Our data are sparse in both space and time; i.e. not every physical quantity is written out at each time step, nor at every spatial grid point within a given time step.

2. Our grids themselves can vary as a function of the time step; i.e. a finer grid might be created inside a cell for a single time step for needed accuracy. The finer grid would exist for only one or two time steps and would then no longer be used for the rest of the simulation.

As I understand the structure of a netCDF file, the only way we could have one file contain all of the quantities would be to set up the dimensions to enumerate every grid location and time step ever used in the simulation, and store everything within those dimensions. Fill values would be written for any values not explicitly placed in the file.
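To put a rough number on the waste, here is a back-of-envelope sketch (in Python, with entirely made-up extents and sparseness, since the post gives no concrete sizes) comparing the dense enumerate-everything layout against the amount of data actually written:

```python
# Hypothetical extents for the dense (time, x, y, z) netCDF layout.
# All of these numbers are assumptions for illustration only.
n_t, n_x, n_y, n_z = 1000, 100, 100, 100  # time steps and grid points
n_vars = 5                                 # physical quantities stored
live_fraction = 0.01                       # assume 99% of slots are fill values

bytes_per_value = 8  # 64-bit floats

# Dense layout: every variable at every grid point at every time step.
dense_values = n_t * n_x * n_y * n_z * n_vars
dense_bytes = dense_values * bytes_per_value

# Data actually produced by the simulation.
live_values = int(dense_values * live_fraction)
live_bytes = live_values * bytes_per_value

print(f"dense file: {dense_bytes / 2**30:.1f} GiB")
print(f"live data:  {live_bytes / 2**30:.2f} GiB")
```

With these assumed numbers the dense file is a hundred times larger than the data it carries; the exact ratio is just the fill fraction, but the shape of the problem is the same at any scale.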
Given the sparseness of our data, this would waste so much space in the file as to make it unusable. I've seen from the archives that people have used tricks to get around the time-sparseness issue, including sub-record schemes (good if the data are regularly patterned in time) and separate netCDF files for quantities written at different frequencies. Neither works easily for our problem, because we generally can't predict in advance when a quantity will need to be written, and because neither addresses spatial sparseness.

Our current code writes a platform-dependent binary file that must be run through a conversion step before being loaded by our second program. The file is written efficiently in our own data format, which takes advantage of subheaders at the beginning of each time record indicating exactly what has been stored in that record.

We see three possibilities for writing a platform-independent binary data file of reasonable size:

1. Filter the output from our existing routines through the XDR library in order to write a platform-independent binary file. The output side should be easy: just one different step in writing to the file. It would require some coding on the input side of the second program, since it would need to dissect the proprietary (but now platform-independent) binary file. Here we keep the efficient file size but lose the benefits of netCDF, such as the external utilities, the simple function calls to retrieve values, etc.

2. Do something clever using the existing netCDF routines. This would be something like the sub-record scheme or the multiple-file workaround, but it would need to address all of our sparseness problems. I haven't thought of anything too great yet...?

3. Modify the netCDF library to allow a subheader at each record explicitly showing what is stored within that record. I'm not sure yet how difficult this would be.
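To make the subheader idea in option 3 (and the hand-rolled format in option 1) concrete, here is a minimal sketch of one possible per-record layout. It is written in Python for brevity rather than C, and every field name and the exact layout are my own assumptions, not an existing format. Each time record begins with a header giving the time value and a count of variables; each variable entry carries its ID and point count, followed by (grid index, value) pairs. Using big-endian (`>`) packing gives the same platform independence that XDR provides for integers and IEEE floats:

```python
import struct

def write_record(time, quantities):
    """Serialize one sparse time record with a subheader.

    quantities maps a variable ID to a list of (grid_index, value)
    pairs, so only the points actually computed are stored.
    """
    out = [struct.pack(">dI", time, len(quantities))]        # record header
    for var_id, points in quantities.items():
        out.append(struct.pack(">II", var_id, len(points)))  # subheader entry
        for grid_index, value in points:
            out.append(struct.pack(">Id", grid_index, value))
    return b"".join(out)

def read_record(buf):
    """Parse a record produced by write_record."""
    time, n_vars = struct.unpack_from(">dI", buf, 0)
    offset = struct.calcsize(">dI")
    quantities = {}
    for _ in range(n_vars):
        var_id, n_pts = struct.unpack_from(">II", buf, offset)
        offset += struct.calcsize(">II")
        points = []
        for _ in range(n_pts):
            grid_index, value = struct.unpack_from(">Id", buf, offset)
            offset += struct.calcsize(">Id")
            points.append((grid_index, value))
        quantities[var_id] = points
    return time, quantities

# Round-trip example: two variables at time 0.5, with 2 and 1 points.
rec = write_record(0.5, {7: [(12, 1.25), (99, -3.0)], 9: [(4, 2.5)]})
print(read_record(rec))
```

The trade-off the post anticipates shows up directly here: records are variable-length, so random access requires either scanning headers or maintaining an index of record offsets, rather than computing a seek position from a fixed record size.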
With option 3 we would still lose netCDF compatibility with respect to the utilities, etc., and we would need to maintain our modifications against netCDF updates if we wanted to take advantage of them. However, it would preserve the benefit of nice standard functions for retrieving arbitrary pieces of data. I'm sure there would be a performance hit on random-access reads and writes, because you would no longer have a nice fixed record size; I don't know how large a hit it would be.

Is anybody else facing a data storage problem with sparse, irregularly patterned data? Any suggestions?

Thanks sincerely,

Jeremy Beal
jbeal@nvmedia.com