NOTE: The netcdf-hdf mailing list is no longer active. The list archives are made available for historical reasons.
Lloyd,

> I assume by changing shape you mean NOT considering the unlimited dimension.

That's right. I think Glenn was using `shape' of a netCDF file loosely to mean what I call the `schema': the dimensions, variables, and attributes of the netCDF. The unlimited dimension is the only dimension along which data can be appended to an existing netCDF file. Even creating a new variable that uses existing dimensions requires copying.

I believe this is one of the differences between netCDF and CDF. Doesn't CDF allow adding new variables without copying, at least in the multi-file implementation? In designing netCDF, we considered the trade-offs and concluded that adding new variables to an existing data file was not a common enough operation among our users and applications to justify a multi-file implementation, especially since users can represent a dataset as several files of their own design and thereby also gain the benefit of multiple unlimited dimensions. NetCDF was not designed to be a database system supporting frequently changing schemas, nested transactions, or other such database features.

> ... On the other hand, deleting an instance (i.e., a record in the
> conceptual equivalent in the CDF parlance) of a variable would also
> change the shape. Is this supported in netCDF without copying?

No, there is no `delete variable' (or `delete dimension') operation in the netCDF interface, though we do support `delete attribute'. The decision not to support deleting dimensions or variables in the interface was again a conscious one, weighing the trade-offs against the uses we had in mind for the interface. There is also no compaction or garbage collection after an attribute is deleted, except by copying.

You are right to point out that these operations can be expensive for large datasets represented as single netCDF files, but our philosophy has been to support the most common operations efficiently and to warn users about what is costly. Some datasets are better represented as several medium-sized files rather than as a single large file, which also gives users some flexibility in changing the data schema.

I'm not convinced we want to add most of the functionality of a database system, including the ability to change the schema efficiently for large datasets. The complexity this would add to both the interface and the implementation seems like too high a price to pay, especially when users who need to change the schema of a netCDF file can do so by copying the data. Users must put more thought into the original schema design when they don't have the luxury of cheap schema changes, but that may be an advantage.

--Russ
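For illustration, here is a minimal sketch of the appending model described above, written against the modern netCDF-3 C interface (the netCDF-2 function names in use when this message was written differ, but the record model is the same). The file name, dimension sizes, and data values are invented for the example:

    #include <netcdf.h>

    /* Append one record along the unlimited dimension.  Error checking
       is omitted for brevity; each call returns NC_NOERR on success. */
    int main(void)
    {
        int ncid, time_dim, lat_dim, t_var, dimids[2];
        size_t nrecs, start[2], count[2];
        float record[5] = {273.1f, 273.4f, 274.0f, 272.8f, 273.6f};

        /* Define a schema with one record variable, T(time, lat). */
        nc_create("example.nc", NC_CLOBBER, &ncid);
        nc_def_dim(ncid, "time", NC_UNLIMITED, &time_dim);
        nc_def_dim(ncid, "lat", 5, &lat_dim);
        dimids[0] = time_dim;
        dimids[1] = lat_dim;
        nc_def_var(ncid, "T", NC_FLOAT, 2, dimids, &t_var);
        nc_enddef(ncid);

        /* Appending a record means writing at the current length of the
           unlimited dimension; no copying of existing data is needed. */
        nc_inq_dimlen(ncid, time_dim, &nrecs);
        start[0] = nrecs; start[1] = 0;
        count[0] = 1;     count[1] = 5;
        nc_put_vara_float(ncid, t_var, start, count, record);

        nc_close(ncid);
        return 0;
    }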
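And a sketch of the one in-place deletion the interface does support, `delete attribute', again using modern C names; the file and attribute names here are assumed for illustration:

    #include <netcdf.h>

    /* Delete a global attribute in place.  Assumes example.nc exists
       and carries a global attribute named "history".  Error checking
       is omitted for brevity. */
    int main(void)
    {
        int ncid;

        nc_open("example.nc", NC_WRITE, &ncid);
        nc_redef(ncid);                          /* deletion requires define mode */
        nc_del_att(ncid, NC_GLOBAL, "history");  /* gone from the schema, but the
                                                    file space is not reclaimed */
        nc_enddef(ncid);
        nc_close(ncid);
        return 0;
    }

For the copy-based schema changes mentioned above, the ncdump and ncgen utilities provide one standard route: dump the file to CDL text with ncdump, edit the dimensions or variables there, and generate a new file from the edited CDL with ncgen.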