Making Data Access Faster

15.8 Making Data Access Faster

If netCDF access time is a bottleneck, these techniques may help.

For classic formats:

Order each variable's dimensions so that you read data in the order it was written, with the fastest-varying dimension last. Avoid access patterns that write or read only one or two values per disk block.
Each seek to a new location in a file causes buffer flushes, so random reads negate the advantages of buffering.
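
The effect of dimension order can be estimated with a little arithmetic. The sketch below is an illustration only, not part of the netCDF library: it assumes a single fixed-size variable stored in row-major order, data that starts on a block boundary, 4-byte values, and a 4096-byte disk block. The function name blocks_touched and the example shape are hypothetical.

```python
from itertools import product

def blocks_touched(shape, index_ranges, itemsize=4, block_size=4096):
    """Count the distinct disk blocks that hold the selected elements.

    Assumes a row-major layout whose first element sits at byte offset 0.
    """
    # Byte stride of each dimension; the last dimension varies fastest.
    strides = [itemsize] * len(shape)
    for d in range(len(shape) - 2, -1, -1):
        strides[d] = strides[d + 1] * shape[d + 1]

    blocks = set()
    for idx in product(*index_ranges):
        offset = sum(i * s for i, s in zip(idx, strides))
        blocks.add(offset // block_size)
    return len(blocks)

# Hypothetical variable with dimensions (time=100, y=64, x=128).
shape = (100, 64, 128)

# Reading along the last dimension: 128 contiguous values.
row_blocks = blocks_touched(shape, [range(1), range(1), range(128)])

# Reading along the first dimension: 100 values, each 32768 bytes apart.
series_blocks = blocks_touched(shape, [range(100), range(1), range(1)])

print(row_blocks, series_blocks)
```

Under these assumptions the row read touches a single block, while the time-series read touches a different block for every value, which is the "one or two values per disk block" pattern to avoid.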

For netCDF-4 formats:

Specify variable chunks to closely match most common data access patterns.
Don't use tiny chunks; the B-tree overhead of managing many chunks will dominate access time.
Try default chunks for efficiently accessing data in unanticipated ways.
Be aware that minor differences in chunk lengths can make major differences in access time and disk space. See page 18 of this report for an explanation.
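
A rough way to compare candidate chunk shapes is to count how many chunks each access pattern has to touch, and how much space the chunk grid allocates. The sketch below is a back-of-the-envelope model rather than a measurement. It assumes that every chunk, including partially filled edge chunks, is allocated at full size before any compression. The function names, variable shape, and chunk shapes are all hypothetical.

```python
from math import prod

def chunks_touched(start, count, chunk_shape):
    """Number of chunks intersected by the hyperslab start..start+count-1."""
    n = 1
    for s, c, ch in zip(start, count, chunk_shape):
        first = s // ch
        last = (s + c - 1) // ch
        n *= last - first + 1
    return n

def allocation_ratio(shape, chunk_shape):
    """Elements allocated by the chunk grid divided by elements in the variable."""
    allocated = prod(-(-n // ch) * ch for n, ch in zip(shape, chunk_shape))
    return allocated / prod(shape)

shape = (1000, 512, 512)                          # hypothetical (time, y, x) variable
time_series = ((0, 100, 100), (1000, 1, 1))       # all times at one point
horizontal_slice = ((0, 0, 0), (1, 512, 512))     # one time step, whole grid

results = {}
for chunk in [(1, 512, 512), (1000, 8, 8), (50, 64, 64)]:
    results[chunk] = (chunks_touched(*time_series, chunk),
                      chunks_touched(*horizontal_slice, chunk))
    print(chunk, results[chunk])

# A one-element change in chunk length nearly doubles the allocated grid.
print(allocation_ratio(shape, (500, 64, 64)), allocation_ratio(shape, (999, 64, 64)))
```

In this model, a chunk shape tuned to one access pattern touches a single chunk for that pattern and thousands for the other, while the intermediate shape touches a moderate number for both. The last line shows how a chunk length of 999 instead of 500 along a dimension of length 1000 leaves the second chunk almost empty.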

For all netCDF formats:

If data must be written in a different order than it is read, consider rewriting the data to the most efficient read order.
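If writing is a bottleneck and you write all data values for each variable, consider turning off the default "fill mode", which pre-fills data with fill values (in the C library, nc_set_fill() with NC_NOFILL). Advantage: can save up to 50% of write time, since each value is otherwise written twice, once as a fill value and once as data. Disadvantage: eliminates the possibility of detecting unwritten values.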
When writing from multiple processors, use parallel I/O.
Using an SSD instead of a spinning disk may change latency and seek time, and therefore the optimal strategies.
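
As a toy illustration of rewriting data into read order, the sketch below permutes the dimensions of a small row-major array held in memory. It is not a netCDF API; the function name reorder and the sample values are hypothetical, and a real file would normally be rewritten with a netCDF tool or library rather than by hand.

```python
from itertools import product

def reorder(flat, shape, new_order):
    """Return (new_flat, new_shape) with dimensions permuted to new_order.

    flat      -- values in row-major order for `shape`
    new_order -- permutation of dimension indices, slowest-varying first
    """
    # Element strides of the original row-major layout.
    strides = [1] * len(shape)
    for d in range(len(shape) - 2, -1, -1):
        strides[d] = strides[d + 1] * shape[d + 1]

    new_shape = tuple(shape[d] for d in new_order)
    out = []
    # Walk the new layout in storage order and fetch each value from the old one.
    for new_idx in product(*(range(n) for n in new_shape)):
        old_offset = sum(i * strides[d] for i, d in zip(new_idx, new_order))
        out.append(flat[old_offset])
    return out, new_shape

# Hypothetical data written one time step at a time: shape (time=3, station=2).
written = [10, 20, 11, 21, 12, 22]

# Rewrite as (station, time) so each station's time series is contiguous.
by_station, station_shape = reorder(written, (3, 2), (1, 0))
print(by_station, station_shape)
```

After the rewrite, reading one station's full time series is a single contiguous read instead of one value per time step.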