2011 Unidata NetCDF Workshop > Formats and Performance
15.8 Making Data Access Faster
If netCDF access time is a bottleneck, these
techniques may help.
For classic formats:
- Order each variable's dimensions so that you read data in the order
it was written, with the most rapidly varying dimension last. Avoid
writing or reading only one or two values per disk block.
- Each seek to a new location in a file causes buffer flushes, so
reading randomly negates the advantages of buffering.
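To make the dimension-order advice concrete, here is a minimal sketch that models a fixed-size variable as one contiguous row-major array (last dimension varying fastest) and counts how many non-adjacent jumps a read pattern needs. The shape, the (time, lat, lon) naming, and the helper functions `element_offset` and `count_seeks` are illustrative assumptions, not part of the netCDF API, and offsets are in elements rather than bytes.

```python
def element_offset(index, shape):
    """Row-major element offset: the last dimension varies fastest."""
    offset = 0
    for i, n in zip(index, shape):
        offset = offset * n + i
    return offset


def count_seeks(offsets):
    """Count jumps where the next element is not adjacent to the previous one."""
    return sum(1 for a, b in zip(offsets, offsets[1:]) if b != a + 1)


# Hypothetical variable with dimensions (time, lat, lon).
shape = (4, 3, 5)

# Read along the last dimension: time=2, lat=1, every lon.
along_last = [element_offset((2, 1, k), shape) for k in range(shape[2])]

# Read along the first dimension: every time, lat=1, lon=3.
along_first = [element_offset((t, 1, 3), shape) for t in range(shape[0])]

print(along_last, count_seeks(along_last))    # contiguous run, no jumps
print(along_first, count_seeks(along_first))  # one jump between each value
```

Reading along the last dimension touches adjacent elements, while reading along the first dimension jumps by a full lat-by-lon slab for every value, which is the "one or two values per disk block" pattern to avoid.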
For netCDF-4 formats:
- Specify chunk shapes for variables that closely match the most
common data access patterns.
- Don't use tiny chunks: the B-tree overhead of a large number of
chunks will dominate access time.
- Try the default chunk sizes when data must be accessed efficiently in unanticipated ways.
- Be aware that minor differences in chunk lengths can make major
differences in access time and disk space. See page 18 of this report
for an explanation.
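The chunking trade-offs above can be estimated with simple arithmetic before writing any data. The following is a rough sketch, assuming that a read must fetch every chunk its hyperslab intersects and that partial chunks at the edges of a variable are stored as whole chunks before any compression. The variable shape, chunk shapes, and the helper functions `chunks_touched` and `allocated_values` are hypothetical and not part of any netCDF library.

```python
from math import ceil, prod


def chunks_touched(start, count, chunks):
    """Number of chunks intersected by the hyperslab start..start+count-1."""
    n = 1
    for s, c, k in zip(start, count, chunks):
        first = s // k
        last = (s + c - 1) // k
        n *= last - first + 1
    return n


def allocated_values(shape, chunks):
    """Values allocated if every chunk, including edge chunks, is stored whole."""
    return prod(ceil(n / k) * k for n, k in zip(shape, chunks))


# Hypothetical variable with dimensions (time, lat, lon).
shape = (1000, 100, 100)
time_series = ((0, 50, 50), (1000, 1, 1))    # all times at one point
spatial_slice = ((500, 0, 0), (1, 100, 100))  # one time, all points

for chunks in [(1, 100, 100), (1000, 1, 1), (100, 10, 10)]:
    print(chunks,
          chunks_touched(*time_series, chunks),
          chunks_touched(*spatial_slice, chunks))

# A small change in chunk lengths changes the allocated space.
print(allocated_values(shape, (100, 10, 10)))
print(allocated_values(shape, (101, 11, 11)))
```

Under these assumptions, chunk shapes tuned for one pattern make the other pattern touch thousands of chunks, a compromise shape keeps both moderate, and chunk lengths that do not divide the dimension lengths evenly allocate noticeably more space than ones that do.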
For all netCDF formats:
- If data must be written in a different order than it is read,
consider rewriting the data to the most efficient read order.
- If writing is a bottleneck and you write all data values for each
variable, consider turning off the default "fill mode", which pre-fills
data with fill values. Advantage: can save up to 50% of write time.
Disadvantage: eliminates the possibility of detecting unwritten values.
- When writing from multiple processors, use parallel I/O.
- Using an SSD instead of a spinning disk may change latency and seek
time, and therefore which strategies are optimal.
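The 50% figure for fill mode follows from a simple count of writes. This sketch assumes a deliberately simplified model in which pre-filling writes every value once with the fill value and the application then writes every value once more. The function `values_written` and the value count are hypothetical, and real savings depend on buffering and on how much of each variable is actually written.

```python
def values_written(n_values, prefill):
    """Count value writes: an optional fill pass plus one data pass."""
    fill_pass = n_values if prefill else 0
    data_pass = n_values
    return fill_pass + data_pass


n = 1_000_000  # hypothetical number of values in a variable
with_fill = values_written(n, prefill=True)
no_fill = values_written(n, prefill=False)
saving = 1 - no_fill / with_fill

print(with_fill, no_fill, saving)
```

In the netCDF C library, fill mode is controlled with `nc_set_fill`, passing `NC_NOFILL` to turn pre-filling off.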