Re: [netcdf-java] Reading contiguous data in NetCDF files

Hi Jon,

Benchmarks like these can be quite tricky because of how the application interacts with the OS. Unless you purge the OS page cache before each run, your application (after the first test) isn't reading data from disk at all; it is copying data from the page cache into local buffers. The benchmark then becomes CPU bound, and execution time is dominated by converting the raw buffered bytes into Java types. That would account for the strange results you are seeing when reading 4K rather than 8K data chunks.
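To see what "dominated by type conversion" means concretely, here is a small sketch (plain JDK code, not the netcdf-java internals) of the per-element work that remains once the page cache is warm: turning raw big-endian bytes into Java doubles.

```java
import java.nio.ByteBuffer;

public class ConvertDemo {
    // Convert a raw big-endian byte array into a double[]: the kind of
    // per-element conversion that makes a cache-warm benchmark CPU bound.
    static double[] toDoubles(byte[] raw) {
        ByteBuffer bb = ByteBuffer.wrap(raw);
        double[] out = new double[raw.length / 8];
        for (int i = 0; i < out.length; i++) {
            out[i] = bb.getDouble(); // reads 8 bytes, advances position
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] raw = new byte[16];
        ByteBuffer.wrap(raw).putDouble(1.5).putDouble(-2.25);
        double[] vals = toDoubles(raw);
        System.out.println(vals[0] + " " + vals[1]); // 1.5 -2.25
    }
}
```

Once reads are served from memory, this conversion loop (not the I/O) is where the time goes, so varying the read size mostly varies how much converting you do.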

Also, for more info on netcdf-4 (HDF5) chunking/compression, there's a nice introduction at http://hdfeos.org/workshops/ws13/presentations/day1/HDF5-EOSXIII-Advanced-Chunking.ppt

Cheers, Joe

Jon Blower wrote:
Hi John,

Thanks for this.

The netcdf-3 IOSP uses a buffered RandomAccessFile implementation (default 8096-byte buffer) which always reads 8096 bytes at a time; the only useful optimisation is to change the buffer size.

Good to know, thanks.  I would have thought that this would mean that
there's no point reading data of less than 8096 bytes.  But in my tests
I see that even below this value there's a linear relationship between
the size of data being read and the time to read the data (i.e. it's
quicker to read 4K than 8K).  I don't quite understand this.

Are there any specs for the NetCDF-4 format that I could read?  I'd like
to know more about how the data are compressed, and how much data
actually need to be read from disk to get a subset.

Cheers, Jon

-----Original Message-----
From: netcdf-java-bounces@xxxxxxxxxxxxxxxx
[mailto:netcdf-java-bounces@xxxxxxxxxxxxxxxx] On Behalf Of John Caron
Sent: 15 July 2010 00:26
To: netcdf-java@xxxxxxxxxxxxxxxx
Subject: Re: [netcdf-java] Reading contiguous data in NetCDF files

Hi Jon:

On 7/14/2010 2:51 PM, Jon Blower wrote:
Hi,

I don't know anything about how data in NetCDF files are organized,
but
intuitively, I would think that, for a general 2D array, the data at
points [j,i] and [j,i+1] would be contiguous on disk.  Is this right?
(i is the fastest-varying dimension)

yes, for variables in netcdf-3 files

I might also suppose that, for an array of size [nj,ni], the data at
points [j,ni-1] and [j+1,0] would also be contiguous.  Is this true?

yes, for variables in netcdf-3 files that don't use the unlimited
dimension
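In code, the row-major adjacency being described works out as simple offset arithmetic (a generic sketch, not tied to any particular file format):

```java
public class RowMajor {
    // Flat offset (in elements) of [j, i] in a row-major [nj, ni] array.
    static int offset(int j, int i, int ni) {
        return j * ni + i;
    }

    public static void main(String[] args) {
        int ni = 5;
        // Neighbours along the fastest-varying dimension are adjacent:
        System.out.println(offset(2, 3, ni) + 1 == offset(2, 4, ni)); // true
        // The end of row j is immediately followed by the start of row j+1:
        System.out.println(offset(2, ni - 1, ni) + 1 == offset(3, 0, ni)); // true
    }
}
```

So a read that starts at [j,ni-1] and spans two elements also covers [j+1,0].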

If so, is there a method in Java-NetCDF that would allow me to read
these two points (and only these two points) in a single operation?

The netcdf-3 IOSP uses a buffered RandomAccessFile implementation (default 8096-byte buffer) which always reads 8096 bytes at a time; the only useful optimisation is to change the buffer size.
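The practical upshot, that two contiguous points can be fetched with a single seek-and-read, can be sketched with plain java.io (netcdf-java wraps this in its own ucar.unidata.io.RandomAccessFile; the file layout below is a made-up example):

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;

public class TwoPointRead {
    // Write a row-major [nj, ni] grid of doubles (value = j*ni + i) to a
    // temp file, then fetch [1, ni-1] and [2, 0] with ONE seek + read,
    // since they are adjacent on disk.
    static double[] demo() throws Exception {
        int nj = 3, ni = 4;
        File f = File.createTempFile("grid", ".bin");
        f.deleteOnExit();
        ByteBuffer bb = ByteBuffer.allocate(nj * ni * 8);
        for (int j = 0; j < nj; j++)
            for (int i = 0; i < ni; i++)
                bb.putDouble(j * ni + i);
        try (FileOutputStream out = new FileOutputStream(f)) {
            out.write(bb.array());
        }
        try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
            raf.seek((long) (1 * ni + (ni - 1)) * 8); // offset of [1, ni-1]
            byte[] two = new byte[16];                 // two doubles
            raf.readFully(two);
            ByteBuffer in = ByteBuffer.wrap(two);
            return new double[] { in.getDouble(), in.getDouble() };
        }
    }

    public static void main(String[] args) throws Exception {
        double[] v = demo();
        System.out.println(v[0] + " " + v[1]); // 7.0 8.0
    }
}
```

With a buffered implementation the seek is often satisfied from the existing buffer anyway, which is why buffer size is the knob that matters.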

(Background: I'm trying to improve the performance of ncWMS by
optimising how data is read from disk.  This seems to involve striking
a balance between the number of individual read operations and the size
of each read operation.)

Thanks,
Jon

--
Dr Jon Blower
Technical Director, Reading e-Science Centre
Environmental Systems Science Centre
University of Reading
Harry Pitt Building, 3 Earley Gate
Reading RG6 6AL. UK
Tel: +44 (0)118 378 5213
Fax: +44 (0)118 378 6413
j.d.blower@xxxxxxxxxxxxx
http://www.nerc-essc.ac.uk/People/Staff/Blower_J.htm


_______________________________________________
netcdf-java mailing list
netcdf-java@xxxxxxxxxxxxxxxx
For list information or to unsubscribe, visit:
http://www.unidata.ucar.edu/mailing_lists/

