Re: [netcdf-java] Errors reading certain NetCDF4 data

  • To: Antonio Rodriges <antonio.rrz@xxxxxxxxx>
  • Subject: Re: [netcdf-java] Errors reading certain NetCDF4 data
  • From: Ryan May <rmay@xxxxxxxx>
  • Date: Thu, 19 Feb 2015 12:45:37 -0700
Antonio,

Sorry, I misspoke--time *should* be the last dimension, since for
C-ordering, the last dimension varies the fastest (i.e., items along this
dimension are sequential in memory). (I then got that crossed up with
your chunking description, which you're correct about.)
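
To make the layout argument concrete, here is a minimal sketch of the byte
distances involved (my illustration, not from the thread; the dimension and
chunk sizes are taken from the metadata and the ncks command quoted below):

// Minimal sketch: byte distances in the C-ordered float variable
// u10m(time, latitude, longitude), plus the 512x4x4 chunk shape.
public class LayoutSketch {
    public static void main(String[] args) {
        final int FLOAT_BYTES = 4;

        // Contiguous layout: time is the slowest-varying dimension, so
        // consecutive times at one point are a full lat-lon slice apart.
        final int NLAT = 103, NLON = 122;
        long contiguousTimeStride = (long) NLAT * NLON * FLOAT_BYTES; // ~50 KB
        System.out.println("contiguous layout, stride between times: "
                + contiguousTimeStride + " bytes");

        // Inside one 512x4x4 chunk the values are also C-ordered, so
        // consecutive times at a fixed (lat, lon) are one 4x4 slice apart.
        final int CHUNK_LAT = 4, CHUNK_LON = 4;
        long inChunkTimeStride = (long) CHUNK_LAT * CHUNK_LON * FLOAT_BYTES; // 64 bytes
        System.out.println("within a 512x4x4 chunk, stride between times: "
                + inChunkTimeStride + " bytes");

        // If time were the fastest-varying (last) dimension instead, the
        // whole time series at one point would be contiguous on disk.
    }
}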

It's possible for chunking to make up some of the performance difference,
but you're never going to be as fast as just re-ordering the data. Russ
Rew's example quoted times with chunking going from 200 seconds to 1.4
seconds, and his example had about 20x the number of time steps you have.
Given that you're quoting times of less than 1 second, I wonder if you're
simply not dominated by seek time. Certainly, since you're on an SSD, the
penalty for non-sequential access is much smaller than for spinning disks.
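
For reference, here is a hedged netcdf-java sketch of the access pattern in
question -- all times at a single grid point -- reading the u10m variable.
The file name and the point indices are placeholders, not values from this
thread:

import ucar.ma2.Array;
import ucar.nc2.NetcdfFile;
import ucar.nc2.Variable;

// Sketch only: read the full time series at one (lat, lon) point of u10m.
public class PointTimeSeries {
    public static void main(String[] args) throws Exception {
        NetcdfFile nc = NetcdfFile.open("2014_ch.nc");  // placeholder path
        try {
            Variable u10m = nc.findVariable("u10m");    // shape: (time, latitude, longitude)
            int ntime = u10m.getShape()[0];

            int latIdx = 50, lonIdx = 60;               // arbitrary example point
            int[] origin = {0, latIdx, lonIdx};         // start at the first time step
            int[] shape  = {ntime, 1, 1};               // every time step, one point

            Array series = u10m.read(origin, shape);
            System.out.println("read " + series.getSize() + " values");
        } finally {
            nc.close();
        }
    }
}

If re-ordering the data is an option, one way to do it (not discussed in this
thread) is NCO's ncpdq, which can permute the dimension order, e.g. to
(latitude, longitude, time), so that each point's time series is contiguous
on disk.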

Ryan

On Thu, Feb 19, 2015 at 11:48 AM, Antonio Rodriges <antonio.rrz@xxxxxxxxx>
wrote:

> Ryan,
>
> I do have time as my first dimension (Christian suggested making time
> the last dimension)
> and thought that after rechunking I would get something like this:
>
> 4x4 (a lat-lon 2D block stored contiguously on disk), 4x4, 4x4,
> 4x4, ......, 4x4
> <<---------------------------- the number of rasters is 512
> ---------------------------->>
> so the distance between different dates is not 8 KB but should be
> only 4 x 4 x sizeof(float) = 64 bytes for the expected layout
>
> Here is the metadata (although it does not show chunk sizes; is it
> possible to look at the sizes?):
>
> netcdf file:/d:/RS_DATA/worker/merra_ts/tavg1_2d_slv_Nx/wind_australia_chunked/u10m/chunked/2014_ch.nc {
>  dimensions:
>    latitude = 103;
>    longitude = 122;
>    time = UNLIMITED;   // (5088 currently)
>  variables:
>    double latitude(latitude=103);
>      :_Netcdf4Dimid = 0; // int
>      :units = "degrees_north";
>      :long_name = "Latitude";
>    double longitude(longitude=122);
>      :_Netcdf4Dimid = 1; // int
>      :units = "degrees_east";
>      :long_name = "Longitude";
>    double time(time=5088);
>      :_Netcdf4Dimid = 2; // int
>      :units = "hours since 2014-1-1 0";
>    float u10m(time=5088, latitude=103, longitude=122);
>      :comments = "Unknown1 variable comment";
>      :long_name = "Eastward wind at 10 m above displacement height";
>      :units = "m s-1";
>      :grid_name = "grid-1";
>      :grid_type = "linear";
>      :level_description = "Earth surface";
>      :time_statistic = "instantaneous";
>      :missing_value = 9.9999999E14f; // float
>
>  :Conventions = "COARDS";
>  :calendar = "standard";
>  :comments = "file created by grads using lats4d available from
> http://dao.gsfc.nasa.gov/software/grads/lats4d/";
>  :model = "geos/das";
>  :center = "gsfc";
>  :history = "Mon Dec 01 20:20:48 2014:
> D:\\DATA\\worker\\merra_ts\\tavg1_2d_slv_Nx\\wind_australia\\u10m\\ncks.exe
> -4 --cnk_dmn lat,4 --cnk_dmn lon,4 --cnk_dmn time,512 2014.nc
> 2014_ch.nc\\nWed Oct 15 20:26:23 2014: ncrcat -v u10m -o 2014.nc";
>  :nco_openmp_thread_number = 1; // int
>  :nco_input_file_number = 212; // int
>  :NCO = "20141201";
> }
>
> 2015-02-19 21:24 GMT+03:00 Ryan May <rmay@xxxxxxxx>:
> > Antonio,
> >
> > Even with that chunk size, the number of bytes between consecutive points
> > in time is 512 x 4 x sizeof(float), which is 8 KB. You may get a few
> > points closer together, but they're still not close together. Any
> > read-ahead function of the disk will be throwing away 99% of the data if
> > all you want is all the times for a single point.
> >
> > If your predominant access pattern is all times for a single point, your
> > best throughput will be achieved by making sure that those points are
> > consecutive on disk, which means that you should have time be the first
> > dimension, not the last. Anything else you do will be papering over the
> > core problem.
> >
> > Ryan
> >
> > On Thu, Feb 19, 2015 at 10:37 AM, Antonio Rodriges <antonio.rrz@xxxxxxxxx>
> > wrote:
> >>
> >> Christian,
> >>
> >> According to Russ Rew
> >> (http://www.unidata.ucar.edu/blogs/developer/entry/chunking_data_why_it_matters),
> >> chunking should help for my access pattern.
> >>
> >> After rechunking I expected to have chunks of size 512x4x4, where
> >> values for a single point at different times would be stored very
> >> close together on disk.
> >
> >
> >
> >
> > --
> > Ryan May
> > Software Engineer
> > UCAR/Unidata
> > Boulder, CO
>



-- 
Ryan May
Software Engineer
UCAR/Unidata
Boulder, CO