Burlen,
There are some known aggravating factors that can result in extraordinarily
slow access to netCDF coordinate variables, such as the time coordinate.
The slowdown becomes especially significant with larger files, such as your
433 MB files.
In general terms, if the elements of the time coordinate variable are
scattered throughout the file in many small pieces, AND the application
program attempts to read the entire time coordinate variable, then very
slow access is almost guaranteed. It sounds like you may have both of
these conditions together.
There are at least two ways a coordinate variable (e.g. time) can end up
scattered in many small pieces throughout a netCDF file:
1. NetCDF-3 (classic or 64-bit offset) format, where the time (coordinate)
dimension is relatively large and the time dimension is unlimited. In this
format, record variables are interleaved record by record, so each time
value is stored next to that step's data rather than next to the other
time values.
2. NetCDF-4 (nc4-classic or nc4-full) format, where the time dimension is
relatively large, the time coordinate variable is chunked, and the time
chunk size is much smaller than the time dimension size. For example, a
time chunk size of 1, which used to be the DEFAULT in earlier netCDF
library versions and in some applications!
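A quick way to check is "ncdump -h -s file.nc", which prints the chunk
sizes as special virtual attributes. If you would rather check
programmatically, here is a minimal sketch using the netCDF C API; the
coordinate variable name "time" is an assumption, so adjust it for your
data. It reports which, if either, of the two layouts above applies:

#include <stdio.h>
#include <netcdf.h>

int main(int argc, char **argv)
{
    /* Placeholder path; pass one of your .nc files on the command line. */
    const char *path = argc > 1 ? argv[1] : "input.nc";
    int ncid, varid, dimid, format, unlimdim, storage;
    size_t dimlen, chunklen;

    if (nc_open(path, NC_NOWRITE, &ncid) != NC_NOERR) {
        fprintf(stderr, "cannot open %s\n", path);
        return 1;
    }

    nc_inq_format(ncid, &format);     /* classic, 64-bit offset, or netCDF-4 */
    nc_inq_unlimdim(ncid, &unlimdim); /* id of the unlimited dim, or -1 */

    /* Assumes the time coordinate variable is named "time". */
    if (nc_inq_varid(ncid, "time", &varid) != NC_NOERR) {
        fprintf(stderr, "no variable named time in %s\n", path);
        return 1;
    }
    nc_inq_vardimid(ncid, varid, &dimid); /* time is 1-D, so one dim id */
    nc_inq_dimlen(ncid, dimid, &dimlen);

    if (format == NC_FORMAT_CLASSIC || format == NC_FORMAT_64BIT) {
        /* Case 1: with an unlimited time dim, the time values are
           interleaved with the record variables, one per record. */
        printf("netCDF-3: time %s unlimited, %zu steps\n",
               dimid == unlimdim ? "IS" : "is NOT", dimlen);
    } else {
        /* Case 2: a chunk size much smaller than the dimension length
           scatters the coordinate through the file. */
        nc_inq_var_chunking(ncid, varid, &storage, &chunklen);
        if (storage == NC_CHUNKED)
            printf("netCDF-4: time chunk size %zu of %zu steps\n",
                   chunklen, dimlen);
        else
            printf("netCDF-4: time stored contiguously\n");
    }

    nc_close(ncid);
    return 0;
}

Compile with something like "cc check_time.c -o check_time -lnetcdf" and
run it on one of your files.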
Does your data set have either of these characteristics?
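If it turns out to be case 2, rewriting the files with a larger time chunk
may help. nccopy can rechunk; for example (the chunk length 8 here just
matches one file's worth of steps, adjust as needed):

  nccopy -c time/8 in.nc out.nc

For case 1, converting to netCDF-4 (e.g. nccopy -k nc4) and choosing a
sensible chunking for the time coordinate at the same time would serve the
same purpose.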
--Dave
On Thu, Jun 2, 2016 at 1:12 PM, Burlen Loring <bloring@xxxxxxx> wrote:
> I am working on a climate analysis app. It must scan the dataset,
> composed of many netCDF files, to determine the available time steps. I'm
> finding that the performance is really bad when reading the time axis from
> the files. In one example the dataset is 10k files, each file has 8 time
> values, each value is 8 bytes, so 64 bytes are read per file. The time
> taken to read these arrays is between 0.3 and 2.2 seconds, which puts the
> measured performance between 213 bytes per second and 29 bytes per second!
> I've had dial-up modems faster than that!
>
> Reading other, much larger arrays is much faster; e.g. reading the lon
> array with 1152 values takes only 7.7e-5 sec. I don't get it.
>
> I'd like some advice on this. On my workstation the read times of the time
> values on the same dataset range between 1.9e-5 and 4.7e-5 sec. Thus
> something seems very wrong with the performance on the Cray. BTW, this is
> on the Lustre scratch2 file system; the files are 433 MB each and striped
> across 24 OSTs with a stripe size of 1 MB.
>
> Burlen
>
> Here is the raw data gathered from Edison:
>
> $/usr/common/graphics/teca/builds/TECA/bin/bin/teca_metadata_probe
> --input_regex
> /scratch2/scratchdirs/prabhat/TCHero/data/cam5_1_amip_run2'\.cam2\.h2.2005-09-[0-9][0-9]-
> 10800.nc$'
> scan_files = 1.1513
> open_file = 0.0967366
> x_axis_metadata = 7.4109e-05
> y_axis_metadata = 7.732e-06
> z_axis_metadata = 3.84e-07
> t_axis_metadata = 5.418e-06
> variables = 0.000824452
> read_x_axis = 5.9976e-05
> read_y_axis = 1.0444e-05
> read_z_axis = 4.294e-06
> read_t_axis = 0.368079
> open_file = 0.122236
> t_axis_metadata = 4.906e-05
> read_t_axis = 0.431482
> open_file = 0.0953903
> t_axis_metadata = 3.4205e-05
> read_t_axis = 0.3815
> open_file = 0.0853393
> t_axis_metadata = 2.9607e-05
> read_t_axis = 0.396472
> open_file = 0.0664037
> t_axis_metadata = 2.8239e-05
> read_t_axis = 0.351707
> open_file = 0.748844
> t_axis_metadata = 3.3047e-05
> read_t_axis = 1.03634
> open_file = 0.161732
> t_axis_metadata = 2.6955e-05
> read_t_axis = 0.377919
> open_file = 0.0820469
> t_axis_metadata = 3.1613e-05
> read_t_axis = 0.363014
> open_file = 0.0903407
> t_axis_metadata = 3.1844e-05
> read_t_axis = 0.370521
> open_file = 0.092586
> t_axis_metadata = 2.9547e-05
> read_t_axis = 0.37146
> open_file = 0.083997
> t_axis_metadata = 4.0095e-05
> read_t_axis = 0.396375
> open_file = 0.0799897
> t_axis_metadata = 2.9833e-05
> read_t_axis = 0.386237
> open_file = 0.105688
> t_axis_metadata = 0.000124456
> read_t_axis = 0.481453
> open_file = 0.0816038
> t_axis_metadata = 3.0969e-05
> read_t_axis = 0.355051
> open_file = 0.101204
> t_axis_metadata = 2.5927e-05
> read_t_axis = 0.408186
> open_file = 0.108662
> t_axis_metadata = 2.6313e-05
> read_t_axis = 0.314209
> open_file = 0.0916682
> t_axis_metadata = 2.9697e-05
> read_t_axis = 0.319686
> open_file = 0.0812744
> t_axis_metadata = 2.7813e-05
> read_t_axis = 0.391357
> open_file = 0.101898
> t_axis_metadata = 2.8607e-05
> read_t_axis = 0.305515
> open_file = 0.0748274
> t_axis_metadata = 1.7247e-05
> read_t_axis = 0.69522
> open_file = 0.100151
> t_axis_metadata = 1.7887e-05
> read_t_axis = 0.511695
> open_file = 0.0700686
> t_axis_metadata = 1.8225e-05
> read_t_axis = 2.27155
> open_file = 0.121193
> t_axis_metadata = 1.9918e-05
> read_t_axis = 0.497648
> open_file = 0.0734922
> t_axis_metadata = 2.1134e-05
> read_t_axis = 1.70601
> open_file = 0.312652
> t_axis_metadata = 1.888e-05
> read_t_axis = 0.468489
> open_file = 0.0906366
> t_axis_metadata = 1.9165e-05
> read_t_axis = 0.348474
> open_file = 0.0776435
> t_axis_metadata = 2.3417e-05
> read_t_axis = 0.387956
> open_file = 0.157729
> t_axis_metadata = 4.9765e-05
> read_t_axis = 0.344205
> open_file = 0.114566
> t_axis_metadata = 2.8431e-05
> read_t_axis = 0.374615
> open_file = 0.0788776
> t_axis_metadata = 2.7865e-05
> read_t_axis = 0.381856
> collect_t_axis = 19.47
> total = 20.72
> A total of 240 steps available in 30 files. Using the noleap calendar.
> Times are specified in units of days since 1979-01-01 00:00:00. The
> available times range from 2005-9-1 3:0:0 (9733.12) to 2005-10-1 0:0:0
> (9763).