Reading 64 bytes at the application level does not mean only 64 bytes
were read at the system level. Most modern file systems perform
client-side file caching and read-ahead. The amount of read-ahead
depends on the file system settings and is usually a multiple of the
page size. You can verify this by reading a few data elements of a
netCDF variable, followed by a few of the elements immediately after
them. I expect the second read to cost almost nothing, since that data
is already cached in memory.
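A minimal sketch of that experiment, using plain POSIX I/O rather than the netCDF API (netCDF ultimately issues reads like these through libc, so the caching behavior is the same). The file path and sizes here are made up for illustration, and the cache-dropping step is Linux-specific:

```python
import os
import tempfile
import time

# Stand-in for a netCDF variable's on-disk data.
path = os.path.join(tempfile.mkdtemp(), "var.bin")
with open(path, "wb") as f:
    f.write(os.urandom(1 << 20))  # 1 MiB

fd = os.open(path, os.O_RDONLY)
# Drop any cached pages so the first read is cold (Linux/Unix only).
if hasattr(os, "posix_fadvise"):
    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)

t0 = time.perf_counter()
first = os.pread(fd, 64, 0)    # ask for 64 bytes; the kernel reads ahead much more
t1 = time.perf_counter()
second = os.pread(fd, 64, 64)  # the adjacent 64 bytes: served from the page cache
t2 = time.perf_counter()
os.close(fd)

print(f"first  read: {t1 - t0:.6f} s")
print(f"second read: {t2 - t1:.6f} s")
```

On a file that is not already resident in the client cache, the second read should come back orders of magnitude faster than the first.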
Wei-keng
On Jun 2, 2016, at 2:41 PM, Burlen Loring wrote:
> Hi Tom,
>
> That's not an option, and it has its own issues. For example, if the file
> size exceeds the size of a tape drive, we can't archive it. Besides, this
> doesn't seem like a Lustre metadata issue: open is relatively fast, about
> 0.096 sec, and metadata cost wouldn't explain why reading the time
> dimension with only 8 values takes on the order of 1 sec while reading the
> lon dimension with 1152 values takes on the order of 1e-4 sec.
>
> Burlen
>
> On 06/02/2016 12:16 PM, Tom Fogal wrote:
>> Hi Burlen!
>>
>> Can you switch things to a single file? I'm not up on NetCDF details,
>> but I would expect that case to be very slow---*especially* with
>> Lustre---using even advanced POSIX API programming.
>>
>> -tom
>>
>> On 06/02/2016 12:12 PM, Burlen Loring wrote:
>>> I am working on a climate analysis app. It must scan the dataset,
>>> comprising many netCDF files, to determine the available time steps.
>>> I'm finding that the performance of reading the time axis from each file
>>> is really bad. In one example the dataset is 10k files; each
>>> file has 8 time values, each value is 8 bytes, so 64 bytes are read per
>>> file. The time taken to read these arrays is between 0.3 and 2.2 seconds,
>>> which puts the measured performance between 213 bytes per second and 29
>>> bytes per second! I've had dial-up modems faster than that!
>>>
>>> Reading other, much larger arrays is much faster; e.g., reading the lon
>>> array with 1152 values takes only 7.7e-5 sec. I don't get it.
>>>
>>> I'd like some advice on this. On my workstation the read times of the
>>> time values on the same dataset range between 1.9e-5 and 4.7e-5 sec, so
>>> something seems very wrong with the performance on the Cray. BTW, this is
>>> on the Lustre scratch2 file system; files are 433 MB each and striped
>>> across 24 OSTs with a stripe size of 1 MB.
>>>
>>> Burlen
>>>
>>> Here is the raw data gathered from Edison:
>>>
>>> $/usr/common/graphics/teca/builds/TECA/bin/bin/teca_metadata_probe
>>> --input_regex
>>> /scratch2/scratchdirs/prabhat/TCHero/data/cam5_1_amip_run2'\.cam2\.h2.2005-09-[0-9][0-9]-10800.nc$'
>>> scan_files = 1.1513
>>> open_file = 0.0967366
>>> x_axis_metadata = 7.4109e-05
>>> y_axis_metadata = 7.732e-06
>>> z_axis_metadata = 3.84e-07
>>> t_axis_metadata = 5.418e-06
>>> variables = 0.000824452
>>> read_x_axis = 5.9976e-05
>>> read_y_axis = 1.0444e-05
>>> read_z_axis = 4.294e-06
>>> read_t_axis = 0.368079
>>> open_file = 0.122236
>>> t_axis_metadata = 4.906e-05
>>> read_t_axis = 0.431482
>>> open_file = 0.0953903
>>> t_axis_metadata = 3.4205e-05
>>> read_t_axis = 0.3815
>>> open_file = 0.0853393
>>> t_axis_metadata = 2.9607e-05
>>> read_t_axis = 0.396472
>>> open_file = 0.0664037
>>> t_axis_metadata = 2.8239e-05
>>> read_t_axis = 0.351707
>>> open_file = 0.748844
>>> t_axis_metadata = 3.3047e-05
>>> read_t_axis = 1.03634
>>> open_file = 0.161732
>>> t_axis_metadata = 2.6955e-05
>>> read_t_axis = 0.377919
>>> open_file = 0.0820469
>>> t_axis_metadata = 3.1613e-05
>>> read_t_axis = 0.363014
>>> open_file = 0.0903407
>>> t_axis_metadata = 3.1844e-05
>>> read_t_axis = 0.370521
>>> open_file = 0.092586
>>> t_axis_metadata = 2.9547e-05
>>> read_t_axis = 0.37146
>>> open_file = 0.083997
>>> t_axis_metadata = 4.0095e-05
>>> read_t_axis = 0.396375
>>> open_file = 0.0799897
>>> t_axis_metadata = 2.9833e-05
>>> read_t_axis = 0.386237
>>> open_file = 0.105688
>>> t_axis_metadata = 0.000124456
>>> read_t_axis = 0.481453
>>> open_file = 0.0816038
>>> t_axis_metadata = 3.0969e-05
>>> read_t_axis = 0.355051
>>> open_file = 0.101204
>>> t_axis_metadata = 2.5927e-05
>>> read_t_axis = 0.408186
>>> open_file = 0.108662
>>> t_axis_metadata = 2.6313e-05
>>> read_t_axis = 0.314209
>>> open_file = 0.0916682
>>> t_axis_metadata = 2.9697e-05
>>> read_t_axis = 0.319686
>>> open_file = 0.0812744
>>> t_axis_metadata = 2.7813e-05
>>> read_t_axis = 0.391357
>>> open_file = 0.101898
>>> t_axis_metadata = 2.8607e-05
>>> read_t_axis = 0.305515
>>> open_file = 0.0748274
>>> t_axis_metadata = 1.7247e-05
>>> read_t_axis = 0.69522
>>> open_file = 0.100151
>>> t_axis_metadata = 1.7887e-05
>>> read_t_axis = 0.511695
>>> open_file = 0.0700686
>>> t_axis_metadata = 1.8225e-05
>>> read_t_axis = 2.27155
>>> open_file = 0.121193
>>> t_axis_metadata = 1.9918e-05
>>> read_t_axis = 0.497648
>>> open_file = 0.0734922
>>> t_axis_metadata = 2.1134e-05
>>> read_t_axis = 1.70601
>>> open_file = 0.312652
>>> t_axis_metadata = 1.888e-05
>>> read_t_axis = 0.468489
>>> open_file = 0.0906366
>>> t_axis_metadata = 1.9165e-05
>>> read_t_axis = 0.348474
>>> open_file = 0.0776435
>>> t_axis_metadata = 2.3417e-05
>>> read_t_axis = 0.387956
>>> open_file = 0.157729
>>> t_axis_metadata = 4.9765e-05
>>> read_t_axis = 0.344205
>>> open_file = 0.114566
>>> t_axis_metadata = 2.8431e-05
>>> read_t_axis = 0.374615
>>> open_file = 0.0788776
>>> t_axis_metadata = 2.7865e-05
>>> read_t_axis = 0.381856
>>> collect_t_axis = 19.47
>>> total = 20.72
>>> A total of 240 steps available in 30 files. Using the noleap calendar. Times
>>> are specified in units of days since 1979-01-01 00:00:00. The available
>>> times
>>> range from 2005-9-1 3:0:0 (9733.12) to 2005-10-1 0:0:0 (9763).
>>>
>>> _______________________________________________
>>> NOTE: All exchanges posted to Unidata maintained email lists are
>>> recorded in the Unidata inquiry tracking system and made publicly
>>> available through the web. Users who post to any of the lists we
>>> maintain are reminded to remove any personal information that they
>>> do not want to be made public.
>>>
>>>
>>> netcdfgroup mailing list
>>> netcdfgroup@xxxxxxxxxxxxxxxx
>>> For list information or to unsubscribe, visit:
>>> http://www.unidata.ucar.edu/mailing_lists/
>>>
>