Perhaps the timing of the lon array read is not accurate. Quite a
number of nc calls (metadata ops) precede the actual read, so perhaps
the data gets pulled into the page cache along the way. Hard to say.
If the values are not contiguous but rather scattered throughout the
file, that would certainly slow things down! In a big file, the
scattered values are also less likely to end up in the page cache from
the metadata ops. To test this idea, I used ncks (--fix_rec_dmn) to
change time from an unlimited to a fixed dimension and compared the
results. This speeds things up 10-20x!
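For context: in the classic netCDF format every record variable's data
is interleaved record by record, so an unlimited time coordinate with 8
steps lives at 8 widely separated file offsets rather than in one
contiguous run. Here is a minimal sketch (not the actual TECA code;
assumes classic-format files with a dimension named "time") to check
which files store time as the record dimension:

#include <netcdf.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    for (int i = 1; i < argc; i++) {
        int ncid, unlimdimid, timedimid;
        /* skip files that fail to open or lack a "time" dimension */
        if (nc_open(argv[i], NC_NOWRITE, &ncid)) continue;
        nc_inq_unlimdim(ncid, &unlimdimid); /* -1 if no record dim */
        if (nc_inq_dimid(ncid, "time", &timedimid) == NC_NOERR)
            printf("%s: time is %sthe record dimension\n", argv[i],
                   timedimid == unlimdimid ? "" : "not ");
        nc_close(ncid);
    }
    return 0;
}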
This seems to reasonably explain things! Thanks for all the help!
orig (seconds)
----------
min=0.23825
avg=0.316987
max=0.72832

fix_rec_dmn (seconds)
----------
min=0.0111067
avg=0.0224367
max=0.0593522

speed up (ratio)
----------
max=21.451
avg=14.128
min=12.2712
On 06/02/2016 01:35 PM, Dave Allured - NOAA Affiliate wrote:
Um, on more careful reading, I like Wei-keng's answer better than
mine. I missed that Burlen's time dimension size is only 8. This
suggests that in a worst-case scenario, reading the time coordinate
should be only 8 times slower than reading the lat or lon coordinate.
So I think Wei-keng is right, and the problem is in the file caching
somewhere.
Burlen, you said that reading the lon array with 1152 values takes
only 7.7e-5 sec. I wonder if you measured this on repeated readings of
the same file, in which case the data may have come from the system
file cache. Did you try measuring the average read time for lon across
a large sample of files that had not been recently opened?
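Something like this would do it (a rough sketch; assumes every file
has a 1-D double variable named "lon", error handling mostly omitted):

#include <netcdf.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static double now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + 1e-9 * ts.tv_nsec;
}

int main(int argc, char **argv)
{
    double total = 0.0;
    int n = 0;
    for (int i = 1; i < argc; i++) {
        int ncid, varid, dimid;
        size_t start = 0, len;
        if (nc_open(argv[i], NC_NOWRITE, &ncid)) continue;
        if (nc_inq_varid(ncid, "lon", &varid) == NC_NOERR) {
            nc_inq_vardimid(ncid, varid, &dimid);
            nc_inq_dimlen(ncid, dimid, &len);
            double *buf = malloc(len * sizeof(double));
            double t0 = now();
            nc_get_vara_double(ncid, varid, &start, &len, buf);
            total += now() - t0;
            n += 1;
            free(buf);
        }
        nc_close(ncid);
    }
    if (n > 0)
        printf("average lon read time over %d files: %g sec\n",
               n, total / n);
    return 0;
}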
--Dave
On Thu, Jun 2, 2016 at 1:57 PM, Wei-keng Liao
<wkliao@xxxxxxxxxxxxxxxxxxxxx> wrote:
Reading 64 bytes at the application level does not mean only 64 bytes
were read at the system level. Most modern file systems perform
client-side file caching and read-ahead. The amount of read-ahead
depends on the file system settings, usually a multiple of the page
size. You can try this by reading a few data elements of a netCDF
variable, followed by a few successive elements. I bet the second read
will cost almost nothing, as that data is already cached in memory.
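For example, a minimal sketch (assumes a 1-D double variable named
"time" with at least 8 values):

#include <netcdf.h>
#include <stdio.h>
#include <time.h>

static double now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + 1e-9 * ts.tv_nsec;
}

int main(int argc, char **argv)
{
    int ncid, varid;
    double buf[4], t0;
    size_t start = 0, count = 4;
    if (argc < 2 || nc_open(argv[1], NC_NOWRITE, &ncid)) return 1;
    if (nc_inq_varid(ncid, "time", &varid)) return 1;

    t0 = now(); /* first read: likely goes to the file system */
    nc_get_vara_double(ncid, varid, &start, &count, buf);
    printf("elements 0-3: %g sec\n", now() - t0);

    start = 4; /* successive elements: should hit the cache */
    t0 = now();
    nc_get_vara_double(ncid, varid, &start, &count, buf);
    printf("elements 4-7: %g sec\n", now() - t0);

    nc_close(ncid);
    return 0;
}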
Wei-keng
On Jun 2, 2016, at 2:41 PM, Burlen Loring wrote:
> Hi Tom,
>
> That's not an option, and it has its own issues. For example, if the
> file size exceeds the size of a tape drive we can't archive it.
> Besides, it doesn't seem like a lustre metadata issue: open is
> relatively fast, around 0.096 sec, and that wouldn't explain why
> reading the time dimension with only 8 values takes on the order of
> 1 sec while reading the lon dimension with 1152 values takes on the
> order of 1e-4 sec.
>
> Burlen
>
> On 06/02/2016 12:16 PM, Tom Fogal wrote:
>> Hi Burlen!
>>
>> Can you switch things to a single file? I'm not up on NetCDF
>> details, but I would expect that case to be very slow---*especially*
>> with Lustre---using even advanced POSIX API programming.
>>
>> -tom
>>
>> On 06/02/2016 12:12 PM, Burlen Loring wrote:
>>> I am working on a climate analysis app. It must scan a dataset
>>> composed of many netcdf files to determine the available time steps.
>>> I'm finding that the performance of reading the time axis from each
>>> file is really bad. In one example the dataset is 10k files, each
>>> file has 8 time values, and each value is 8 bytes, so 64 bytes are
>>> read per file. The time taken to read these arrays is between 0.3
>>> and 2.2 seconds, which puts the measured performance between 213
>>> bytes per second and 29 bytes per second! I've had dial-up modems
>>> faster than that!
>>>
>>> Reading other, much larger arrays is much, much faster; e.g. reading
>>> the lon array with 1152 values takes only 7.7e-5 sec. I don't get it.
>>>
>>> I'd like some advice on this. On my workstation the read times of
>>> the time values on the same dataset range between 1.9e-5 and 4.7e-5
>>> sec. Thus something seems very wrong with the performance on the
>>> Cray. BTW, this is on the lustre scratch2 file system, and the files
>>> are 433 MB each, striped across 24 OSTs with a stripe size of 1 MB.
>>>
>>> Burlen
>>>
>>> Here is the raw data gathered from Edison:
>>>
>>> $/usr/common/graphics/teca/builds/TECA/bin/bin/teca_metadata_probe
>>> --input_regex
>>>
>>> /scratch2/scratchdirs/prabhat/TCHero/data/cam5_1_amip_run2'\.cam2\.h2.2005-09-[0-9][0-9]-10800.nc$'
>>> scan_files = 1.1513
>>> open_file = 0.0967366
>>> x_axis_metadata = 7.4109e-05
>>> y_axis_metadata = 7.732e-06
>>> z_axis_metadata = 3.84e-07
>>> t_axis_metadata = 5.418e-06
>>> variables = 0.000824452
>>> read_x_axis = 5.9976e-05
>>> read_y_axis = 1.0444e-05
>>> read_z_axis = 4.294e-06
>>> read_t_axis = 0.368079
>>> open_file = 0.122236
>>> t_axis_metadata = 4.906e-05
>>> read_t_axis = 0.431482
>>> open_file = 0.0953903
>>> t_axis_metadata = 3.4205e-05
>>> read_t_axis = 0.3815
>>> open_file = 0.0853393
>>> t_axis_metadata = 2.9607e-05
>>> read_t_axis = 0.396472
>>> open_file = 0.0664037
>>> t_axis_metadata = 2.8239e-05
>>> read_t_axis = 0.351707
>>> open_file = 0.748844
>>> t_axis_metadata = 3.3047e-05
>>> read_t_axis = 1.03634
>>> open_file = 0.161732
>>> t_axis_metadata = 2.6955e-05
>>> read_t_axis = 0.377919
>>> open_file = 0.0820469
>>> t_axis_metadata = 3.1613e-05
>>> read_t_axis = 0.363014
>>> open_file = 0.0903407
>>> t_axis_metadata = 3.1844e-05
>>> read_t_axis = 0.370521
>>> open_file = 0.092586
>>> t_axis_metadata = 2.9547e-05
>>> read_t_axis = 0.37146
>>> open_file = 0.083997
>>> t_axis_metadata = 4.0095e-05
>>> read_t_axis = 0.396375
>>> open_file = 0.0799897
>>> t_axis_metadata = 2.9833e-05
>>> read_t_axis = 0.386237
>>> open_file = 0.105688
>>> t_axis_metadata = 0.000124456
>>> read_t_axis = 0.481453
>>> open_file = 0.0816038
>>> t_axis_metadata = 3.0969e-05
>>> read_t_axis = 0.355051
>>> open_file = 0.101204
>>> t_axis_metadata = 2.5927e-05
>>> read_t_axis = 0.408186
>>> open_file = 0.108662
>>> t_axis_metadata = 2.6313e-05
>>> read_t_axis = 0.314209
>>> open_file = 0.0916682
>>> t_axis_metadata = 2.9697e-05
>>> read_t_axis = 0.319686
>>> open_file = 0.0812744
>>> t_axis_metadata = 2.7813e-05
>>> read_t_axis = 0.391357
>>> open_file = 0.101898
>>> t_axis_metadata = 2.8607e-05
>>> read_t_axis = 0.305515
>>> open_file = 0.0748274
>>> t_axis_metadata = 1.7247e-05
>>> read_t_axis = 0.69522
>>> open_file = 0.100151
>>> t_axis_metadata = 1.7887e-05
>>> read_t_axis = 0.511695
>>> open_file = 0.0700686
>>> t_axis_metadata = 1.8225e-05
>>> read_t_axis = 2.27155
>>> open_file = 0.121193
>>> t_axis_metadata = 1.9918e-05
>>> read_t_axis = 0.497648
>>> open_file = 0.0734922
>>> t_axis_metadata = 2.1134e-05
>>> read_t_axis = 1.70601
>>> open_file = 0.312652
>>> t_axis_metadata = 1.888e-05
>>> read_t_axis = 0.468489
>>> open_file = 0.0906366
>>> t_axis_metadata = 1.9165e-05
>>> read_t_axis = 0.348474
>>> open_file = 0.0776435
>>> t_axis_metadata = 2.3417e-05
>>> read_t_axis = 0.387956
>>> open_file = 0.157729
>>> t_axis_metadata = 4.9765e-05
>>> read_t_axis = 0.344205
>>> open_file = 0.114566
>>> t_axis_metadata = 2.8431e-05
>>> read_t_axis = 0.374615
>>> open_file = 0.0788776
>>> t_axis_metadata = 2.7865e-05
>>> read_t_axis = 0.381856
>>> collect_t_axis = 19.47
>>> total = 20.72
>>> A total of 240 steps available in 30 files. Using the noleap
>>> calendar. Times are specified in units of days since 1979-01-01
>>> 00:00:00. The available times range from 2005-9-1 3:0:0 (9733.12)
>>> to 2005-10-1 0:0:0 (9763).