[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[netCDF #ZJY-303084]: Re: Fwd: strace of CAM I/O : performance issue



Hi Jim,

> I copied you on this stream yesterday but didn't give any explanation.¿ï½
> It seems that Richard is measuring IO performance on his cluster that
> looks suspiciously similar to what we saw when
> we were writing without the NF_NOFILL option,¿ We have verified that we
> have the option enabled, do you have any other ideas of what may be
> causing this?
>
> Jim
>
> ---------- Forwarded message ----------
> From: Hedges, Richard M. <[1]address@hidden>
> Date: Fri, Oct 14, 2011 at 10:29 AM
> Subject: Re: strace of CAM I/O : performance issue
> To: "Hedges, Richard M." <[2]address@hidden>, John Dennis
> <[3]address@hidden>, "Mirin, Art" <[4]address@hidden>, "Behlendorf,
> Brian D." <[5]address@hidden>, Jim Edwards
> <[6]address@hidden>
> Cc: "Futral, Scott" <[7]address@hidden>, "Frederick, Don"
> <[8]address@hidden>
>
> Here is a trace using the 3.6.2 version of netcdf, which shows the
> identical behavior as the 4.1.2 version of yesterday¿s trace.
>
> - Richard
>
 ...
>
> On 10/13/11 10:21 AM, "me" <[10]address@hidden> wrote:
>
>   For a while we have been looking at a performance issue with the I/O on
>   CAM runs by Art Mirin at LLNL. ¿In some cases, the performance is
>   acceptable in term of total throughput of scientific results, but the
>   actual I/O throughput is about 1 percent of what the file system is
>   capable. ¿
>
>   With these CAM runs, a single process is writing a netcdf file,
>   basically in 2 Mbyte chunks, which should perform reasonably well on
>   Lustre. ¿An analogous IOR run performs at around 500MB per second, where
>   throughput of the CAM I/O is < 10 Mbytes/sec.
>
>   Below are some excerpts from straces of the I/O for 1) the IOR analog,
>   and 2) an instance of CAM. ¿The interesting thing to me is that it
>   appears that, for CAM, there is an attempt to read 2 MB from a null
>   buffer (or your explanation of the reads) which stalls the I/O for .2
>   seconds between each 2 Mbyte transfer (and getting 0 bytes). ¿To what is
>   this attributable?? ¿ï½Is this something in the netcdf implementation?
>   ¿Is there a way of avoiding it by specification of parameters?? ¿Any
>   other thoughts?

The attempt to read 2MB is part of the netCDF internal buffering
scheme, which reads ahead.  I think it uses the same buffering scheme
code whether it is creating a new file for the first time or
overwriting data in an existing file.  This takes an insignificant
amount of time on other file systems on which we have tested, where a
seek to the end of the file (or beyond it), followed by a read, just
immediately returns 0, since there are no bytes to be read.

We've never tried to optimize the I/O for a specific filesystem
(except way back, when Cray supplied another version of posixio.c
optimized for their ffio system), but performance has been
satisfactory on POSIX file systems on which we've tested, getting
close to the maximum throughput available.  If this operation (attempt
to read from a file when at or beyond the end of the file) is slow on
the filesystem you're using, it would be good to find a portable way
to avoid it.  Alternatively, it may be a performance bug in the
filesystem implementation.  Does this only happen when using Lustre?

--Russ

>   1) This goes at 500 MB/sec (from IOR) ¿
>
>   0: 09:27:27.380596 access("/p/lscratchc/rhedges/testFile", F_OK) = -1
>   ENOENT (No such file or directory)
>   0: 09:27:27.381461 open("/p/lscratchc/rhedges/testFile", O_RDWR|O_CREAT,
>   0664) = 11
>   0: 09:27:27.384263 lseek(11, 0, SEEK_SET) ¿= 0
>   0: 09:27:27.384298 write(11,
>   "o\21\227N\0\0\0\0\10\0\0\0\0\0\0\0o\21\227N\0\0\0\0\30\0\0\0\0\0\0\0"...,
>   2097152) = 2097152
>   0: 09:27:27.386640 lseek(11, 2097152, SEEK_SET) = 2097152
>   0: 09:27:27.386671 write(11,
>   "o\21\227N\0\0\0\0\10\0\0\0\0\0\0\0o\21\227N\0\0\0\0\30\0\0\0\0\0\0\0"...,
>   2097152) = 2097152
>   0: 09:27:27.390956 lseek(11, 4194304, SEEK_SET) = 4194304
>   0: 09:27:27.390988 write(11,
>   "o\21\227N\0\0\0\0\10\0\0\0\0\0\0\0o\21\227N\0\0\0\0\30\0\0\0\0\0\0\0"...,
>   2097152) = 2097152
>   0: 09:27:27.394771 lseek(11, 6291456, SEEK_SET) = 6291456
>   0: 09:27:27.394804 write(11,
>   "o\21\227N\0\0\0\0\10\0\0\0\0\0\0\0o\21\227N\0\0\0\0\30\0\0\0\0\0\0\0"...,
>   2097152) = 2097152
>   0: 09:27:27.398618 lseek(11, 8388608, SEEK_SET) = 8388608
>   0: 09:27:27.398651 write(11,
>   "o\21\227N\0\0\0\0\10\0\0\0\0\0\0\0o\21\227N\0\0\0\0\30\0\0\0\0\0\0\0"...,
>   2097152) = 2097152
>
>   2) This goes at 5 MB/sec (from CAM):
>   It appears that it takes .2-.3 seconds to attempt reading 2 MB from a
>   null buffer? ¿Why is this in here and what software layer introduces it
>   and why. ¿
>
>   71331 1: 04:51:29.482809
>   open("[11]fvbb32_15_15_32_2_480sectcaseLnnrmh.cam2.h0.0001-09.nc",
>   O_RDWR|O_CREAT|O_TRUNC, 0666) = 69
>   ¿71332 1: 04:51:29.490999 lseek(69, 0, SEEK_CUR) ¿= 0
>   ¿71333 1: 04:51:29.491034 lseek(69, 24, SEEK_SET) = 24
>   ¿71334 1: 04:51:29.491063 write(69, "\0\0\0\0\0\0\0\0", 8) = 8
>   ¿71335 1: 04:51:29.491557 lseek(69, 0, SEEK_SET) ¿= 0
>   ¿71336 1: 04:51:29.527056 write(69,
>   "CDF\2\0\0\0\0\0\0\0\n\0\0\0\f\0\0\0\3lat\0\0\0\0`\0\0\0\3"..., 189528)
>   = 189528
>   ¿71337 1: 04:51:30.226709 lseek(69, 0, SEEK_SET) ¿= 0
>   ¿71338 1: 04:51:30.226775 write(69,
>   "CDF\2\0\0\0\0\0\0\0\n\0\0\0\f\0\0\0\3lat\0\0\0\0`\0\0\0\3"..., 2097152)
>   = 2097152
>   ¿71339 1: 04:51:30.229988 lseek(69, 4194304, SEEK_SET) = 4194304
>   ¿71340 1: 04:51:30.230026 read(69, "", 2097152) ¿ï½= 0
>   ¿71341 1: 04:51:30.530826 lseek(69, 2097152, SEEK_SET) = 2097152
>   ¿71342 1: 04:51:30.530953 write(69,
>   "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
>   2097152) = 2097152
>   ¿71343 1: 04:51:30.535766 lseek(69, 6291456, SEEK_SET) = 6291456
>   ¿71344 1: 04:51:30.535844 read(69, "", 2097152) ¿ï½= 0
>   ¿71345 1: 04:51:30.830949 lseek(69, 4194304, SEEK_SET) = 4194304
>   ¿71346 1: 04:51:30.831075 write(69,
>   ")\"\353\235)#/!)\37\371\220)\0342i)\31l[)\25\225\257)\20\26E)\10\225I"...,
>   2097152) = 2097152
>   ¿71347 1: 04:51:30.835885 lseek(69, 8388608, SEEK_SET) = 8388608
>   ¿71348 1: 04:51:30.835959 read(69, "", 2097152) ¿ï½= 0
>   ¿71349 1: 04:51:31.131851 lseek(69, 6291456, SEEK_SET) = 6291456
>   ¿71350 1: 04:51:31.131978 write(69,
>   "*\f{\201*\16L\353*\0223\"*\35\205\1*3g\355*K}\255*_\177\22*u\1\7"...,
>   2097152) = 2097152
>   ¿71351 1: 04:51:31.136787 lseek(69, 10485760, SEEK_SET) = 10485760
>   ¿71352 1: 04:51:31.136862 read(69, "", 2097152) ¿ï½= 0
>   ¿71353 1: 04:51:31.492447 lseek(69, 8388608, SEEK_SET) = 8388608
>   ¿71354 1: 04:51:31.492572 write(69,
>   
> "%q\23\316%\244\230\324%\277\363!%\355\344j&\16d\271&&\364\254&A\265\236&U\16q"...,
>   2097152) = 2097152
>   ¿71355 1: 04:51:31.497342 lseek(69, 12582912, SEEK_SET) = 12582912
>   ¿71356 1: 04:51:31.497417 read(69, "", 2097152) ¿ï½= 0
>   ¿71357 1: 04:51:31.781272 lseek(69, 10485760, SEEK_SET) = 10485760
>   ¿71358 1: 04:51:31.781404 write(69,
>   
> "$\210\370\325$\214N\334$\213c\t$\210\nX$\204W\"$\217\341z$\256\17\350$\312q{"...,
>   2097152) = 2097152
>   ¿71359 1: 04:51:31.786191 lseek(69, 14680064, SEEK_SET) = 14680064
>   ¿71360 1: 04:51:31.786267 read(69, "", 2097152) ¿ï½= 0
>
>   
> ¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿ï½¿
>
>   ====================================================
>
>   Richard Hedges

Russ Rew                                         UCAR Unidata Program
address@hidden                      http://www.unidata.ucar.edu



Ticket Details
===================
Ticket ID: ZJY-303084
Department: Support netCDF
Priority: Normal
Status: Closed