Re: 20060105: reboot of yakov leads to high latencies from ECMWF (cont.) address@hidden, address@hidden, address@hidden



Tom,

On Jan 5,  8:16am, Tom Yoksas wrote:
> Subject: 20060105: reboot of yakov leads to high latencies from ECMWF (con
> >From:  Mike Schmidt <address@hidden>
> >Organization:  UCAR/Unidata
> >Keywords:  200601051054.k05AsGP1017802 TIGGE IDD latencies FC4 system tuning
>
> ...
> >Here are the TCP tuning parameters for yakov:
> >
> ># echo 2500000 > /proc/sys/net/core/wmem_max
> ># echo 2500000 > /proc/sys/net/core/rmem_max
> ># echo "4096 5000000 5000000" > /proc/sys/net/ipv4/tcp_rmem
> ># echo "4096 65536 5000000" > /proc/sys/net/ipv4/tcp_wmem
> >
> >In addition, I've been starting an iperf server for testing with:
> >
> ># iperf -s -m -w1m >> /iperf.server 2>&1 &
>
> Thanks.  I performed all of the above as 'root' on yakov as soon as I
> saw your note this morning.  I immediately did an 'ldmadmin watch' to
> see if tuning would affect existing rpc.ldmd connections; it did
> _NOT_.  Because of this, I restarted the LDM:

Your observation is correct in that only TCP connections negotiated
after these parameters are modified will benefit.
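
Before restarting anything, it can be worth reading the values back to
confirm the new limits took effect.  A minimal sketch follows; the
function name and its optional directory argument are hypothetical
(the argument exists only so the function can be pointed at a scratch
directory instead of the live /proc/sys):

```shell
# Print the current TCP buffer limits, one "path: value" line per setting.
# $1 (optional, hypothetical test hook): root directory, default /proc/sys.
show_tcp_tuning() {
    root=${1:-/proc/sys}
    for f in net/core/wmem_max net/core/rmem_max \
             net/ipv4/tcp_rmem net/ipv4/tcp_wmem; do
        printf '%s: %s\n' "$f" "$(cat "$root/$f")"
    done
}
```

Running show_tcp_tuning with no argument reads the live values.  This
shows only the system-wide limits; it says nothing about the buffers
of connections that were already open.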

> ldmadmin restart
>
> After the restart, the latencies started dropping immediately.
>
> >I'll add these to a startup script in the next day or so.
>
> I just added the sequence of 'echos' to /etc/rc.local.  I am not sure
> if this is the appropriate place to make the change because of the
> following comment in rc.local:
>
> ----- /etc/rc.local -----
> # This script will be executed *after* all the other init scripts.
> # You can put your own initialization stuff in here if you don't
> # want to do the full Sys V style init stuff.
>   ...
> ----- /etc/rc.local -----

Correct as well.  rc.local is not a good place for these changes
since all the important servers (ldm, ...) would already be running
by then.

> If this means that an autostart of the LDM would precede the mods, then
> rc.local is _not_ the place to make the change.  The reason I say this
> is I didn't see the latencies fall in existing LDM feeds from
> ensemble.ecmwf.int until I restarted the LDM.  It might be the case
> that the tuning steps would best be put into the LDM autostart script
> (which does not yet exist on yakov).

I'd like to keep system tuning adjustments separate from the LDM startup.
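
One candidate for keeping them separate is /etc/sysctl.conf, on the
assumption (not yet verified on this FC4 install) that the init
scripts apply that file before any LDM autostart would run.  The
/proc/sys paths map to sysctl keys by dropping the prefix and turning
slashes into dots.  A sketch that prints the equivalent lines, with
hypothetical function names:

```shell
# Convert a /proc/sys path into the dotted key that sysctl(8) uses.
proc_to_sysctl_key() {
    printf '%s\n' "${1#/proc/sys/}" | tr '/' '.'
}

# Emit one "key = value" line in /etc/sysctl.conf syntax.
sysctl_line() {
    printf '%s = %s\n' "$(proc_to_sysctl_key "$1")" "$2"
}

# Print the sysctl.conf lines equivalent to the four echo commands
# quoted earlier in this message.
tcp_tuning_conf() {
    sysctl_line /proc/sys/net/core/wmem_max 2500000
    sysctl_line /proc/sys/net/core/rmem_max 2500000
    sysctl_line /proc/sys/net/ipv4/tcp_rmem "4096 5000000 5000000"
    sysctl_line /proc/sys/net/ipv4/tcp_wmem "4096 65536 5000000"
}
```

The output of tcp_tuning_conf could be reviewed and appended to
/etc/sysctl.conf, then loaded with 'sysctl -p'.  Connections that are
already open would still need an LDM restart to pick up the change.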

> It is _very_ interesting to note:
>
> - without the tuning mods, yakov would only receive 2 GB/hr from
>   ensemble -- lots of data was being lost.  This occurred even though
>   the feed request was split 4 ways (one request each for 10, 20, 30,
>   and 60 MB products).
>
> - with the tuning mods AND a restart of the LDM, the latencies
>   dropped fairly quickly
>
> Comment: I am not sure why the volume received before tuning
> was pegged at 2 GB/hr.  This bears further thought/investigation.

Clearly, the tuning makes a big difference.
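
The usual reasoning behind buffer tuning of this kind is the
bandwidth-delay product: a single TCP connection cannot move more
than one window of data per round trip, so a long path needs a large
window to fill the link.  A rough sketch of the arithmetic, with
hypothetical function names and integer math only:

```shell
# Bandwidth-delay product in bytes.
#   $1 = link bandwidth in Mbit/s, $2 = round-trip time in ms
# bytes = (Mbit/s * 1e6 / 8) * (ms / 1000) = Mbit/s * ms * 125
bdp_bytes() {
    echo $(( $1 * $2 * 125 ))
}

# Upper bound on one connection's throughput in bytes/s when the
# window, not the link, is the limit.
#   $1 = window in bytes, $2 = round-trip time in ms
window_limited_rate() {
    echo $(( $1 * 1000 / $2 ))
}
```

For example, 'bdp_bytes 100 150' prints 1875000 and
'window_limited_rate 65536 150' prints 436906.  The 100 Mbit/s and
150 ms figures are made-up values for illustration, not measurements
of the path between yakov and ECMWF, so this does not by itself
explain the 2 GB/hr ceiling seen before tuning.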

> Given the dramatic results, we should consider:
>
> - recommending that Manuel and Waldenio do similar things on their
>   TIGGE test machines
>
> - making the same modifications on the idd.unidata.ucar.edu cluster
>   data servers (uni1, uni2, and uni4) and on the cluster collector
>   frontends oliver and emo
>
> Thanks for the tuning instructions!

Agreed, we should share this with our friends.  At some point in the
very near future, I'd like to follow through with Dave Mitchell's
suggestions re installing a web100 kernel for a detailed analysis
of the connection issues.

mike