
20060105: reboot of yakov leads to high latencies from ECMWF (cont.)



>From:  Mike Schmidt <address@hidden>
>Organization:  UCAR/Unidata
>Keywords:  200601051054.k05AsGP1017802 TIGGE IDD latencies FC4 system tuning

Hi Mike,

re:
>Sorry for the delay getting back to you.

No worries.

(Just so you know, a number of my comments below are "for the files".)

yakov is currently running Fedora Core 4 64-bit.

uname -a
Linux yakov.unidata.ucar.edu 2.6.14-1.1653_FC4smp #1 SMP Tue Dec 13 21:55:55 EST 2005 x86_64 x86_64 x86_64 GNU/Linux

It is a dual 3.2 GHz Intel Xeon EM64T platform with 4 GB of RAM.  FC4
recognizes the Xeon hyperthreading capabilities and configures itself
as if there are 4 CPUs.

>Here are the TCP tuning parameters for yakov;
>
># echo 2500000 > /proc/sys/net/core/wmem_max
># echo 2500000 > /proc/sys/net/core/rmem_max 
># echo "4096 5000000 5000000" > /proc/sys/net/ipv4/tcp_rmem
># echo "4096 65536 5000000" > /proc/sys/net/ipv4/tcp_wmem
>
>in addition, I've been starting an iperf server for testing with;
>
># iperf -s -m -w1m >> /iperf.server 2>&1 &

Thanks.  I performed all of the above as 'root' on yakov as soon as I
saw your note this morning.  I immediately did an 'ldmadmin watch' to
see if tuning would affect existing rpc.ldmd connections; it did
_NOT_.  Because of this, I restarted the LDM:

ldmadmin restart

After the restart, the latencies started dropping immediately.
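
For the files, here is a minimal sketch of a check that could be run
before restarting the LDM to confirm that the four values quoted above
are actually in place.  The function names and the PROC_ROOT override
are hypothetical (they exist only so the check can be tried against a
scratch directory); the expected values are the ones from Mike's note.

```shell
#!/bin/sh
# Sketch only: compare the kernel's current TCP buffer settings with
# the values from the tuning note. PROC_ROOT is a hypothetical override
# for trying the check against a scratch directory.
PROC_ROOT="${PROC_ROOT:-/proc/sys}"

check_value() {
    # $1 = path relative to PROC_ROOT, $2 = expected contents
    file="$PROC_ROOT/$1"
    if [ ! -r "$file" ]; then
        echo "MISSING $1"
        return 1
    fi
    # The unquoted expansion collapses the tabs the kernel puts
    # between the min/default/max fields into single spaces.
    actual=$(echo $(cat "$file"))
    if [ "$actual" = "$2" ]; then
        echo "OK   $1 = $actual"
    else
        echo "DIFF $1 = $actual (expected $2)"
        return 1
    fi
}

check_all() {
    # Returns 0 only if all four settings match.
    rc=0
    check_value net/core/wmem_max "2500000"              || rc=1
    check_value net/core/rmem_max "2500000"              || rc=1
    check_value net/ipv4/tcp_rmem "4096 5000000 5000000" || rc=1
    check_value net/ipv4/tcp_wmem "4096 65536 5000000"   || rc=1
    return $rc
}
```

Something like 'check_all && ldmadmin restart' would then restart the
LDM only when the tuning is in effect.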

>I'll add these to a startup script in the next day or so.

I just added the sequence of 'echos' to /etc/rc.local.  I am not sure
if this is the appropriate place to make the change because of the
following comment in rc.local:

----- /etc/rc.local -----
# This script will be executed *after* all the other init scripts.
# You can put your own initialization stuff in here if you don't
# want to do the full Sys V style init stuff.
  ...
----- /etc/rc.local -----

If this means that an autostart of the LDM would precede the mods, then
rc.local is _not_ the place to make the change.  The reason I say this
is I didn't see the latencies fall in existing LDM feeds from
ensemble.ecmwf.int until I restarted the LDM.  It might be the case
that the tuning steps would best be put into the LDM autostart script
(which does not yet exist on yakov).
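
Another option, sketched here under assumptions rather than as a
tested recipe: the same four settings can be expressed as
/etc/sysctl.conf entries, since each /proc/sys path corresponds to a
dotted sysctl key.  My understanding is that Fedora applies
/etc/sysctl.conf early in the boot sequence, well before rc.local,
but that should be confirmed on FC4 before relying on it.  The helper
names below are hypothetical; the script only prints the lines and
changes nothing on the system.

```shell
#!/bin/sh
# Sketch only: print sysctl.conf-style lines equivalent to the four
# 'echo ... > /proc/sys/...' commands. Nothing is written to the system.

proc_to_key() {
    # Strip the /proc/sys/ prefix and turn path separators into dots,
    # e.g. /proc/sys/net/core/wmem_max -> net.core.wmem_max
    echo "$1" | sed -e 's|^/proc/sys/||' -e 's|/|.|g'
}

emit_tuning() {
    printf '%s = %s\n' "$(proc_to_key /proc/sys/net/core/wmem_max)" "2500000"
    printf '%s = %s\n' "$(proc_to_key /proc/sys/net/core/rmem_max)" "2500000"
    printf '%s = %s\n' "$(proc_to_key /proc/sys/net/ipv4/tcp_rmem)" "4096 5000000 5000000"
    printf '%s = %s\n' "$(proc_to_key /proc/sys/net/ipv4/tcp_wmem)" "4096 65536 5000000"
}

emit_tuning
```

The printed lines could be appended to /etc/sysctl.conf by hand and
loaded without a reboot using 'sysctl -p'.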

It is _very_ interesting to note:

- without the tuning mods, yakov would only receive 2 GB/hr from
  ensemble -- lots of data was being lost.  This occurred even though
  the feed request was split 4 ways (one request each for 10, 20, 30,
  and 60 MB products).

- with the tuning mods AND a restart of the LDM, the latencies
  dropped fairly quickly

Comment: I am not sure why the volume received before tuning
was pegged at 2 GB/hr.  This bears further thought/investigation.

Given the dramatic results, we should consider:

- recommending that Manuel and Waldenio do similar things on their
  TIGGE test machines

- making the same modifications on the idd.unidata.ucar.edu cluster
  data servers (uni1, uni2, and uni4) and on the cluster collector
  frontends oliver and emo

Thanks for the tuning instructions!

Cheers,

Tom

>> From: Tom Yoksas <address@hidden>
>> Subject: 20060104: reboot of yakov leads to high latencies from ECMWF
>> 
>> >From: Unidata User Support <address@hidden>
>> >Organization: Unidata Program Center/UCAR
>> >Keywords: TIGGE IDD latencies FC4 system tuning
>> 
>> Hi Mike,
>> 
>> I don't know if you are reading email, but I rebooted yakov yesterday
>> afternoon because of some desktop weirdness I was seeing AND because a
>> new kernel had been put in /boot but was not yet being used.
>> 
>> After the reboot, the latencies for the data coming from ECMWF went
>> from about 15 seconds to an hour.  I seem to remember that you did some
>> tweaking on yakov after the last reboot, but I can't remember exactly
>> what was needed.  Can you tell me what tuning needs to be done after a
>> reboot of yakov?
>> 
>> Thanks in advance...
>> 
>> Tom
--
+-----------------------------------------------------------------------------+
* Tom Yoksas                                             UCAR Unidata Program *
* (303) 497-8642 (last resort)                                  P.O. Box 3000 *
* address@hidden                                   Boulder, CO 80307 *
* Unidata WWW Service                             http://www.unidata.ucar.edu/*
+-----------------------------------------------------------------------------+