I posted long ago about our LDM continuously dropping connections, and I
was never able to figure out the problem until the other day. Here is an
example of the errors we were receiving almost every minute, probably driving
our upstream LDM admins crazy for the past two years:
Jan 07 05:00:06 twister pluto.met.fsu.edu[25362] NOTE: Upstream LDM-6 on pluto.met.fsu.edu is willing to be a primary feeder
Jan 07 05:00:22 twister idd.unl.edu[25370] ERROR: readtcp(): select() timeout on socket 4
Jan 07 05:00:22 twister idd.unl.edu[25370] ERROR: one_svc_run(): RPC layer closed connection
Jan 07 05:00:22 twister idd.unl.edu[25370] ERROR: Disconnecting due to LDM failure; Connection to upstream LDM closed
The problem was with our firewall, not the configuration of the LDM. There is
a TCP extension called "TCP receive window scaling" (the window scale option)
which lets the receiving end of a connection advertise a receive window larger
than the 64 KB limit of the original 16-bit window field. The window is the
amount of data the receiver is willing to accept before the sender must wait
for an acknowledgment. The extension has been gradually implemented by various
operating systems over the past few years but was not supported by the current
kernel on our BSD firewall, so when machines on our network used it during a
transfer the connection broke. With the installation of our new firewall,
which supports the TCP extension, we are no longer receiving timeout errors
and dropped data in our LDM logs.
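For anyone curious about the arithmetic, here is a minimal Python sketch of how
the window scale option changes the meaning of the 16-bit window field. The
function names are hypothetical and only for illustration. The second function
rests on an assumption about how our old firewall failed (that it read the
window field without applying the shift); I have not confirmed that from the
firewall itself.

```python
# Sketch of TCP window scaling arithmetic (illustrative names, not a real API).

MAX_WINDOW_FIELD = 0xFFFF  # the window field in the TCP header is 16 bits
MAX_SHIFT = 14             # largest shift count the window scale option allows


def effective_window(raw_window: int, shift: int) -> int:
    """Return the receive window in bytes for a raw window field and shift count."""
    if not 0 <= raw_window <= MAX_WINDOW_FIELD:
        raise ValueError("window field must fit in 16 bits")
    if not 0 <= shift <= MAX_SHIFT:
        raise ValueError("shift count must be between 0 and 14")
    return raw_window << shift


def unexpected_bytes(raw_window: int, shift: int) -> int:
    """Bytes the endpoints treat as in-window that a device ignoring the
    shift count would not expect (ASSUMED failure mode, see note above)."""
    return effective_window(raw_window, shift) - effective_window(raw_window, 0)


if __name__ == "__main__":
    for shift in (0, 2, 7):
        window = effective_window(8192, shift)
        extra = unexpected_bytes(8192, shift)
        print(f"shift={shift}: window={window} bytes, unseen by old firewall={extra}")
```

With a shift count of 0 the two views agree, which would explain why
connections between hosts that do not negotiate window scaling pass through
such a device without trouble.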
Hope this helps someone down the road.
To watching the LDM in peace,
Phil Birnie
Department of Geography
The Ohio State University