
20050831: clock skew on striker2? (cont.)



>From: David Knight <address@hidden>
>Organization: SUNYA
>Keywords: 200508312300.j7VN0ojo025668

Hi David and Kevin,

re: apparent clock skew on striker?

>   That is an interesting apparent periodicity. As Kevin said,
>it does not appear to be a time problem on striker2.

Since striker2 is a Sun box, I would not expect to see a
weird clock skew even if the clock was not being set at all.
In fact, if the clock was not being set, I would expect to
see a slow drift either up or down.  On a PC, however, all
bets are off 'cause the clocks basically suck!

>So I thought maybe it might be associated with web
>traffic, or load, since striker2 is also our
>web server. But, looking more closely at the latency
>plots it appears that the period of variation is
>different at each site. Doing fft by eye, it appears
>the period at UNF is about 6 hours, but at Unidata
>it is around 3-4 hours.

Hmm...  We didn't look _that_ closely at the differences between
the plots; good eye!  It is _very_ odd that the periodicity
for the same apparent pattern would vary from site to site.
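
To put a number on the "fft by eye" estimate, here is a minimal sketch that
picks the strongest non-DC component of a plain discrete Fourier transform.
The function name, the 10 minute sampling interval and the synthetic sawtooth
are all assumptions made for illustration; none of this is real rtstats data.

```python
import cmath
import math

def dominant_period(samples, dt_seconds):
    """Return the period (seconds) of the strongest non-DC DFT component."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [x - mean for x in samples]  # remove the DC offset
    best_k, best_power = None, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(c * cmath.exp(-2j * math.pi * k * i / n)
                    for i, c in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return n * dt_seconds / best_k

# Synthetic sawtooth (assumed shape): latency ramps from 0 to 30 s over
# 6 hours and then drops, sampled every 10 minutes for 24 hours.
dt = 600
period = 6 * 3600
series = [30.0 * ((i * dt) % period) / period for i in range(144)]
print(dominant_period(series, dt) / 3600.0)  # period in hours
```

Running the same function on the latency series from each site would give
per-site periods that can be compared directly, provided each record spans
several cycles.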

>Unidata latencies drop dramatically
>at about 0z on the 31st, while UNF latencies are on
>an upward ramp at that time. I checked a few other
>sites (brockport and oswego), and a similar periodicity in latencies
>occurs at those sites, but, again they do not appear
>to be well correlated.

This is, at least, an interesting mystery ;-)

>    Strange behaviour. Not sure what it could be. Maybe
>some kind of periodic-like behaviour on the internet.

I don't think so...

>Maybe some kind of periodic behaviour in the LDM.

We don't see this behaviour in other feeds at the same sites where we
see the NLDN weirdness.  We only see the step function effect in the
NLDN feed from striker2.  That is the reason I sent you the email.

>Is it
>possible when feeding many sites that the ldm could somehow
>change the priority given to data sent to a site?

No, it shouldn't.  The data back ends for the IDD top level
relay cluster we run (idd.unidata.ucar.edu) do not show this
kind of effect, and one of them is occasionally feeding over
185 downstream connections.

>It is interesting that for all the sites once the latency hits
>30 seconds it drops off dramatically.

Absolutely.  That is what is _so_ weird.  The first idea we had
was that your NLDN insertion process is running outside of the
LDM process group.  In a case like that, the downstream rpc.ldmds
work in a cycle:

- each rpc.ldmd wakes up every 30 seconds and sends all data in the queue
  that needs to be sent to the downstream

- goes back to sleep for 30 seconds

This does not fit your pattern, however, since the indicated 30
second latency is occurring over a time period of several hours,
not seconds.
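
The wake/sleep cycle above can be sketched as a toy model to show why it
cannot, by itself, produce an hours-long ramp. This is not LDM code: it
assumes wake-ups fall exactly on multiples of 30 seconds, and the insertion
times are hypothetical.

```python
import math

WAKE_INTERVAL = 30.0  # seconds, from the cycle described above

def delivery_latency(insert_time, wake_interval=WAKE_INTERVAL):
    """Seconds a product waits for the next wake-up (assumed to fall on
    exact multiples of wake_interval)."""
    next_wake = math.ceil(insert_time / wake_interval) * wake_interval
    return next_wake - insert_time

# Hypothetical insertion times: one product every 7 seconds for 10 minutes.
insert_times = [7.0 * i for i in range(86)]
latencies = [delivery_latency(t) for t in insert_times]
print(min(latencies), max(latencies))
```

In this model every latency falls between 0 and 30 seconds and the pattern
repeats on a scale of seconds to minutes, which is the mismatch with the
observed plots noted above.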

>But again, it appears
>to me that the time it drops off dramatically varies from site
>to site. It might be interesting to ping a couple of
>sites and see if the ping times show similar variability.

Good idea.

>(i'll run and log the pings, if you make the plots and
>compare them to the latencies)

Sounds great, thanks.
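
One simple way to compare logged ping times against the latencies is a
Pearson correlation over samples taken at the same times. A minimal sketch
follows; the sample values are made up purely for illustration, and it
assumes the two series have already been aligned to common timestamps.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)

# Made-up values, sampled at the same times (illustration only).
ping_ms = [40.0, 42.0, 41.0, 43.0, 40.0, 42.0]
latency_s = [5.0, 12.0, 20.0, 28.0, 2.0, 9.0]
print(pearson(ping_ms, latency_s))
```

A value near zero would suggest the network round trip is not what drives
the latency ramp; a value near one would point back at the network.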

>   I'm open to suggestions. Before you ask, striker2 is running
>ldm-6.0.14. Yes I know it is old... but until now I thought it
>was working just fine so saw no need to upgrade.

Yeah, we saw that you are running an old LDM on striker, but we
do not think that this has anything to do with the latencies
we are seeing.

By the way, this pattern is relatively new.  I say this because
I routinely look at NLDN ingest on machines where we suspect some
sort of artificial volume limiting (packet shaping).  The classic
signature for volume limiting is a system that shows near zero
latency for low volume feeds (like the IDS|DDPLUS or NLDN) and high
latencies for higher volume feeds (e.g., HDS, NIMAGE, NNEXRAD, etc.).
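
The volume-limiting signature described above can be written down as a rough
check. This is only a sketch: the feed groupings come from the examples in
the text, while the function name and the two thresholds (5 and 60 seconds)
are assumptions chosen for illustration, not values used by any Unidata tool.

```python
# Feed groupings follow the examples named in the text.
LOW_VOLUME_FEEDS = {"IDS|DDPLUS", "NLDN"}
HIGH_VOLUME_FEEDS = {"HDS", "NIMAGE", "NNEXRAD"}

def looks_volume_limited(latency_by_feed, low_max=5.0, high_min=60.0):
    """True if low-volume feeds are near zero latency while high-volume
    feeds are all slow. Thresholds (seconds) are illustrative assumptions."""
    low = [v for f, v in latency_by_feed.items() if f in LOW_VOLUME_FEEDS]
    high = [v for f, v in latency_by_feed.items() if f in HIGH_VOLUME_FEEDS]
    if not low or not high:
        return False  # cannot judge without both kinds of feed
    return max(low) <= low_max and min(high) >= high_min

# Hypothetical latencies in seconds for one downstream site.
site = {"NLDN": 0.8, "IDS|DDPLUS": 1.2, "HDS": 240.0, "NNEXRAD": 310.0}
print(looks_volume_limited(site))
```

A site showing the striker2 pattern, with the NLDN latency itself ramping up
to 30 seconds, would not match this signature.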

>   Hey, at least the latencies are mostly staying under
>a minute ;-)

Yes, we weren't as concerned about the actual latency values as we
were intrigued by the latency pattern.  Again, at least this is
interesting :-)

Cheers,

Tom
--
NOTE: All email exchanges with Unidata User Support are recorded in the
Unidata inquiry tracking system and then made publicly available
through the web.  If you do not want to have your interactions made
available in this way, you must let us know in each email you send to us.

>From address@hidden  Wed Aug 31 17:29:27 2005

p.s. it is interesting to look at the latencies from gusher
which is on the same local network as striker2.
http://my.unidata.ucar.edu/cgi-bin/rtstats/iddstats_nc?NLDN+gusher.atmos.albany.edu

I'm not sure what this tells us, but, it is different...
(gusher is probably somewhat overloaded these days)