
20010406: LDM Stability



>From: Richard Clark <address@hidden>
>Organization: Millersville University of Pennsylvania
>Keywords: 200104061426.f36EQ0L12638 LDM stability

Rich,

Tom here.  I will jump in and let Chiz and/or Anne back fill comments if
necessary.

>We too often have to restart our LDM, maybe about once per week according to
>Eric Horst (always, it seems over the weekend or when some interesting
>weather is really happening :-( 

This is not typical.

>Yet I remember at the UserComm meeting
>talking about Rutgers' LDM and the tech didn't even know where it was or
>what version it had, implying that it also did not go down very often.

Right.

>I also recall Clint Rowe talking about how easy it is to maintain his LDM. Can
>you share your perspectives on this? Is the LDM really robust and could it
>be our LDM server that hiccups?

The LDM _is_ robust.  A couple of good examples of its stability are
motherlode.ucar.edu, and shemp.unidata.ucar.edu.  You should already be
familiar with motherlode since it is the backbone of the IDD at UCAR.
Shemp is a Solaris x86 box that has at times received ALL data feeds
(including CONDUIT) and run almost all, if not all, of the decoders that
we support.  Neither of these machines experiences problems with the
LDM.  A quick 'uptime' on motherlode shows that it has been running for
43 days and the LDM has only been stopped/restarted to allow additional
sites to feed.

Of course, there is always the possibility that your LDM is "hiccuping",
and at least one other site (CCNY) has similar problems, but without
knowing the specifics of what is going on, we can't comment on the
code/installation/configuration of your LDM.

>Or do you see other problems elsewhere
>indicating that our situation is more the typical.

Your situation is _not_ typical.  I would say that you are in the
minority; most sites run smoothly.  For interest, I jumped on the machine
running the LDM at UNCA and did an uptime.  It has also been running for
a long time (46 days in this case).

>Could this be a
>bandwidth-related problem? I'd just like to get your thoughts on the matter.
>We shouldn't have to worry as much as we do about interruptions to the data
>flow.

We need more information about what is seen when it stops.  Are there
error messages in the log file (and/or system log file)?  Please forgive
me if this kind of information is well known to Chiz (or Anne).  I just
wanted to get a response back to you today.
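
As a rough illustration of the kind of log check meant above, here is a
minimal Python sketch that pulls suspicious-looking lines out of a log
file.  The keyword list and the function names are assumptions made for
this example; they are not part of the LDM, and the messages your LDM
actually writes may be worded differently.

```python
import re
from pathlib import Path

# Hypothetical keywords; adjust them to whatever your logs really contain.
PATTERN = re.compile(r"error|fail|denied|terminat|exit", re.IGNORECASE)


def suspicious_lines(lines, pattern=PATTERN):
    """Return (line_number, text) pairs for lines matching the pattern."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if pattern.search(line)
    ]


def scan_log(path, pattern=PATTERN):
    """Scan one log file; the caller supplies the path to its own log."""
    with Path(path).open(errors="replace") as handle:
        return suspicious_lines(handle, pattern)
```

Calling scan_log() on the LDM log and on the system log for the hours
before a restart, and sending along whatever it returns, would give us
something concrete to look at.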

Musing...  any chance your machine was hacked into again?

>Thanks

Again, I will let Chiz and/or Anne chime in with additional comments as
needed.

Tom