
20010417: LDM installation at CCNY (cont.)



>From: Unidata User Support <address@hidden>
>Organization: UCAR/Unidata
>Keywords: 200104021554.f32FshL02302 LDM CCNY ldmadmin kill

Anne and Robb,

Mike and I had occasion to look at CCNY's LDM problems today (why is a
longish story).  As a reminder, Ward Hindman has problems with the LDM
occasionally dying.  His previous system administrator told him that he
needs to reboot his machine when this happens.  All of us have at some
time in the past found that 'ldmadmin stop' was not stopping the LDM,
but none of us knew why.  We all theorized that his system
administrator told him to reboot since this was the absolute easiest
way to ensure that all of the LDM processes died (the big hammer
approach).

Mike recalled a message that he sent to Robb and me last year (!!)
regarding the ldmadmin failure to stop the LDM; here it is:

--- Forwarded mail from <address@hidden> ("Mike Schmidt")

From: "Mike Schmidt" <address@hidden>
Date: Mon, 6 Mar 2000 11:39:28 -0700
To: yoksas, rkambic
Subject: halo ldm

I am not convinced that the hanging "ldmadmin stop" issue is a problem
with the system.  A quick look at the ldmadmin perl script shows that a
plain old "kill" against the ldm process group leader is really what is
being done.  If I do that (i.e., kill ldm's pid), it works fine.  I note
that currently, the ldm.pid file is there but empty.  Also, I changed
the following in the ~ldm/bin to reflect the correct hostname for this
system "halo.sci.ccny.cuny.edu";

ldmadmin:$hostname = "halo-imas.scitone.ccny.cuny.edu";
scriptconfig:HOST="halo-imas.scitone.ccny.cuny.edu"

Also, don't we recommend/require a fully qualified hostname to be the
default for the system?

mike

---End of forwarded mail from <address@hidden> ("Mike Schmidt")

We logged onto halo and started looking at why 'ldmadmin stop' was not
working.  We eventually found that the reason for the failure was the
permissions on /usr/bin/kill.  They were set to 700 when they should
have been set to 555.  We changed the permissions, and then exercised
'ldmadmin stop' and 'ldmadmin start' with no problems.  At that point,
I took out all of the extra stuff in the stop_ldm routine in ldmadmin
to make it look like the version that we run on shemp.  Additional
tests of start/stop/pqactHUP all run fine now.
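
For future reference, a check along the following lines should expose
this kind of problem quickly.  It is only a sketch: the check_exec
helper is a made-up name, not part of the LDM distribution, and it
assumes it is run while logged in as the user that runs the LDM
(here 'ldm'), since a 700 mode on a root-owned file only blocks
non-root users.

```shell
#!/bin/sh
# Sketch: report whether a command can be executed by the current user.
# check_exec is a hypothetical helper, not part of ldmadmin.
check_exec() {
    cmd_path="$1"
    if [ ! -e "$cmd_path" ]; then
        echo "missing"
    elif [ -x "$cmd_path" ]; then
        echo "ok"
    else
        echo "not-executable"
    fi
}

# Path taken from the note above; run this as the 'ldm' user.
check_exec /usr/bin/kill

# If the result is "not-executable", root can restore the mode with:
#   chmod 555 /usr/bin/kill
```

If this prints "not-executable" for the LDM user, anything in ldmadmin
that relies on /usr/bin/kill would be expected to fail in the same way
that 'ldmadmin stop' did on halo.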

Anne,

We had talked about you updating CCNY to LDM Version 5.1.2 sometime.
You were also going to increase the size of their queue since it was
somewhat small (200 MB).  Without updating their LDM, I increased the
queue size to 350 MB.  This should work better for them given their
slow network connection and slow processing of products in the queue.

Now I can send Ward a note telling him that all he needs to do when the
LDM dies is an 'ldmadmin stop' followed by an 'ldmadmin start'.

Just thought you'd like to know what was at the bottom of one of the
CCNY problems.

Tom