
20030515: ldmadmin stop differences in LDM-6



>From: Chris Novy <address@hidden>
>Organization: SIU
>Keywords: 200305160023.h4G0NBLd007554 LDM-6 ldmadmin stop

Hi Chris,

>Since installing LDM 6.0.11 I've had trouble shutting down LDM.  I issue:
>
>   ldmadmin stop
>
>and get a bunch of messages saying the processes are 
>terminating.  Ordinarily the shutdown was almost instantaneous.

The LDM-5 version of ldmadmin did not wait until all LDM processes
had exited before returning you to the Unix prompt; LDM-6 does.
We felt that waiting until all LDM processes exit was the way that
ldmadmin should work, since we found ourselves always telling users
to "make sure that all LDM processes have exited before continuing".

Because ldmadmin now waits for all processes to exit, you can
do things like modify the ~ldm/etc/ldmd.conf file and then stop
and restart the LDM with:

% ldmadmin stop && ldmadmin start

At this point, you can be assured that the LDM will be restarted only
after all LDM processes from the existing invocation have finished.

>An inspection of processes shows two processes still remaining:
>
>ldm  3249  3243  0 18:59:03 ?        0:00 rpc.ldmd -q /home/ldm/data/ldm.pq 
>/home/ldm/etc/ldmd.conf
>
>ldm  3243     1  0 18:59:03 ?        0:00 rpc.ldmd -q /home/ldm/data/ldm.pq 
>/home/ldm/etc/ldmd.conf

Right, those processes will eventually exit by themselves.  They are most
likely hanging around due to a slow link to an upstream or downstream
host.

>I've tried the following:
>
>   - Manually killing those processes, doing a ldmadmin clean/ldmadmin start

If you kill the processes while one or more of them is writing to the
product queue, you run the risk of corrupting the queue.

>   - Deleting/rebuilding the product queue
>   - Reverting to LDM 6.0.10
>   - Reinstalling LDM 6.0.11

You shouldn't have to do any of these.

>In all cases I still can't do a clean shutdown.
>Any suggestions?

Wait for ldmadmin to return you to the Unix prompt.  If this takes an
excessively long time (several minutes or more), then you might have to
send a signal to the rpc.ldmd processes that won't exit.  You should only
need to signal the rpc.ldmd children, not the parent.  You can tell which
is the parent and which is the child from the process IDs.  In the listing,
the second column is the process ID and the third is the parent's process ID:

     child
      |
      V
ldm  3249  3243  0 18:59:03 ?        0:00 rpc.ldmd -q /home/ldm/data/ldm.pq 
ldm  3243     1  0 18:59:03 ?        0:00 rpc.ldmd -q /home/ldm/data/ldm.pq 
      ^       ^
      |       |
        parent
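
If you would rather not read the columns by eye, the same check can be
sketched in a few lines of shell.  This is only an illustration, not
something ldmadmin provides: classify_ldmd is a made-up helper name, and
it assumes the 'ps -ef' column layout shown above (UID, PID, PPID, ...).
It calls an rpc.ldmd a child when its parent PID is the PID of another
rpc.ldmd in the same listing:

```shell
# Hypothetical helper: label each rpc.ldmd line from `ps -ef`-style input
# as "parent" or "child".  Assumes column 2 is the PID and column 3 the PPID.
classify_ldmd() {
    awk '
        /rpc\.ldmd/ {
            n++
            pid[n]  = $2        # process ID of this rpc.ldmd
            ppid[n] = $3        # its parent process ID
            is_ldmd[$2] = 1     # remember every rpc.ldmd PID seen
        }
        END {
            for (i = 1; i <= n; i++) {
                # A child is an rpc.ldmd whose parent is also an rpc.ldmd.
                role = (ppid[i] in is_ldmd) ? "child" : "parent"
                print role, pid[i]
            }
        }
    '
}

# Possible usage (the [r] keeps grep from matching its own command line):
#   ps -ef | grep '[r]pc.ldmd' | classify_ldmd
```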

ldmadmin sends SIGTERM to the parent rpc.ldmd, which, in turn, sends
the signal to all of its children (including rtstats, pqbinstats, and
pqact).  If a process still won't exit (for example, because of a very
slow connection to a remote site), then I recommend sending it a SIGINT
signal.  For example:

% kill -INT 3249

If this still doesn't work (unlikely, but possible), then you might have
to send a SIGKILL:

% kill -KILL 3249

(this is the infamous 'kill -9').
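
The escalation above (SIGINT first, SIGKILL only as a last resort) could
be wrapped in a small shell function along these lines.  Treat it as a
rough sketch under my own assumptions: stop_pid and its 30-second default
wait are invented for this example and are not part of the LDM.  It uses
'kill -0', which sends no signal and only tests whether the process
still exists:

```shell
# Hypothetical helper: send SIGINT, wait, then fall back to SIGKILL.
# Usage: stop_pid PID [seconds-to-wait-after-each-signal]
stop_pid() {
    pid=$1
    wait_secs=${2:-30}      # assumed default; pick what suits your site
    for sig in INT KILL; do
        # If the signal cannot be delivered, treat the process as gone.
        kill -"$sig" "$pid" 2>/dev/null || return 0
        i=0
        while [ "$i" -lt "$wait_secs" ]; do
            # kill -0 only checks for existence; no signal is sent.
            kill -0 "$pid" 2>/dev/null || return 0
            sleep 1
            i=$((i + 1))
        done
    done
    return 1                # still present even after SIGKILL
}

# Possible usage, with the child PID from the listing above:
#   stop_pid 3249
```

As noted earlier, a process killed while writing to the queue can
corrupt it, so something like this is best reserved for children that
have already been given several minutes to exit on their own.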

Please let us know if your LDM processes really do refuse to exit.
Also, please send along what OS you are running and what compiler
you used to build the LDM.

Tom