
20010109: ldmfail not killing McIDAS XCD processes cleanly (old message)



>From: 10 <address@hidden>
>Organization: NMSU/NSBF
>Keywords: 200101090229.f092TWo04390 LDM ldmfail

Robert,

While tidying up our mail repository I came across this old message from you:

>About a month ago I had a problem on our SPARC and Intel Solaris machines
>with ldmfail running and not cleanly restarting the McIDAS XCD decoders
>resulting in processes that could not write data.  I just saw this again
>on our Intel Solaris machine (wxmcidas.nsbf.nasa.gov).  Sometime
>recently it happened and ever since I have had no McIDAS data being
>filed.  I did an ldmadmin stop, clean and then killed a stubborn ingebin
>process and restarted and now it works.  

Apparently, we never got back to you on this; sorry!

There are two things here:

o the failure of restarting McIDAS-XCD "stuff" when ldmfail runs
o you having to kill a "stubborn ingebin.k"

As for the XCD stuff not getting restarted after a cron-initiated ldmfail
invocation, it is likely that you have run into the same thing as a number of
other LDM users running XCD.  The problem is that jobs run from cron get only
a limited set of environment variables.  In particular, the PATH for a
cron-initiated execution is not the one set in a user's .cshrc file (C shell
users) or .profile file (Bourne and possibly Korn shell users).  Instead, PATH
will contain only "." and ~/bin.  The ~/bin entry is the reason that the LDM
itself restarts, since the LDM executables are stored in ~ldm/bin.
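
One way to see this for yourself is to look up the XCD programs with a PATH
like cron's.  The sketch below is only an illustration: the function name
check_cron_path is hypothetical, and it assumes cron supplies just "." and
$HOME/bin as described above (the actual default PATH varies between systems
and cron implementations).  Each lookup runs under env -i so that nothing from
your interactive environment leaks in.

```shell
#!/bin/sh
# Hypothetical diagnostic: report which programs can be found with a
# cron-like PATH.  ASSUMPTION: cron provides only "." and $HOME/bin;
# check your system's cron documentation for the real default.
check_cron_path() {
    cron_path=".:$HOME/bin"
    for prog in "$@"; do
        # env -i starts from an empty environment; only HOME and the
        # reduced PATH are passed in.  cd "$HOME" mimics cron, which
        # normally starts jobs in the home directory, so "." means $HOME.
        if env -i HOME="$HOME" PATH="$cron_path" \
            /bin/sh -c 'cd "$HOME" && command -v "$1"' sh "$prog" >/dev/null 2>&1
        then
            echo "$prog: found"
        else
            echo "$prog: NOT found"
        fi
    done
}

# Example: the programs discussed in this message.
check_cron_path ldmadmin xcd_run batch.k
```

Run as the ldm user, a "NOT found" for xcd_run or batch.k while ldmadmin is
found matches the symptom described above.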

There are two approaches to take to address this problem:

o hack ldmadmin and explicitly add the directory that contains the XCD startup
  routine xcd_run to the PATH it uses.  Presumably, setting this will also
  allow the McIDAS ROUTE PostProcess Bourne shell script, batch.k, to be found.
  The problem with this approach is that you will have to hack ldmadmin again
  each time you do an LDM upgrade.

o force loading of your shell environment variables at runtime from cron.
  For Bourne/Korn shell users this would mean that you would edit your
  crontab entry for ldmadmin and prefix the execution string with
  ". .profile".  For C shell users, the job is a little different.
  I _think_ that the syntax would be something like:

  /bin/csh -c ...

  But I have not tried this to know for sure.
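
For Bourne/Korn shell users, the second approach can also be packaged as a
small wrapper that cron calls instead of the bare command.  This is a sketch,
not a tested recipe: the wrapper, its name (with_profile) and its location are
hypothetical, and it assumes your PATH settings live in $HOME/.profile.  It
sources the file by explicit path because the "." command searches PATH when
the file name contains no slash, so it does not depend on "." being in cron's
PATH.

```shell
#!/bin/sh
# Hypothetical wrapper (e.g. saved as ~ldm/bin/with_profile): load the
# user's Bourne/Korn shell environment, then run the command it was given.
run_with_profile() {
    # Use an explicit path: ". file" searches PATH when "file" has no slash.
    if [ -r "$HOME/.profile" ]; then
        . "$HOME/.profile"
    fi
    # Run the requested command with the PATH that .profile set up.
    "$@"
}

# Only do something when called with a command to run.
if [ "$#" -gt 0 ]; then
    run_with_profile "$@"
fi
```

The crontab entry would keep its existing schedule and arguments and simply
route the command through the wrapper, along the lines of:

  <existing schedule> bin/with_profile bin/ldmfail <existing arguments>

If .profile runs terminal-oriented commands such as stty, expect them to
complain under cron, since there is no terminal attached.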

As to having to kill a stubborn ingebin.k process, I occasionally run into
this when restarting an LDM by hand.  I don't think, however, that this is
the problem you ran into when ldmfail runs to switch feed sites.

>>From address@hidden Mon Jan  8 19:33:23 2001
>
>wxmcidas is running Solaris 8/ldm 5.1.2/XCD 7.6 and uses Sun SC5.0 Intel
>compilers.  ldm is binary install.

The cron-initiated ldmfail problem above does not depend on the operating
system; it is a result of a quirk of cron.  I am lobbying internally to have
ldmfail modified to read the user's shell environment file, get the full PATH
set therein, and set PATH from it so that the LDM is restarted with the same
environment it has when the user starts it by hand.
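
A rough idea of what "read the user's shell environment file" could look like
is sketched below.  It is an assumption-laden sketch, not the actual ldmfail
change: the function name user_path is hypothetical, and the login shell is
passed in explicitly because cron usually sets SHELL to a Bourne shell rather
than the user's own shell.  It relies on csh reading ~/.cshrc even for a
non-interactive "csh -c" (unless -f is given), whereas a Bourne shell started
with -c does not read .profile, so there the file is sourced explicitly.

```shell
#!/bin/sh
# Hypothetical sketch: print the PATH a user's startup file would set.
# $1 is the user's login shell (passed in, because cron usually sets
# SHELL to a Bourne shell rather than the user's own shell).
user_path() {
    case "$1" in
        */csh|*/tcsh)
            # csh/tcsh read the user's rc file even for "csh -c".
            "$1" -c 'echo "$PATH"'
            ;;
        *)
            # A Bourne-style shell run with -c does not read .profile,
            # so source it explicitly and discard anything it prints.
            /bin/sh -c 'if [ -r "$HOME/.profile" ]; then . "$HOME/.profile" >/dev/null 2>&1; fi; printf "%s\n" "$PATH"'
            ;;
    esac
}

# Example: adopt that PATH before restarting the LDM.
if [ "$#" -gt 0 ]; then
    new_path=$(user_path "$1")
    # Keep cron's PATH if nothing usable came back.
    if [ -n "$new_path" ]; then
        PATH=$new_path
        export PATH
    fi
fi
```

Anything a .cshrc echoes would end up in the captured value, so a real
implementation would need to guard against chatty startup files.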

Sorry it took so long to dig this out of the email black hole!

Tom