
20000315: LDM problems at CCNY (cont.)



>From: "Ka Kit Lai" <address@hidden>
>Organization: CCNY
>Keywords: 200003151936.MAA26729 LDM file ownership

Ka,

This is Tom responding.

>So if I put back the path to the user ldm, this problem can be solved? 

If 'xcd_run' can't be found at LDM startup it typically means that the
directory in which 'xcd_run' exists (e.g. ~ldm/decoders) has been
removed from the PATH for the user that started the LDM, or that
xcd_run has been deleted or has had the execute permission removed.
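
The three possibilities above can be told apart with a few shell tests. The following is only a sketch: the function name check_decoder is made up for illustration, and the directory passed to it is whatever directory you expect the decoder to live in.

```shell
# check_decoder NAME DIR  (hypothetical helper, not part of the LDM)
# Reports whether NAME is reachable through PATH, present in DIR but not
# executable, executable in DIR but DIR is missing from PATH, or absent.
check_decoder() {
    name=$1
    dir=$2
    if command -v "$name" >/dev/null 2>&1; then
        echo "on PATH: $(command -v "$name")"
    elif [ -f "$dir/$name" ] && [ ! -x "$dir/$name" ]; then
        echo "present in $dir but not executable"
    elif [ -x "$dir/$name" ]; then
        echo "executable in $dir, but $dir is not on PATH"
    else
        echo "missing from $dir"
    fi
}
```

Run as the user that starts the LDM, it would be called along the lines of: check_decoder xcd_run "$HOME/decoders"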

I just logged onto your system to find out which of these was the cause
and found that the problem was caused by something else.  It appears
that your LDM has at various times been started by the user 'ldm'
(good) and 'root' (very bad!!).  The LDM should never be run as 'root'!

Here is how I found this:

<login to halo as 'ldm'>
cd logs
ls -l
-rw-rw-r--   1 ldm      data         999 Mar 15 18:38 2000031522.stats
-rw-r--r--   1 root     other        999 Mar 15 19:16 2000031523.stats
-rw-rw-r--   1 ldm      data         999 Mar 15 20:00 2000031600.stats
-rw-r--r--   1 root     other        999 Mar 15 21:00 2000031601.stats
-rw-r--r--   1 root     other        999 Mar 15 22:05 2000031602.stats
-rw-r--r--   1 root     other        999 Mar 15 23:12 2000031603.stats
-rw-r--r--   1 root     other        779 Mar 16 00:57 2000031604.stats
-rw-rw-r--   1 ldm      data         999 Mar 16 01:53 2000031605.stats
-rw-r--r--   1 root     other        999 Mar 16 03:00 2000031606.stats
-rw-rw-r--   1 ldm      data         999 Mar 16 03:56 2000031607.stats
-rw-rw-r--   1 ldm      data         999 Mar 16 04:51 2000031608.stats
-rw-rw-r--   1 ldm      data         999 Mar 16 05:00 2000031609.stats
-rw-rw-r--   1 ldm      data         999 Mar 16 06:00 2000031610.stats
-rw-r--r--   1 root     other        999 Mar 16 07:00 2000031611.stats
-rw-rw-r--   1 ldm      data         999 Mar 16 08:13 2000031612.stats
-rw-r--r--   1 root     other        999 Mar 16 08:59 2000031613.stats
-rw-rw-r--   1 ldm      data         999 Mar 16 11:00 2000031614.stats
-rw-rw-r--   1 ldm      data         999 Mar 16 11:59 2000031615.stats
-rw-rw-r--   1 ldm      data         999 Mar 16 13:00 2000031616.stats
-rw-r--r--   1 root     other        999 Mar 16 14:00 2000031617.stats
-rw-rw-r--   1 ldm      data         999 Mar 16 14:59 2000031618.stats
-rw-rw-r--   1 ldm      data         999 Mar 16 16:00 2000031619.stats
-rw-rw-r--   1 ldm      data         999 Mar 16 16:58 2000031620.stats
-rw-r--r--   1 root     other        999 Mar 16 17:04 2000031621.stats
-rw-r--r--   1 root     other        220 Mar 16 17:04 2000031622.stats
-rw-r--r--   1 ldm      data           0 Mar 16 16:35 ldmbinstats.upc
-rw-r--r--   1 ldm      data     48557577 Mar 16 17:04 ldmd.log
-rw-r--r--   1 ldm      data     50194998 Mar 15 18:00 ldmd.log.1
-rw-r--r--   1 ldm      data     60572049 Mar 14 17:59 ldmd.log.2
-rw-r--r--   1 root     other    3413813 Mar 13 18:00 ldmd.log.3
-rw-rw-r--   1 ldm      data        2868 Mar 13 15:34 ldmd.log.4

You can see from this list that a number of files are owned by 'ldm'
and the others are owned by 'root'.  The thing to do at this point is
to change the ownership and group of all files owned by 'root' and stop
and restart the LDM.  Before doing this, however, I decided to look
through the LDM log file, ~ldm/logs/ldmd.log.  What I see are LOTS of
instances of:

Mar 15 23:00:07 halo.sci.ccny.cuny.edu pqact[542]: pipe_dbufput: xcd_runDDS write error
Mar 15 23:00:07 halo.sci.ccny.cuny.edu pqact[542]: child 18575 exited with status 127
Mar 15 23:00:07 halo.sci.ccny.cuny.edu pqact[542]: child 18573 exited with status 127
Mar 15 23:00:07 halo.sci.ccny.cuny.edu pqact[542]: pbuf_flush (34) write: Broken
This is telling us that 'xcd_run' is exiting prematurely, but the reason
was not readily apparent.
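
As background on the "status 127" in those log lines: a POSIX shell returns 127 when it cannot find the command it was asked to run. Whether the status reported by pqact follows the same convention is an assumption here, not something the log confirms. The shell behavior itself can be seen with a command name that does not exist (the name below is deliberately made up):

```shell
# Ask a shell to run a nonexistent command and capture its exit status.
status=0
sh -c 'gr_no_such_command_xyz' 2>/dev/null || status=$?
echo "exit status: $status"   # prints "exit status: 127"
```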

As the user 'ldm', I decided to stop the LDM and try restarting it.
What I found was that another invocation of the LDM was running:
the other one was running as the user 'root'.  This was causing all
kinds of problems with the LDM log and data files and was preventing
the LDM from being restarted as the user 'ldm'.  I logged on as 'root'
and killed all the LDM processes running as 'root' and then changed the
ownership and group of all McIDAS and GEMPAK data files to 'ldm' with
the group 'data'.
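
Finding the files that need this kind of repair can be sketched with find. The helper name owned_by is made up for illustration, and the directory is whichever log or data directory you want to inspect.

```shell
# owned_by USER DIR  (hypothetical helper)
# Print every regular file under DIR that is owned by USER.
owned_by() {
    find "$2" -type f -user "$1" -print
}

# Assumed usage, as 'ldm', to list the files 'root' left behind:
#   owned_by root "$HOME/logs"
# Assumed repair, as 'root', giving them to user 'ldm' and group 'data':
#   find ~ldm/logs -type f -user root -exec chown ldm:data {} +
```

Listing first and changing ownership second keeps the chown from touching anything unexpected.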

I then restarted the LDM and things appear to be running correctly.
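
After a restart it is worth confirming that no LDM process is running under the wrong account. A minimal sketch, assuming the processes of interest can be recognized by name: 'pqact' appears in the log above, while 'ldmd' is a guess based on the log file name and may differ on your installation. The filter name wrong_owner is made up for illustration.

```shell
# wrong_owner EXPECTED_USER PATTERN  (hypothetical helper)
# Reads "user pid command" lines on standard input and prints those whose
# command matches PATTERN but whose user is not EXPECTED_USER.
wrong_owner() {
    awk -v want="$1" -v pat="$2" '$1 != want && $3 ~ pat { print $1, $2, $3 }'
}

# Assumed usage on a live system:
#   ps -eo user,pid,comm | wrong_owner ldm 'ldmd|pqact'
```

No output means every matching process belongs to the expected user.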

>Will that have any effect on the LDM and the data stream it gets?

I now believe that your problem was caused by the two invocations of
the LDM, and the problems this caused with ownership by 'root' of many
files that should be owned by 'ldm'.

>By the time these messages appeared, I was logged in as another user,
>not the user ldm.  Why?

Were you logged in as 'root'?  If so, and if you then started the LDM, then
this would probably have caused the problems I found on your system.

Tom