
Re: 20010123: Strange LDM freezes



Pete Stamus wrote:
> 
> Hi Anne...had another freeze this morning (I gave you a call but
> you wisely were out of the office :)
> 
> [snip]
> >
> > Try sending both rpc.ldmd and pqing a simple 'kill' command, which will
> > send a SIGTERM, the normal, non-brutal termination signal that will
> > allow processes to die gracefully (if they can, indeed, die).  If that
> > doesn't work try 'kill -9'.  When you use 'kill -9' on rpc.ldmd you run
> > the risk of corrupting your queue, as the rpc.ldmd may not be able to
> > finish writing a product and die gracefully when it receives that
> > signal.
> >
> 
> I tried a plain "kill" on both processes, and that didn't do anything.
> The "kill -9" was the only way to get rid of them.  I did let the
> 'ldmadmin check' go until it returned...it didn't say there were any
> problems.  It did say that the LDM had not been restarted for the
> 9970 hours (415 days or so), which isn't right.  How does it come
> up with that number??  Nothing jumped out of the other numbers:
> 94% idle, load 0.07, 0.06, 0.06.  Did an 'ldmadmin queuecheck',
> which returned without comment.
> 
> I'm trying to figure out this fifo/named pipe stuff, and if I can
> figure it out will try having pqing read directly from the fifo(s)
> instead of the sockets.
> 
> ps
> -------------------------------------------------------------------------
> Pete Stamus                          | Phone: (303) 415-9701 x224
> Colorado Research Associates (CoRA)* | Fax:   (303) 415-9702
> 3380 Mitchell Lane                   | email: address@hidden
> Boulder, Colorado 80301  USA         | *( CoRA is a division of NWRA )
> -------------------------------------------------------------------------
>    You can't trust your eyes when your imagination is out of focus.
>                                                       -- Mark Twain
> -------------------------------------------------------------------------

Hi Pete,

After talking to a few people here I'm afraid I can't help you that
much.  I had thought that there was someone here who understood the SSEC
ingest system well, but that's not the case.  But, someone did say that
if you bought the system from SSEC you should be able to get support
from them.  Have you tried that?

Also, just for your information, here's what we're running on our ingest
machine that uses the same system:

desi.unidata.ucar.edu.ldm> ps -ef
     UID   PID  PPID  C    STIME TTY      TIME CMD
    root     0     0  0   Jan 01 ?        0:07 sched
    root     1     0  0   Jan 01 ?       17:16 /etc/init -
    root     2     0  0   Jan 01 ?        0:00 pageout
    root     3     0  3   Jan 01 ?       1610:48 fsflush
    root 23549 23535  0 14:56:19 pts/1    0:00 ps -eaf -ef
    root   140     1  0   Jan 01 ?        0:00 /usr/sbin/keyserv
    root   327     1  0   Jan 01 ?        0:02 /usr/lib/saf/sac -t 300
    root    70     1  0   Jan 01 ?        0:00 /usr/lib/devfsadm/devfseventd
    root   138     1  0   Jan 01 ?        0:00 /usr/sbin/rpcbind
    root    72     1  0   Jan 01 ?        0:00 /usr/lib/devfsadm/devfsadmd
    root  6648  6643  0   Jan 24 pts/0    0:00 csh
    root   222     1  0   Jan 01 ?        2:26 /usr/lib/inet/xntpd
    root   176     1  0   Jan 01 ?        0:00 /usr/lib/nfs/lockd
    root   185     1  0   Jan 01 ?        0:02 /usr/lib/autofs/automountd
    root   172     1  0   Jan 01 ?        0:02 /usr/sbin/inetd -s
    root  6643  6641  0   Jan 24 pts/0    0:00 -sh
  daemon   181     1  0   Jan 01 ?        0:00 /usr/lib/nfs/statd
    root   194     1  0   Jan 01 ?        1:50 /usr/sbin/syslogd
    root   242     1  0   Jan 01 ?        0:00 /usr/sbin/vold
    root   203     1  0   Jan 01 ?        0:04 /usr/sbin/cron
    root   217     1  0   Jan 01 ?        0:15 /usr/sbin/nscd
    root   260     1  9   Jan 01 ?       5975:19 /opt/nport/bin/inge
    root   262     1  0   Jan 01 ?        0:02 /usr/lib/sendmail -q15m
    root   244     1  0   Jan 01 ?        0:10 /usr/lib/utmpd
     ldm 20513 26095  0   Feb 06 ?        7:08 rpc.ldmd -q /usr/local/ldm/data/ldm.pq /usr/local/ldm/etc/ldmd.conf
    root   328     1  0   Jan 01 console  0:00 /usr/lib/saf/ttymon -g -h -p desi.unidata.ucar.edu console login:  -T AT386 -d
     ldm 27151 26095  1   Feb 03 ?       29:31 rpc.ldmd -q /usr/local/ldm/data/ldm.pq /usr/local/ldm/etc/ldmd.conf
    root   330   327  0   Jan 01 ?        0:02 /usr/lib/saf/ttymon
    root   282   260  0                   0:00 <defunct>
    root   286   260  0   Jan 01 ?        0:00 /opt/nport/bin/inge
     ldm 26096 26095  1   Jan 04 ?       220:21 pqbinstats
    root  6641   172  0   Jan 24 ?        0:00 in.rlogind
    root 23533   172  0 14:56:13 ?        0:00 in.rlogind
     ldm 26098 26095  1   Jan 04 ?       458:27 pqing -f HRS /tmp/jmb.fifo.2
     ldm 26095     1  0   Jan 04 ?        0:01 rpc.ldmd -q /usr/local/ldm/data/ldm.pq /usr/local/ldm/etc/ldmd.conf
     ldm 26097 26095  0   Jan 04 ?       72:40 pqing -f IDS|DDPLUS /tmp/jmb.fifo.1
     ldm 23535 23533  1 14:56:13 pts/1    0:00 -csh
    root 23536 23532  0 14:56:14 ?        0:00 /opt/nport/bin/inge
    root 23531     1  0 14:56:03 ?        0:00 /opt/nport/bin/inge
    root 23532 23531  0 14:56:08 ?        0:00 /opt/nport/bin/inge

Also, in /etc/init.d there is a script called ingcntl that may be used
in configuring inge.  

I still suspect that the problem is that pqing is getting a binary
character when it's expecting text.  
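
One way to test that suspicion would be to capture a sample of whatever
is being written to the text feed and scan it for bytes that are not
ordinary text.  The following is only a rough sketch: the function name
and the definition of "text" (printable ASCII plus whitespace) are my
assumptions, and a real feed may legitimately contain control characters
used for message framing, so the allowed set may need extending.

```python
import string
import sys

# Bytes treated as ordinary text: printable ASCII plus common whitespace.
# Assumption: the feed is plain ASCII.  Extend this set if the feed uses
# control characters for message framing.
TEXT_BYTES = frozenset(string.printable.encode("ascii"))


def find_non_text(data, limit=10):
    """Return up to `limit` (offset, byte value) pairs for non-text bytes."""
    hits = []
    for offset, value in enumerate(data):
        if value not in TEXT_BYTES:
            hits.append((offset, value))
            if len(hits) >= limit:
                break
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Run against a captured sample file, not the live fifo.
    with open(sys.argv[1], "rb") as sample:
        for offset, value in find_non_text(sample.read()):
            print("offset %d: 0x%02x" % (offset, value))
```

If this reports hits in a capture taken shortly before a freeze, that
would at least point toward the input data rather than the LDM itself.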

Since you're at a .com, I'm assuming you're not a registered
participant, and thus not officially entitled to support.  Please let me
know if this is not the case.  Usually I try to support people anyway,
but if it gets too time-consuming, I have to stop.

The good news is that our system has not experienced the trouble that
you've had, so there must be a way.  I hope my small efforts have been
helpful.  I will still help with "small" questions, if I can.
Good luck on this one.
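
For reference, the shutdown sequence discussed earlier (a plain 'kill'
first, 'kill -9' only as a last resort) could be scripted roughly as
below.  This is a sketch, not part of the LDM: the function names, the
grace period, and the polling interval are all assumptions.  The same
caveat applies as before, since SIGKILL on rpc.ldmd may leave the queue
corrupted.

```python
import os
import signal
import time


def pid_exists(pid):
    """Best-effort check that a process with this pid still exists."""
    try:
        os.kill(pid, 0)  # signal 0 delivers nothing; it only checks the pid
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def terminate(pid, grace=10.0, poll=0.2, is_alive=pid_exists):
    """Send SIGTERM, wait up to `grace` seconds, then fall back to SIGKILL.

    Returns the name of the signal that was needed.  `is_alive` is a
    hypothetical hook so callers can supply their own liveness check.
    """
    os.kill(pid, signal.SIGTERM)  # same as a plain 'kill'
    deadline = time.monotonic() + grace
    while time.monotonic() < deadline:
        if not is_alive(pid):
            return "SIGTERM"
        time.sleep(poll)
    if not is_alive(pid):
        return "SIGTERM"
    try:
        os.kill(pid, signal.SIGKILL)  # same as 'kill -9'; last resort
    except ProcessLookupError:
        return "SIGTERM"  # it exited just before the second signal
    return "SIGKILL"
```

A return value of "SIGKILL" for rpc.ldmd would be the cue to run
'ldmadmin queuecheck' before restarting.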

Anne
-- 
***************************************************
Anne Wilson                     UCAR Unidata Program            
address@hidden                 P.O. Box 3000
                                  Boulder, CO  80307
----------------------------------------------------
Unidata WWW server       http://www.unidata.ucar.edu/
****************************************************