Re: 20001029: LDM 5.1.2 on solarisx86 not letting products out of queue



>From: Tom Yoksas <address@hidden>
>Organization: UCAR/Unidata
>Keywords: 200010291355.e9TDt4406706 LDM 5.1.2 queue pqact

Tom,

I just looked at the shemp ldm logs, and the last "pq_del_oldest: conflict"
message was at 7:33pm last night, about 19 hours ago:

 Oct 29 19:33:19 shemp.unidata.ucar.edu wsihcsn[7973]: pq_del_oldest: conflict on 1148054576
 Oct 29 19:33:19 shemp.unidata.ucar.edu wsihcsn[7973]: comings: pqe_new: Resource temporarily unavailable
 Oct 29 19:33:19 shemp.unidata.ucar.edu wsihcsn[7973]:        : 7975d55ce2b4a1407ea86f0ce58a7728    11798 20001029193319.149     WSI 412  NEX/HMO/PRE1
 Oct 29 19:33:19 shemp.unidata.ucar.edu wsihcsn[7973]: Connection reset by peer
 Oct 29 19:33:19 shemp.unidata.ucar.edu wsihcsn[7973]: Disconnect
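
Finding the last "pq_del_oldest: conflict" message by hand gets tedious on a busy host. Below is a minimal Python sketch that scans syslog-style lines and reports how many conflict messages each program logged and when the last one occurred. It assumes the line layout shown in the excerpt above ("Mon DD HH:MM:SS host prog[pid]: message"), unwrapped lines, and chronological order; the function name scan_conflicts is hypothetical and not part of the LDM distribution.

```python
import re
from collections import Counter

# Assumed syslog layout, taken from the shemp excerpt:
# "Oct 29 19:33:19 shemp.unidata.ucar.edu wsihcsn[7973]: pq_del_oldest: conflict on 1148054576"
LINE_RE = re.compile(
    r"^\s*(?P<stamp>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(?P<host>\S+)\s+"
    r"(?P<prog>[^\[\s]+)\[(?P<pid>\d+)\]:\s+(?P<msg>.*)$"
)


def scan_conflicts(lines):
    """Hypothetical helper: count conflict messages per program and
    return the timestamp of the last one (None if there were none).
    Assumes the lines are already in chronological order."""
    per_prog = Counter()
    last = None
    for line in lines:
        m = LINE_RE.match(line)
        if m and "pq_del_oldest: conflict" in m.group("msg"):
            per_prog[m.group("prog")] += 1
            last = m.group("stamp")  # later lines overwrite earlier ones
    return per_prog, last


sample = [
    "Oct 29 19:33:19 shemp.unidata.ucar.edu wsihcsn[7973]: pq_del_oldest: conflict on 1148054576",
    "Oct 29 19:33:19 shemp.unidata.ucar.edu wsihcsn[7973]: Disconnect",
    "Oct 29 19:34:37 shemp.unidata.ucar.edu rpc.ldmd[7957]: child 7958 terminated by signal 11",
]
counts, last = scan_conflicts(sample)
print(dict(counts), last)
```

To scan a real log, pass an open file object instead of the sample list; the per-program counts make it easy to see whether the conflicts come from one requester or from all of them.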

It looks like someone shut down the LDM at 7:34pm:

 Oct 29 19:34:37 shemp.unidata.ucar.edu rpc.ldmd[7957]: child 7958 terminated by signal 11
 Oct 29 19:34:37 shemp.unidata.ucar.edu rpc.ldmd[7957]: Killing (SIGINT) process group

and then restarted it just after 8:00pm:

 Oct 29 20:00:05 shemp.unidata.ucar.edu rpc.ldmd[24961]: Starting Up (built: Aug 25 2000 10:53:07)
 Oct 29 20:00:05 shemp.unidata.ucar.edu motherlode[24965]: run_requester: Starting Up: motherlode.ucar.edu

The new, more informative "pq_del_oldest: conflict" messages showed
that the products on which locks were being held were of every
feedtype, so that shoots the theory that a McIDAS decoder was holding
a lock on them.  Some of them also seem to be very recently ingested
products, which points to an error in determining which is the oldest
product.  But motherlode has been getting the same products for at
least the last 4 days without a single "pq_del_oldest: conflict"
message in its logs.  I'm beginning to wonder if shemp may be getting
a disk read error that would cause something like this.

But one other user, Tom McDermott <address@hidden>,
just reported getting a bunch of "pq_del_oldest: conflict" messages
too, so a disk read error doesn't seem that likely.

Also, why haven't we seen any of these errors on shemp since last
night???

--Russ