
Re: 20001029: LDM 5.1.2 on solarisx86 not letting products out of queue



Mike,

> > message in its logs.  I'm beginning to wonder if shemp may be getting
> > a disk read error that would cause something like this.
> 
> System logfiles are not showing any sort of disk errors on shemp.

Good.  In looking at shemp's logs a little more carefully, I just
realized that I misinterpreted them yesterday.  I thought the
information in the last line below referred to the oldest product,
which was locked and therefore conflicted with a new product needing
space:

 Oct 29 02:31:36 shemp.unidata.ucar.edu sysu1[8229]: pq_del_oldest: conflict on 1332087440
 Oct 29 02:31:36 shemp.unidata.ucar.edu sysu1[8229]: comings: pqe_new: Resource temporarily unavailable
 Oct 29 02:31:36 shemp.unidata.ucar.edu sysu1[8229]:        : 5029d05f67912ec4e11686606b988656    16998 20001029023136.266     WSI 1040  NEX/SJT/VEL3/

But actually the information in the logs is for the *new* product.
The information identifying the oldest product never got printed in
the logs, because "make install_setuids" never got run on shemp, so
the version of rpc.ldmd that would print the extra info wasn't
running.

So I take back what I said about

> The new more informative "pq_del_oldest: conflict" messages showed
> that the products on which locks were being held were of every
> feedtype, so that shoots the theory that a McIDAS decoder was holding
> a lock on them.  Some of them also seem to be very recently ingested
> products, which points to an error in determining which is the oldest
> product.

I think the oldest product is being correctly identified, but
something still has a lock on it.

I will reinstall the LDM on shemp with the new, more verbose logging
of any attempt to delete an oldest product that is still locked, and
see whether we can make the problem happen again on shemp, perhaps by
loading it down with other hosts feeding from it.

--Russ