
Re: 20010822: Unable to maintain connects from wsi (fwd)



>To: address@hidden
>From: "Arthur A. Person" <address@hidden>
>Subject: Re: 20010822: Unable to maintain connects from wsi
>Organization: Penn State
>Keywords: 200108221513.f7MFDR125002

Art,

> > But if you still have the old queue available, could you possibly do
> > me a favor by sending me the output of a couple of additional checks
> > for queue corruption?
> >
> > Assuming the old product queue is in a file named "old.pq", the first
> > test is getting the output from pqmon:
 ...
> Okay, mine shows the following:
> 
> Aug 28 20:32:05 pqmon: Starting Up (16169)
> Aug 28 20:32:06 pqmon: nprods nfree  nempty      nbytes  maxprods  maxfree  minempty    maxext  age
> Aug 28 20:32:06 pqmon:  71267     4   82329   629121336     98072     2357     55525     17256 381426
> Aug 28 20:32:06 pqmon: Exiting

This looks good, and shows no symptoms of a corrupt queue.
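
For anyone who would rather not count columns by eye, here is a minimal
Python sketch that pairs each pqmon column name with its value.  The
column names and numbers are copied from the output quoted above, with
the mail-wrapped lines rejoined.  The function name parse_pqmon is
purely illustrative and is not part of the LDM distribution.

```python
def parse_pqmon(header_line, value_line):
    """Pair pqmon column names with their integer values."""
    # Keep only the text after the "pqmon:" tag on each line.
    names = header_line.split("pqmon:", 1)[1].split()
    values = [int(v) for v in value_line.split("pqmon:", 1)[1].split()]
    if len(names) != len(values):
        raise ValueError("header and value lines have different column counts")
    return dict(zip(names, values))


# Lines copied from the quoted pqmon output (wrapped lines rejoined).
header = ("Aug 28 20:32:06 pqmon: nprods nfree  nempty      nbytes  "
          "maxprods  maxfree  minempty    maxext  age")
values = ("Aug 28 20:32:06 pqmon:  71267     4   82329   629121336     "
          "98072     2357     55525     17256 381426")

stats = parse_pqmon(header, values)
print(stats["nprods"], stats["nbytes"])
```

The length check is there so that a line wrapped by a mail client fails
loudly instead of silently shifting values under the wrong names.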

> > The second thing I'd like to see is pqcat's idea of how many products
> > are in the queue, and how long it takes to go through all these.
 ...
> My pqcat command shows:
> 
> [ldm@ldm ~/data]$ pqcat -q ldm.pq.old > /dev/null
> Aug 28 20:34:00 pqcat: Starting Up (16175)
> Aug 28 20:35:32 pqcat: Exiting
> Aug 28 20:35:32 pqcat: Number of products 71267

This agrees with pqmon on the number of products, and pqcat counted them
independently by actually reading through every product in the queue,
following all of the data structures and lists.  The fact that neither
pqcat nor pqmon exited with an assertion violation and that pqcat got
through all the products and exited normally makes it look very
unlikely that you had a corrupt queue.  I think the problem must have
been elsewhere ...
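
The cross-check can be sketched in a few lines of Python.  This is only
an illustration under stated assumptions: the log text is copied from
the pqcat output quoted above, the pqmon count is taken from the nprods
column of the earlier output, and the run is assumed not to cross
midnight, since only the time of day is parsed.  The name
summarize_pqcat is hypothetical and not an LDM utility.

```python
import re
from datetime import datetime

# Copied from the quoted pqcat output.
PQCAT_LOG = """\
Aug 28 20:34:00 pqcat: Starting Up (16175)
Aug 28 20:35:32 pqcat: Exiting
Aug 28 20:35:32 pqcat: Number of products 71267
"""

# nprods as reported by pqmon for the same queue file.
PQMON_NPRODS = 71267


def summarize_pqcat(log_text):
    """Return (product_count, elapsed_seconds) from pqcat log lines."""
    start = end = count = None
    for line in log_text.splitlines():
        m = re.search(r"(\d\d:\d\d:\d\d) pqcat: (.*)", line)
        if not m:
            continue
        # Time of day only: assumes start and exit fall on the same day.
        stamp = datetime.strptime(m.group(1), "%H:%M:%S")
        message = m.group(2)
        if message.startswith("Starting Up"):
            start = stamp
        elif message.startswith("Exiting"):
            end = stamp
        elif message.startswith("Number of products"):
            count = int(message.split()[-1])
    if start is None or end is None or count is None:
        raise ValueError("incomplete pqcat log")
    return count, int((end - start).total_seconds())


count, seconds = summarize_pqcat(PQCAT_LOG)
print("counts agree:", count == PQMON_NPRODS)
print(f"{count} products read in {seconds} s")
```

Matching counts alone do not prove the queue is sound; the stronger
evidence is still that both utilities ran to completion without an
assertion violation.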

> The queue size is ~600MB and apparently was made on June 1, 2001.  It must
> have run from then until now before having a problem.  Actually, I think
> the last time I remade the queue was when I was pursuing a problem with
> Anne relating to the system going into an I/O thrashing state which seemed
> to be related to queues made near the maximum size limit of 2GB.  I
> haven't seen the problem with smaller queues.  The one I'm running now is
> ~1GB.  I'm running ldm version 5.1.3 on RH 7.1 with all this.  I guess the
> mystery to me is why the wsi feed would fail because of the queue.  Maybe
> I should stick the old one back in again and see if it still fails.

It's a mystery to me also, since there should be no dependence on feed
type or product contents.

--Russ