
[LDM #PAU-308840]: ldm exiting



Hi Heather,

re:
> The ldm stopped unexpectedly on my noaap ingestor yesterday.  When I
> tried to restart it using "ldmadmin start" I got this message:
> 
> The writer-counter of the product-queue isn't zero.  Either a process
> has the product-queue open for writing or the queue might be corrupt.
> Terminate the process and recheck or use
> 
> pqcat -l- -s -q /usr/local/ldm/var/queues/ldm.pq && pqcheck -F -q
> /usr/local/ldm/var/queues/ldm.pq
> 
> to validate the queue and set the writer-counter to zero.
> LDM not started

This indicates that the LDM queue got damaged somehow.  The action
suggested in the message is one of two alternatives.  The other
alternative, which is the better one for a NOAAPort ingest machine, is
to delete and remake the LDM queue:

<as 'ldm' on the machine having problems>
ldmadmin stop
ldmadmin delqueue
ldmadmin mkqueue
ldmadmin start
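
If you want the four steps above in one place, here is a minimal sketch
of a shell function that runs them in order and stops at the first
failure.  It uses only the ldmadmin subcommands listed above.  The
LDMADMIN variable is a hypothetical override I added for dry runs; it is
not an LDM setting.  Ignoring the exit status of 'stop' is also my
assumption, made because the LDM may already be down, as it was here.

```shell
#!/bin/sh
# Sketch: delete and remake the LDM product-queue.  Run as user 'ldm'.
# LDMADMIN is a hypothetical override for dry runs (e.g. LDMADMIN=echo);
# it is not something the LDM itself reads.
remake_queue() {
    admin=${LDMADMIN:-ldmadmin}

    # Best effort: the LDM may already be stopped, so do not abort here.
    "$admin" stop || true

    # Each remaining step must succeed before the next one is tried.
    for step in delqueue mkqueue start; do
        "$admin" "$step" || {
            echo "'$admin $step' failed; not continuing" >&2
            return 1
        }
    done
}
```

Calling remake_queue with LDMADMIN=echo set prints the subcommands
instead of running them, which is a cheap way to check the order first.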

re:
> I rebooted my machine in an attempt to clean up the queue, but I got
> the same message again when I tried to restart the ldm.

Once the queue is damaged, reboots will have no effect: the queue is a
file on disk (/usr/local/ldm/var/queues/ldm.pq in your case), so it
will stay damaged until it is fixed or remade.

re:
> I issued the command given in the error message:
> 
> pqcat -l- -s -q /usr/local/ldm/var/queues/ldm.pq && pqcheck -F -q 
> /usr/local/ldm/var/queues/ldm.pq
> 
> And then I was able to restart the ldm.

OK.  For future reference: on NOAAPort ingest machines, I would simply
delete and remake the queue as described above.  It is simpler,
probably quicker, and more foolproof.

re:
> Do you have any idea what may
> have happened to cause the ldm to stop?
> 
> Here is the error message in my log before the ldm stopped:
> Nov 11 08:09:37 noaapnew noaaportIngester[3282] ERROR: [GB 1]
> Nov 11 08:09:37 noaapnew noaaportIngester[3282] ERROR: [GB 1]
> Nov 11 08:09:44 noaapnew noaaportIngester[3284] ERROR: [GB 1]
> Nov 11 08:09:44 noaapnew noaaportIngester[3284] ERROR: [GB 1]
> Nov 11 08:10:05 noaapnew noaaportIngester[3282] ERROR: [GB 1]
> Nov 11 08:10:05 noaapnew noaaportIngester[3282] ERROR: [GB 1]
> Nov 11 08:10:30 noaapnew noaaportIngester[3284] ERROR: [GB 1]
> Nov 11 08:10:31 noaapnew noaaportIngester[3284] ERROR: [GB 1]
> Nov 11 16:28:35 noaapnew ldmd[3280] NOTE: child 3284 terminated by signal 11: 
> noaaportIngester -m 224.0.1.3
> Nov 11 16:28:35 noaapnew ldmd[3280] NOTE: Killing (SIGTERM) process group
> Nov 11 16:28:35 noaapnew noaapxcd(feed)[3298] NOTE: Exiting
> Nov 11 16:28:35 noaapnew ldmd[3280] NOTE: Exiting
> Nov 11 16:28:35 noaapnew ldmd[3280] NOTE: Terminating process group

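'signal 11' is SIGSEGV, a segmentation violation: the noaaportIngester
child (PID 3284) crashed, and ldmd then shut down the rest of the
process group.  A writer that dies this way does not get to close the
queue, which would explain the non-zero writer-counter.  Why the
segmentation violation itself happened is not readily apparent.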
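
If it helps when scanning logs in the future, here is a small sketch
that pulls the signal number out of each "terminated by signal" line and
asks the shell for its name with 'kill -l'.  The function name and the
log file argument are mine; no particular log location is assumed.

```shell
#!/bin/sh
# Sketch: print the signal name for every "terminated by signal N" line
# in the log file given as the first argument.
signal_names() {
    # Keep only the signal number from matching lines.
    sed -n 's/.*terminated by signal \([0-9][0-9]*\).*/\1/p' "$1" |
    while read -r num; do
        # 'kill -l N' prints the name without the SIG prefix.
        echo "signal $num = SIG$(kill -l "$num")"
    done
}
```

Run against a file holding the ldmd line quoted above, this reports
signal 11 as SIGSEGV.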

re:
> I would appreciate any advice.

I think that the expedient thing to do is/was delete and remake the LDM
queue.

Cheers,

Tom
--
****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                   Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage                       http://www.unidata.ucar.edu
****************************************************************************


Ticket Details
===================
Ticket ID: PAU-308840
Department: Support LDM
Priority: Normal
Status: Closed