
[LDM #PAU-308840]: ldm exiting



Hi Heather,

re:
> Tom, thank you very much for returning my email.

No worries.

re:
> I went in, stopped the ldm, deleted the queue, and made a new one.  I
> will make sure to do this next time something happens with the queue.

Very good.
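
For next time, the stop/delete/remake/start sequence quoted further down
could be wrapped in a small shell function so the steps always run in the
same order.  This is only a sketch: the function name 'remake_queue' and
the LDMADMIN override variable are made up for illustration and are not
part of the LDM; the only real commands used are the four ldmadmin
actions already listed in this thread.

```shell
# Sketch only: "remake_queue" and the LDMADMIN variable are illustrative
# names, not part of the LDM distribution.
LDMADMIN=${LDMADMIN:-ldmadmin}

remake_queue() {
    # Run the four steps in order; stop at the first one that fails.
    for step in stop delqueue mkqueue start; do
        "$LDMADMIN" "$step" || {
            echo "remake_queue: '$LDMADMIN $step' failed" >&2
            return 1
        }
    done
}
```

Run it as 'ldm' on the ingest machine by sourcing the file and typing
remake_queue.  I have not checked what exit status 'ldmadmin stop'
returns when the LDM is already down; if the function stops at that
step, simply run the remaining three commands by hand.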

re:
> Is this at all preventable?

It shouldn't have happened in the first place.  We have been ingesting
NOAAPort on a number of machines for a LONG time (several years), and
we have experienced this problem only once or twice (and never on some
machines).

What caused your NOAAPort ingest process to segfault is a mystery;
perhaps the broadcast contained a slug of bad data that the
process simply couldn't handle?  Again, this is a rare occurrence.
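
If you want to see whether this ever happens again, one option is to
scan the LDM log for the "terminated by signal" notes that ldmd writes,
like the one in your log excerpt.  The helper below is a hypothetical
sketch ('signal_exits' is a name invented here); it assumes each log
entry is on a single line in the same format as your excerpt, and it
reads the log text on standard input.

```shell
# Hypothetical helper: list child processes that ldmd reported as killed
# by a signal.  Reads log text on stdin; prints "PID SIGNAL" for each
# matching line and nothing for lines that do not match.
signal_exits() {
    sed -n 's/.*child \([0-9][0-9]*\) terminated by signal \([0-9][0-9]*\).*/\1 \2/p'
}
```

Feed it whichever file your system logs LDM messages to (the location
varies by installation).  A signal number of 11 in the second column is
a segmentation violation.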


> Thanks!
> 
> Heather Kiley
> ________________________________________
> From: Unidata LDM Support [address@hidden]
> Sent: Monday, November 12, 2012 10:26 AM
> To: Kiley, Heather L (IS)
> Cc: address@hidden
> Subject: EXT :[LDM #PAU-308840]: ldm exiting
> 
> Hi Heather,
> 
> re:
> > The ldm stopped unexpectedly on my noaap ingestor yesterday.  When I
> > tried to restart it using "ldmadmin start" I got this message:
> >
> > The writer-counter of the product-queue isn't zero.  Either a process
> > has the product-queue open for writing or the queue might be corrupt.
> > Terminate the process and recheck or use
> >
> > pqcat -l- -s -q /usr/local/ldm/var/queues/ldm.pq && pqcheck -F -q /usr/local/ldm/var/queues/ldm.pq
> >
> > to validate the queue and set the writer-counter to zero.
> > LDM not started
> 
> This indicates that the LDM queue got damaged somehow.  The suggested
> action to take is, in fact, one of two alternatives.  The second
> alternative is the best one for a NOAAPort ingest machine:
> delete and remake the LDM queue:
> 
> <as 'ldm' on the machine having problems>
> ldmadmin stop
> ldmadmin delqueue
> ldmadmin mkqueue
> ldmadmin start
> 
> re:
> > I rebooted my machine in an attempt to clean up the queue, but I got
> > the same message again when I tried to restart the ldm.
> 
> Once the queue is damaged, reboots will have no effect; it will stay
> damaged until fixed or remade.
> 
> re:
> > I issued the command given in the error message:
> >
> > pqcat -l- -s -q /usr/local/ldm/var/queues/ldm.pq && pqcheck -F -q /usr/local/ldm/var/queues/ldm.pq
> >
> > And then I was able to restart the ldm.
> 
> OK.  For future reference: on NOAAPort ingest machines, I would simply
> delete and remake the queue as per the info I included above.  It is
> simpler, probably quicker and more foolproof.
> 
> re:
> > Do you have any idea what may
> > have happened to cause the ldm to stop?
> >
> > Here is the error message in my log before the ldm stopped:
> > Nov 11 08:09:37 noaapnew noaaportIngester[3282] ERROR: [GB 1]
> > Nov 11 08:09:37 noaapnew noaaportIngester[3282] ERROR: [GB 1]
> > Nov 11 08:09:44 noaapnew noaaportIngester[3284] ERROR: [GB 1]
> > Nov 11 08:09:44 noaapnew noaaportIngester[3284] ERROR: [GB 1]
> > Nov 11 08:10:05 noaapnew noaaportIngester[3282] ERROR: [GB 1]
> > Nov 11 08:10:05 noaapnew noaaportIngester[3282] ERROR: [GB 1]
> > Nov 11 08:10:30 noaapnew noaaportIngester[3284] ERROR: [GB 1]
> > Nov 11 08:10:31 noaapnew noaaportIngester[3284] ERROR: [GB 1]
> > Nov 11 16:28:35 noaapnew ldmd[3280] NOTE: child 3284 terminated by signal 11: noaaportIngester -m 224.0.1.3
> > Nov 11 16:28:35 noaapnew ldmd[3280] NOTE: Killing (SIGTERM) process group
> > Nov 11 16:28:35 noaapnew noaapxcd(feed)[3298] NOTE: Exiting
> > Nov 11 16:28:35 noaapnew ldmd[3280] NOTE: Exiting
> > Nov 11 16:28:35 noaapnew ldmd[3280] NOTE: Terminating process group
> 
> 'signal 11' indicates a segmentation violation.  Why this happened is
> not readily apparent.
> 
> re:
> > I would appreciate any advice.
> 
> I think that the expedient thing to do is/was delete and remake the LDM
> queue.
> 
> Cheers,
> 
> Tom
> --
> ****************************************************************************
> Unidata User Support                                    UCAR Unidata Program
> (303) 497-8642                                                 P.O. Box 3000
> address@hidden                                   Boulder, CO 80307
> ----------------------------------------------------------------------------
> Unidata HomePage                       http://www.unidata.ucar.edu
> ****************************************************************************
> 
> 
> Ticket Details
> ===================
> Ticket ID: PAU-308840
> Department: Support LDM
> Priority: Normal
> Status: Closed
> 
> 
> 

Cheers,

Tom
--
****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                   Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage                       http://www.unidata.ucar.edu
****************************************************************************

