
20030828: abnormal LDM termination: product-queue assertion failure



Alan,

>Date: Thu, 28 Aug 2003 07:43:38 -0400
>From: "Alan Hall" <address@hidden>
>Organization: NOAA/NCDC
>To: Steve Emmerson <address@hidden>
>Subject: Re: 20030827: abnormal LDM termination: product-queue assertion 
>failure

The above message contained the following:

...
> > What's an NRS?
> 
> NOAAPort Receive System

Of course!  Thanks.

...
> PRC (the company that provides the NRS to the NWS and NCDC) wrote
> the original acqserver C programs and FSL provided it to me.  I'll
> include the subroutine that inserts into the product queue.

I looked the product-insertion function over and it seems benign.  It
only uses pq_open(), pq_insert(), and pq_close(), so it shouldn't
corrupt the product-queue.  (Of course, in C there's always a caveat on
"should" since one can access the entire data-segment of the process via
a pointer.)

> I've included all the subroutines in the whole package.  The putLDM.c
> is the one that actually inserts into the LDM queue.  If there are
> other subroutines that are better, I am willing to try as long as I
> can keep my filenaming convention when inserted into the ldm queue:
> 
> Aug 28 11:40:37 pqutil:      227 20030828114037.472     WMO 6456726  
> NOAAPORT.NWSTG.TEXT.SRCS40.KWAL.281139...000227
> Aug 28 11:40:37 pqutil:      233 20030828114037.481     WMO 6456727  
> NOAAPORT.NWSTG.TEXT.SXWA50.KWAL.281139...000233
> Aug 28 11:40:39 pqutil:      195 20030828114038.498     WMO 15948352  
> NOAAPORT.NWSTG.TEXT.FTUS45.KABQ.281100.RRR..000195
> Aug 28 11:40:39 pqutil:      222 20030828114038.508     WMO 15948353  
> NOAAPORT.NWSTG.TEXT.FTUS43.KMPX.281140..TAFRWF.000222
> Aug 28 11:40:39 pqutil:      359 20030828114038.667     WMO 15948354  
> NOAAPORT.NWSTG.TEXT.FTUS43.KMPX.281140..TAFMSP.000359
> Aug 28 11:40:39 pqutil:      396 20030828114038.676     WMO 15948356  
> NOAAPORT.NWSTG.TEXT.FTUS43.KDVN.281140..TAFBRL.000396
> Aug 28 11:40:39 pqutil:      220 20030828114038.686     WMO 15948357  
> NOAAPORT.NWSTG.TEXT.SRUS54.KLCH.281140..RR3LCH.000220
> 
> This gives me all that I need to identify the product: WMO header,
> callsign, date & time, anything on the same line of the header, awips
> info, and size.
...
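
The naming convention quoted above can be taken apart by splitting the
identifier on '.' while keeping empty fields.  The following is only a
minimal sketch: the field count and the meaning assigned to each
position are inferred from the sample pqutil output and the list of
items given above, and the names prod_id, id_field, and split_prod_id
are hypothetical (they are not part of the LDM or acqserver code).

```c
#include <stddef.h>
#include <string.h>

#define ID_FIELDS 9     /* assumed field count, inferred from the samples */
#define ID_MAXLEN 256   /* assumed upper bound on identifier length */

/* Hypothetical field positions within the dot-separated identifier. */
enum id_field {
    ID_SOURCE = 0,  /* e.g. NOAAPORT */
    ID_ORIGIN,      /* e.g. NWSTG */
    ID_TYPE,        /* e.g. TEXT */
    ID_WMO,         /* WMO header, e.g. FTUS43 */
    ID_CALLSIGN,    /* e.g. KMPX */
    ID_DDHHMM,      /* date & time, e.g. 281140 */
    ID_HDR_REST,    /* rest of the header line, e.g. RRR; may be empty */
    ID_AWIPS,       /* AWIPS info, e.g. TAFRWF; may be empty */
    ID_SIZE         /* size, e.g. 000222 */
};

typedef struct {
    char  buf[ID_MAXLEN];       /* private copy with '.' replaced by NUL */
    char *field[ID_FIELDS];     /* pointers into buf, one per field */
} prod_id;

/*
 * Splits `ident` on '.' into exactly ID_FIELDS fields.  Empty fields are
 * kept (strtok() would silently skip them).  Returns 0 on success and -1
 * if the identifier is too long or has the wrong number of fields.
 */
int split_prod_id(const char *ident, prod_id *out)
{
    size_t len = strlen(ident);
    int    n = 0;
    char  *p;

    if (len >= sizeof(out->buf))
        return -1;
    memcpy(out->buf, ident, len + 1);

    p = out->buf;
    out->field[n++] = p;
    for (; *p != '\0'; p++) {
        if (*p == '.') {
            if (n == ID_FIELDS)
                return -1;              /* more fields than expected */
            *p = '\0';
            out->field[n++] = p + 1;
        }
    }
    return n == ID_FIELDS ? 0 : -1;
}
```

Under these assumptions, splitting
"NOAAPORT.NWSTG.TEXT.FTUS43.KMPX.281140..TAFRWF.000222" leaves the
ID_HDR_REST field empty and puts "TAFRWF" in the ID_AWIPS field.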

> A SIGKILL was not sent.

Did the operating system crash while the LDM was running prior to this
problem?

> The ldm product queue was deleted and rebuilt and ldm re-started.

Good.  You should be OK for the time it takes us to investigate this
problem.  Keep an eye out, though.

Can you make the corrupt product-queue available (for example, on an FTP
server)?  We have an AIX 5.1 system here and might be able to analyze
the product-queue.

Regards,
Steve Emmerson