
20030331: LDM - Linux RedHat 7.3 - pqinsert/pq_del_oldest



Karen,

> To: address@hidden
> From: "Karen Cooper" <address@hidden>
> Subject: LDM - Linux RedHat 7.3 - pqinsert/pq_del_oldest
> Organization: NOAA/NSSL
> Keywords: 200303311517.h2VFHUEX003415 LDM-5.1.4 pq_del_oldest

The above message contained the following:

> Institution:  /nssl.noaa.gov
> Package Version: 5.1.4
> Operating System: Linux RedHat 7.3
> Hardware Information: ASL PC
> Inquiry: I'm running a program that ingests Level II radar data into
> the LDM queue. I have been running it on many RedHat Linux machines for
> more than a year, and I have never seen the following problem.
> 
> The process that is ingesting the data and inserting it into the queue
> routinely fails after a few days. This is only happening on one of my
> many machines.
> 
> The ldmd.log shows:
> 
> Mar 29 11:04:01 twxldm pqing_bdds[3860]: pq_del_oldest: signature 
> c90b2ff0a1543327aa9505e8a74cf251: Not Found
> Mar 29 11:04:01 twxldm pqing_bdds[3860]: pq_insert: Invalid argument
> Mar 29 11:04:01 twxldm pqing_bdds[3860]: Exiting
> Mar 29 11:04:01 twxldm pqing_bdds[3860]:   Queue usage (bytes):100003840
> Mar 29 11:04:01 twxldm pqing_bdds[3860]:            (nregions):    8055
> Mar 29 11:04:01 twxldm pqing_bdds[3860]:   Duplicates rejected:       0
> Mar 29 11:04:01 twxldm pqing_bdds[3860]:   WMO Messages seen:         0
> Mar 29 11:04:01 twxldm pqing_bdds[3860]:   SOH/ETX missing  :         0
> Mar 29 11:04:01 twxldm pqing_bdds[3860]:   parity/chksum err:         0
> Mar 29 11:04:01 twxldm pqing_bdds[3860]:   WMO format errors:         0
> Mar 29 11:04:01 twxldm pqing_bdds[3860]:   FILE Bytes read:    2450639464
> 
> 
> I was hoping you might be able to give me some insight into the
> problem. I am going to install LDM 6.0.2 to see if that helps, but I am
> very curious as to the root of the problem.

I'm afraid that I don't know what the problem is.  I saw the problem
here and tried to fix it but was unsuccessful.  In fixing some other
problems, however, it seems that the problem in question might also have
been fixed.  We've been running LDM-6 on a Linux system and have yet to
encounter this problem.

One possible reason for the problem is that the order of signal
blocking and unblocking in critical sections of the product-queue
module was slightly incorrect and could have led to violations of the
product-queue's invariants in the face of signals. We haven't seen the
problem you mention since fixing the code for this other reason.
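
To illustrate the kind of ordering involved, here is a minimal sketch.
It is not the LDM product-queue code: the names toy_queue and
toy_queue_add are hypothetical, and it only assumes the standard POSIX
sigprocmask() interface. The point is that signals are blocked before
the first field is modified and the previous mask is restored only
after the last one, so a handler can never run while the structure is
half-updated.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical stand-in for a queue whose two counters must agree. */
struct toy_queue {
    size_t nregions;   /* number of regions in use */
    size_t nbytes;     /* total bytes in use */
};

/*
 * Adds one region of `size` bytes to the toy queue.
 * Returns 0 on success, -1 if the signal mask could not be changed.
 */
int toy_queue_add(struct toy_queue *q, size_t size)
{
    sigset_t block_all;
    sigset_t saved;

    assert(q != NULL);

    /* Block signals BEFORE touching the structure. */
    if (sigfillset(&block_all) != 0)
        return -1;
    if (sigprocmask(SIG_BLOCK, &block_all, &saved) != 0)
        return -1;

    /* Critical section: both counters change together. */
    q->nregions += 1;
    q->nbytes += size;

    /* Restore the caller's mask only AFTER the update is complete. */
    if (sigprocmask(SIG_SETMASK, &saved, NULL) != 0)
        return -1;

    return 0;
}
```

If the mask were restored between the two updates, or the block were
applied after the first one, a signal handler that exits or touches
the queue could observe nregions and nbytes out of step with each
other.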

Please let me know if you encounter this problem with LDM-6.

Regards,
Steve Emmerson