LDM 6.15.0 Known Problems

Hung downstream LDM on Linux system

On some Linux systems, the command "ldmadmin stop" can hang because a downstream LDM process has hung. An strace(1) of the hung process shows that it repeatedly invokes the futex system-call:

      futex(0x583844, FUTEX_WAIT, 2, NULL)    = -1 EINTR (Interrupted system call)
      --- SIGCONT (Continued) @ 0 (0) ---
      futex(0x583844, FUTEX_WAIT, 2, NULL)    = -1 EINTR (Interrupted system call)
      --- SIGCONT (Continued) @ 0 (0) ---
      ...
    

This is a known Linux bug. Fixing it requires modifying the Linux kernel, the gcc runtime library, or both. The bug appears to exist in all 2.6 kernel versions up to and including version 2.6.13. Further information can be found by searching the Web for "futex hang".

A workaround appears to be setting the environment variable LD_ASSUME_KERNEL to 2.4.19 before executing any LDM program. For best effect, do this in the LDM user's profile-file.

Product-queue access hangs in Mac OS X 10 (Darwin 8 & 9)

The product-queue module, pq(3), makes repeated use of the fcntl(2) system-function to lock portions of the product-queue. On Mac OS X, this function invokes the system-function fcntl$UNIX2003, which, for unknown reasons, eventually hangs (i.e., never returns). This problem has been seen on Mac OS X versions 10.5 and 10.6.
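For readers unfamiliar with this kind of locking, the following is a minimal sketch of a POSIX byte-range lock taken and released with fcntl(2). It is not pq(3)'s actual code: the helper name set_region_lock() and the use of the blocking F_SETLKW command are assumptions made for illustration. If such a lock request hangs inside the operating system, the calling process blocks indefinitely, which is the symptom described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of pq(3)): apply a lock of the given type
 * (F_RDLCK, F_WRLCK, or F_UNLCK) to the byte range [offset, offset + len)
 * of the file open on descriptor fd. F_SETLKW blocks until the lock can
 * be granted. Returns 0 on success and -1 on failure (errno is set).
 */
static int set_region_lock(int fd, short type, off_t offset, off_t len)
{
    struct flock lock;

    assert(len > 0); /* len == 0 would mean "through end of file" */
    memset(&lock, 0, sizeof lock);
    lock.l_type = type;
    lock.l_whence = SEEK_SET; /* offset is relative to start of file */
    lock.l_start = offset;
    lock.l_len = len;

    /* Retry if a signal interrupts the wait; give up on any other error. */
    while (fcntl(fd, F_SETLKW, &lock) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return 0;
}
```

A caller would typically write-lock a region with set_region_lock(fd, F_WRLCK, offset, len), modify that region, and then release it with the same range and F_UNLCK.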

This bug was first reported to Apple on April 1, 2008. There have been only a few responses from Apple and no resolution to date.

There is no workaround.

Some data-products not processed or relayed

If the system clock is not monotonic (i.e., if it sometimes jumps backward), processes that read the product-queue, such as upstream LDM-s, pqact(1), and pqcat(1), can miss a data-product that is in the queue and that they would otherwise have selected. This is because data-products reside in the queue in the order of their insertion-time. Consequently, a backward jump by the system clock can cause a newly-arrived data-product to be inserted somewhere other than the tail of the queue, where it will be missed by a reader that is waiting at the tail for the next product.
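The effect can be illustrated with a deliberately simplified, hypothetical model of a queue. The names queue_t, queue_insert(), and queue_next() are invented for this sketch and are not pq(3)'s data structures or API. Products are kept in insertion-time order, and a reader remembers the insertion-time of the last product it processed and asks only for products with a later time. A product whose insertion-time was taken after a backward clock jump lands before the reader's position and is never returned.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_CAPACITY 16

/* Hypothetical, greatly simplified product-queue: products are kept
 * sorted by insertion-time (ascending). */
typedef struct {
    long   times[QUEUE_CAPACITY]; /* insertion-times, ascending */
    int    ids[QUEUE_CAPACITY];   /* product identifiers */
    size_t count;
} queue_t;

/* Insert a product, keeping the queue ordered by insertion-time.
 * Returns 0 on success or -1 if the queue is full. */
static int queue_insert(queue_t *q, int id, long insertion_time)
{
    assert(q != NULL);
    if (q->count == QUEUE_CAPACITY)
        return -1;

    /* Shift products with later insertion-times one slot to the right. */
    size_t i = q->count;
    while (i > 0 && q->times[i - 1] > insertion_time) {
        q->times[i] = q->times[i - 1];
        q->ids[i] = q->ids[i - 1];
        i--;
    }
    q->times[i] = insertion_time;
    q->ids[i] = id;
    q->count++;
    return 0;
}

/* Return the id of the first product whose insertion-time is later than
 * *cursor and advance *cursor to that time, or return -1 if there is none.
 * This models a reader waiting at the tail of the queue. */
static int queue_next(const queue_t *q, long *cursor)
{
    for (size_t i = 0; i < q->count; i++) {
        if (q->times[i] > *cursor) {
            *cursor = q->times[i];
            return q->ids[i];
        }
    }
    return -1;
}
```

For example, if product 1 is inserted at time 100 and read, and the clock then jumps back so that product 2 is inserted at time 90, queue_next() finds nothing even though product 2 is in the queue. A later product inserted at time 110 is returned normally.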

The solution is to run the ntpd(8) daemon, which continuously adjusts the slew rate of the system clock in order to keep the clock monotonic.
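To check whether a host's clock is jumping backward, one could watch for such jumps directly. The following is a hypothetical diagnostic, not part of the LDM, that reads CLOCK_REALTIME with clock_gettime() a given number of times and counts readings that are earlier than the preceding one. The function names went_backward() and count_backward_jumps() are illustrative assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* True if reading b is earlier than reading a (the clock went backward). */
static bool went_backward(const struct timespec *a, const struct timespec *b)
{
    return b->tv_sec < a->tv_sec ||
           (b->tv_sec == a->tv_sec && b->tv_nsec < a->tv_nsec);
}

/* Read the system (wall-clock) time `samples` times and count how often a
 * reading is earlier than the one before it. A nonzero result means the
 * clock jumped backward during the run. Returns -1 if the clock can't be
 * read. */
static long count_backward_jumps(long samples)
{
    struct timespec prev, now;
    long jumps = 0;

    assert(samples > 0);
    if (clock_gettime(CLOCK_REALTIME, &prev) != 0)
        return -1;
    for (long i = 1; i < samples; i++) {
        if (clock_gettime(CLOCK_REALTIME, &now) != 0)
            return -1;
        if (went_backward(&prev, &now))
            jumps++;
        prev = now;
    }
    return jumps;
}
```

A result of zero from a short run is not proof of a monotonic clock; it only means that no backward jump occurred while the check was running.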

Reporting Problems

If you encounter bugs or problems, please contact support-LDM at unidata.ucar.edu. In your email, include any information that could help diagnose the problem.