
Re: LDM - ldm process becomes defunct on server



Sarah,

>Date: Wed, 31 Aug 2005 13:08:22 -0600
>From: Sarah Thompson <address@hidden>
>Organization: NOAA/NWS/FSL
>To: Steve Emmerson <address@hidden>
>Subject: Re: LDM - ldm process becomes defunct on server

The above message contained the following:

> this doesn't seem like what you want....this is the command i ran
>     strace -p23317 -0 /strace.out
> am i doing somethign wrong? 

I assume the "-0 /strace.out" option was actually "-o /strace.out" (the
letter "o", not the digit zero).

I think you did the right thing.  The output shows that the top-level
LDM is calling the futex(2) system call with the FUTEX_WAIT operation
(i.e., it's waiting on a mutex).  The wait is interrupted by a SIGCONT
signal, which occurs every time a data-product is inserted into the
product-queue.  The thread then returns to whichever runtime function
made the futex(2) call (the LDM code doesn't call futex(2) directly),
whereupon the runtime function calls futex(2) again.

The problem is that the futex(2) call never returns as the result of a
FUTEX_WAKE operation on the given futex; it returns only when a signal
interrupts it, so the thread never gets past the wait.
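
For illustration, the wait/interrupt/retry pattern described above can
be sketched in C roughly as follows.  This is only a sketch under stated
assumptions (Linux with glibc and pthreads); it is not LDM code or the
runtime library's code, and futex_word, waiter(), on_sigcont(), and
run_futex_demo() are names invented for this example.  The waiter
retries FUTEX_WAIT each time a SIGCONT interrupts it, and it only gets
out of the loop once another thread changes the futex word and issues a
FUTEX_WAKE.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical names throughout; none of this is LDM or glibc code. */
static atomic_int futex_word;     /* 0 = keep waiting, 1 = released */
static atomic_int interruptions;  /* times FUTEX_WAIT failed with EINTR */

/* glibc provides no futex() wrapper, so go through syscall(2). */
static long futex(atomic_int *addr, int op, int val)
{
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

/* Empty handler: it exists only so that SIGCONT interrupts the wait. */
static void on_sigcont(int sig) { (void)sig; }

static void *waiter(void *arg)
{
    (void)arg;
    /* Sleep while the word is still 0; retry when a signal cuts the wait short. */
    while (atomic_load(&futex_word) == 0) {
        if (futex(&futex_word, FUTEX_WAIT, 0) == -1 && errno == EINTR)
            atomic_fetch_add(&interruptions, 1);
    }
    return NULL;
}

/* Returns the number of interrupted waits counted, or -1 if setup fails. */
int run_futex_demo(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigcont;  /* sa_flags stays 0: no SA_RESTART, so EINTR is visible */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCONT, &sa, NULL) != 0)
        return -1;

    atomic_store(&futex_word, 0);
    atomic_store(&interruptions, 0);

    pthread_t tid;
    if (pthread_create(&tid, NULL, waiter, NULL) != 0)
        return -1;

    /* Signal the waiter until one wait has been interrupted (bounded loop). */
    struct timespec pause = { 0, 1000000 };  /* 1 ms */
    for (int i = 0; i < 5000 && atomic_load(&interruptions) == 0; i++) {
        pthread_kill(tid, SIGCONT);
        nanosleep(&pause, NULL);
    }

    /* The step that is missing in the trace: change the word, then wake. */
    atomic_store(&futex_word, 1);
    futex(&futex_word, FUTEX_WAKE, 1);
    pthread_join(tid, NULL);
    return atomic_load(&interruptions);
}
```

Calling run_futex_demo() from a small main() (built with "cc -pthread")
should report at least one interrupted wait.  If the last three
statements before the return were removed, the waiter would keep
re-entering FUTEX_WAIT indefinitely, which is the behavior the strace
output suggests.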

I hate to say it, but this is likely due to a bug in the operating
system.  See

    
    http://www.google.com/url?sa=t&ct=res&cd=3&url=http%3A//www.uwsg.iu.edu/hypermail/linux/kernel/0503.2/1529.html&ei=kBEWQ5vLF4uc-gGBl8D6DQ

Can you upgrade the operating-system to see if the problem gets fixed?

Regards,
Steve Emmerson