
20021018: Proftomd hanging on RedHat 8.0 (cont.)



>From: Gilbert Sebenste <address@hidden>
>Organization: NIU
>Keywords: 200210050242.g952g0127088 ldm-mcidas proftomd

Gilbert,

re: setting a data monitor inactive

>OK. I didn't know if you wanted that done, since I thought you were trying 
>to see what was up with it.

The test I am running is the following:

o I uncommented the startup of XCD routines in ~ldm/etc/ldmd.conf

o logged in as 'mcidas', I disabled the running of the synoptic/ship/buoy
  decoder dmsyn.k

o I created the directory ~mcidas/workdata/test

o copied DECINFO.DAT from ~mcidas/workdata to ~mcidas/workdata/test

o changed MCPATH for 'mcidas' from the command line to add ~mcidas/workdata/test
  to the front:

MCPATH=/home/mcidas/workdata/test:/home/mcidas/workdata:/home/mcidas/data:/home/mcidas/help

o cd to ~mcidas/workdata/test

o start a McIDAS environment:

  mcenv

o in this environment, I turn on the synoptic/ship/buoy decoder:

  decinfo.k SET DMSYN ACTIVE

  This changes only the copy of DECINFO.DAT in ~mcidas/workdata/test,
  which is first on MCPATH in this session.  It does not affect the copy
  of DECINFO.DAT used by the XCD supervisory routine startxcd.k (which is
  started at LDM startup from the 'exec        xcd_run MONITOR' entry in
  ~ldm/etc/ldmd.conf).

At this point, I can run the synoptic/ship/buoy decoder by hand.  In order
to set up an environment in which I can cause a core file to be dumped
(McIDAS turns off creation of core files by default), I have to do
a couple of things within the McIDAS environment I created with mcenv:

ucu.k POKE 142 0           <- tell McIDAS to enable core dumps
unlimit coredumpsize       <- tell Linux to enable core dumps

Now, I can run the decoder by hand AND cause a core file to be dumped
if/when it goes into its infinite loop:

dmsyn.k RESTART=-1 DEV=CCC

Phew!

re: how to see which XCD data monitors are active

>Yep. OK...any ideas?

Not yet.  I am hopeful that the copy of dmsyn.k that I built with the
'-g' (debugging) compiler flag (for m0syndec.for, m0shpdec.for, and
dmsyn.pgm) will produce a core dump that shows where the decoder gets
into an infinite loop.  Once I have that information, I can examine
the code and see what needs to be bulletproofed.

>The new kernel is in. Oh, interestingly, it is NOT 
>doing it on weather.admin.

Very weird, given that weather and weather2 are both running RH 8.0!
I see that you commented out the execution of proftomd on weather.  Does
this mean that it was hanging there also?

>I betcha RedHat comes out with a new Glibc 
>soon...customers are pretty ticked off. Let's see if that fixes it.

The problem with proftomd really does seem to be related to one of the
glibc shared libraries.  The reason I can say this is that you were
using a binary version of proftomd built on RH 7.1, and that same binary
is running on several RH 7.[0123] systems with no problems.  Also, the
point where the program goes into an infinite loop is not inside any
particular call.  The only change I made (the hack/kludge) was to keep
it from updating the McIDAS routing table with the information that a
new set of data had been received and decoded.

I ran strace on proftomd, but it revealed nothing.  I also examined the
proftomd routines to make sure that no arrays were being overrun and no
pointers clobbered: nothing.  The kludge was made only to get things working.

Tom