
Re: 20010418: LDM installation at CCNY (cont.)



Hiya,

Anne asked if I would look at halo in her absence.  Here's what I found
last week:

- CCNY is running an older version of the LDM, before the new queue
structure.

- The machine hangs because it is trying to receive too much data for
the current configuration.

- Actual abnormal LDM exits occurred when:

        - pqexpire is running and pbuf_flush messages are being emitted
                ( there must be some contention when pqexpire is trying to
                delete products while pqact is writing products; could
                they possibly be working on the same product? )

        - pbuf_flush messages and RE-CLASS messages are being emitted
                ( contention between pqact and the receiver process )


These are speculations of course, but if halo could run the new version of
the LDM, then the first case would be eliminated and possibly the second
one too.


This information was derived from the logs. I thought I had saved the
actual logs, but I didn't. The only log information left is a short excerpt
from just after a RE-CLASS message, where the LDM exits; I'll attach it.

There were many pbuf_flush messages and RE-CLASS messages in the logs.  It
appears that both a slow disk problem and a narrow pipe problem exist, so
maybe the configurations need to be changed.
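
To put numbers on "many", something like the following could tally the
relevant messages in an LDM log. This is only a rough sketch: the function
names and the log path are hypothetical, and it matches on nothing more
than the substrings mentioned in this message ("pbuf_flush", "reclass",
"terminated by signal"), not on any exact LDM message format. It also
assumes each log entry sits on one unwrapped line.

```python
import re
from collections import Counter

# Substrings taken from this message only; not an exhaustive or official
# list of LDM log formats. Matching is case-insensitive where the prose
# ("RE-CLASS") and the log line ("reclass") differ in spelling.
PATTERNS = {
    "pbuf_flush": re.compile(r"pbuf_flush", re.IGNORECASE),
    "reclass": re.compile(r"re-?class", re.IGNORECASE),
    "child_killed": re.compile(r"terminated by signal (\d+)"),
}


def summarize(lines):
    """Count matching log lines and collect signal numbers of killed children."""
    counts = Counter()
    signals = []
    for line in lines:
        for name, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                counts[name] += 1
                if name == "child_killed":
                    signals.append(int(match.group(1)))
    return counts, signals


def summarize_file(path):
    """Same as summarize(), reading from a log file (path is hypothetical)."""
    with open(path, errors="replace") as handle:
        return summarize(handle)
```

For example, summarize_file("ldmd.log") would return the per-message
counts plus the list of signals that killed child processes; a high
pbuf_flush count next to RE-CLASS messages would fit the slow disk and
narrow pipe guess above.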

 Robb...


===============================================================================
Robb Kambic                                Unidata Program Center
Software Engineer III                      Univ. Corp for Atmospheric Research
address@hidden             WWW: http://www.unidata.ucar.edu/
===============================================================================
Apr 08 13:14:58 halo.sci.ccny.cuny.edu redwood[20174]: FEEDME(redwood.atmos.albany.edu): reclass: 20010408130958.110 TS_ENDT {{IDS|HDS|DDPLUS,  ".*"}
Apr 08 13:14:58 halo.sci.ccny.cuny.edu redwood[20174]: assertion "pIf(xdrs->x_op == XDR_ENCODE, *cpp != NULL && **cpp != 0)" failed: file "ldm_xdr.c"
Apr 08 13:15:04 halo.sci.ccny.cuny.edu rpc.ldmd[20168]: child 20174 terminated by signal 6
Apr 08 13:15:04 halo.sci.ccny.cuny.edu rpc.ldmd[20168]: Killing (SIGINT) process group
Apr 08 13:15:04 halo.sci.ccny.cuny.edu rpc.ldmd[20168]: Interrupt
Apr 08 13:15:04 halo.sci.ccny.cuny.edu rpc.ldmd[20168]: Exiting
Apr 08 13:15:05 halo.sci.ccny.cuny.edu 169.226.4.37[20176]: Interrupt
Apr 08 13:15:05 halo.sci.ccny.cuny.edu pqact[20171]: Interrupt
Apr 08 13:15:05 halo.sci.ccny.cuny.edu pqact[20171]: Exiting
Apr 08 13:15:05 halo.sci.ccny.cuny.edu pqbinstats[20170]: Interrupt
Apr 08 13:15:05 halo.sci.ccny.cuny.edu DCSYNOP[14581]: Interrupt Signal
Apr 08 13:15:05 halo.sci.ccny.cuny.edu 169.226.4.37[20176]: Exiting
Apr 08 13:15:05 halo.sci.ccny.cuny.edu pqexpire[20169]: Interrupt
Apr 08 13:15:05 halo.sci.ccny.cuny.edu DCUAIR[14580]: Interrupt Signal
Apr 08 13:15:05 halo.sci.ccny.cuny.edu 169.226.4.58[20179]: Interrupt
Apr 08 13:15:05 halo.sci.ccny.cuny.edu DCSYNOP[14719]: Interrupt Signal
Apr 08 13:15:05 halo.sci.ccny.cuny.edu 169.226.4.58[20179]: Exiting
Apr 08 13:15:05 halo.sci.ccny.cuny.edu DCHRLY[14377]: Interrupt Signal