
20060105: LDM unexpected death and logging problems



>From: Ben Cotton <address@hidden>
>Organization: Purdue
>Keywords: 200601051759.k05Hxb7s007809 LDM 6.4.2 pqact exit

Hi Ben,

re:
>My LDM 6.4.2 build on weather.eas.purdue.edu has developed the nasty habit 
>of dying unexpectedly.  There's been no pattern that I've been able to 
>determine, except that it generally happens overnight in order to make sure 
>I don't catch it for hours.  I've asked our department computing support 
>staff to check the system logs for anything that might be a trigger, since 
>the ldmd.log contains very little information...

We can't tell what the problem might be without being able to see
your log file(s).  We will also need specific information on your
platform (e.g., OS and version, amount of RAM).
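If it helps, the platform details could be gathered with something like
the following.  This is only a sketch that assumes a Linux system; the
release file names vary by distribution, and /proc/meminfo is
Linux-specific.  The function name 'platform_report' is made up for
illustration.

```shell
#!/bin/sh
# Sketch: collect OS, version and memory information (Linux assumed).
platform_report() {
    echo "== Kernel and architecture =="
    uname -a

    echo "== Distribution release =="
    # Release file names differ between distributions; print whichever exist.
    for f in /etc/os-release /etc/redhat-release /etc/debian_version; do
        if [ -r "$f" ]; then
            echo "-- $f"
            cat "$f"
        fi
    done

    echo "== Memory =="
    # /proc/meminfo only exists on Linux.
    if [ -r /proc/meminfo ]; then
        grep -E '^(MemTotal|SwapTotal):' /proc/meminfo
    fi
}

platform_report
```

Redirecting the output to a file makes it easy to attach to a reply.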

>only that pqact received 
>an interrupt and is exiting

Which 'interrupt' (signal) does the log message report?
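One way to pull the relevant lines out of the logs is sketched below.
The search pattern is a guess, since we have not seen the exact wording
of the message, and the function name 'pqact_exit_lines' is
hypothetical.  The file names are the ones you mentioned.

```shell
#!/bin/sh
# Sketch: list log lines where pqact mentions an interrupt, signal or exit.
# The pattern is an assumption; adjust it to the wording in your log.
pqact_exit_lines() {
    # -i: ignore case, -E: extended regular expression
    grep -i -E 'pqact.*(interrupt|signal|exit)' "$@"
}

# Example (paths taken from the message above):
#   pqact_exit_lines /var/log/ldm/ldmd.log /var/log/ldm/ldmd.log-1
```

The few lines logged just before each match are often as useful as the
match itself, so please include them as well.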

>(and in a bit of extra fun, after I manually rotated the logs - cron isn't 
>working properly for some reason, long story - the new ldmd.log file 
>remained empty while entries were being written to ldmd.log-1 ).

>Whatever is causing this premature death isn't very polite, as restarting 
>the LDM requires resetting the queue's write-counter.

>A core dump appears 
>in ~ldm at the same time as the LDM dies, and I assume the two are 
>related, but I don't know how to do anything with core files.

Running:

file core        <- substitute the actual name of the core file for 'core'

should tell you what program produced the core file.
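If there are several core files, a small loop saves some typing.  This
is a sketch that assumes the files are named 'core' or 'core.<pid>'
(the naming depends on the system); the function name 'identify_cores'
is made up for illustration.

```shell
#!/bin/sh
# Sketch: run 'file' on every core file found in a directory.
# Assumption: core files are named 'core' or 'core.<pid>'.
identify_cores() {
    dir=${1:-$HOME}
    found=0
    for c in "$dir"/core "$dir"/core.*; do
        if [ -f "$c" ]; then
            found=1
            ls -l "$c"   # timestamp to compare with the time the LDM died
            file "$c"    # typically names the program that dumped core
        fi
    done
    if [ "$found" -eq 0 ]; then
        echo "no core files found in $dir"
    fi
}

# Example: identify_cores ~ldm
```

If a debugger such as gdb is installed, loading the program named by
'file' together with the core file and asking for a backtrace can show
where it died.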

>Our other 
>machine, wxp.eas.purdue.edu, is running 6.4.1 (although I'm building 
>6.4.4 on both this afternoon) and has never had this problem.

Is it running the same operating system?  Same version?

>I'm also noticing what seems like a lack of information in the logs. 
>The only messages that are being written are the WARNs that a write to 
>pipe took x number of seconds.  I've checked /etc/syslog.conf , 
>~/etc/ldmadmin-pl.conf and the pqact entries in ~/etc/ldmd.conf and 
>everything points to /var/log/ldm/ldmd.log .  We put the logs there 
>instead of ~/logs (which I set as a symlink to /var/log/ldm ) to skirt the 
>SELINUX issue.

OK, sounds like you are running Linux.  What version?

>Thanks,

Please provide the log file(s) and output from the 'file' command
above so we have a chance to see what may be happening.

>Ben Cotton
>==================
>Ben Cotton, KC9FYX
>LDM Administrator
>Dept of Earth & Atmos. Sci.
>Purdue University
>http://www.funnelfiasco.com
>O: (765) 49-40655
>C: (765) 586-8992

Cheers,

Tom
--
NOTE: All email exchanges with Unidata User Support are recorded in the
Unidata inquiry tracking system and then made publicly available
through the web.  If you do not want to have your interactions made
available in this way, you must let us know in each email you send to us.