
Re: 20000118: ldm dies (fwd)




===============================================================================
Robb Kambic                                Unidata Program Center
Software Engineer III                      Univ. Corp for Atmospheric Research
address@hidden             WWW: http://www.unidata.ucar.edu/
===============================================================================

---------- Forwarded message ----------
Date: Wed, 19 Jan 2000 08:56:44 -0500 (Eastern Standard Time)
From: Thomas L. Mote <address@hidden>
To: Robb Kambic <address@hidden>
     support-ldm -- Anne Wilson <address@hidden>,
     address@hidden
Subject: Re: 20000118: ldm dies


Robb:

I tried cutting 50 or so lines at a time from pqact.conf and
restarting the LDM. Each time, it took anywhere from 5 minutes to
an hour to see it die. As it turns out, I started cutting from
the bottom, and I had almost reached the top before I found the
problem. (Murphy's Law in play, I guess.)
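
[Editor's sketch] The cut-and-restart search described above can be
shortened by halving the candidate entries on each trial instead of
trimming a fixed chunk from one end. The Python below is only an
illustration under stated assumptions: it assumes a pqact.conf-style
layout in which an entry starts on a line with no leading whitespace and
continues on indented lines, and it assumes exactly one entry is at
fault. The names split_entries, find_bad_entry, and crashes are
hypothetical; crashes stands in for the manual step of installing a
subset of entries, restarting the LDM, and waiting to see whether it
dies.

```python
from typing import Callable, List, Sequence


def split_entries(text: str) -> List[str]:
    """Group pqact.conf-style text into entries.

    Assumption: an entry starts on a line with no leading whitespace and
    its continuation lines start with a tab or space. Blank lines and
    comment lines (first non-blank character '#') are dropped.
    """
    groups: List[List[str]] = []
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        if line[0] in " \t" and groups:
            groups[-1].append(line)   # continuation of the previous entry
        else:
            groups.append([line])     # start of a new entry
    return ["\n".join(group) for group in groups]


def find_bad_entry(entries: Sequence[str],
                   crashes: Callable[[Sequence[str]], bool]) -> int:
    """Return the index of the one entry that makes crashes() return True.

    Assumes exactly one bad entry. Each loop iteration costs one trial,
    so roughly log2(len(entries)) restarts are needed.
    """
    lo, hi = 0, len(entries)          # the bad entry is in entries[lo:hi]
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if crashes(entries[lo:mid]):  # hypothetical manual trial
            hi = mid
        else:
            lo = mid
    return lo
```

With a few hundred entries this needs on the order of eight restarts.
If more than one entry is at fault, the search still ends on a single
entry, but it would have to be repeated after removing that entry.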

I eliminated the problem lines, and the LDM has been 
running fine overnight, so I'm guessing we've got the 
problem licked. I do recall that I had made changes in 
those problem lines not too long ago. I was then out of 
town and we had a network outage, so the LDM may have been 
down much of that time without any way for me to tell.

Thanks for your help.

Tom



On Tue, 18 Jan 2000 14:41:20 -0700 (MST) Robb Kambic 
<address@hidden> wrote:

> On Tue, 18 Jan 2000, Unidata Support wrote:
> 
> > 
> > ------- Forwarded Message
> > 
> > >From: "Thomas L. Mote" <address@hidden>
> > >Subject: ldm dies
> > >Organization: University of Georgia
> > >Keywords: 200001182005.NAA14016 LDM
> > 
> > 
> > 
> > Hi... I've been writing recently concerning problems with 
> > GEMPAK and McIDAS, so I suppose it was only a matter of 
> > time before I got around to some problems with the LDM ;-)
> > 
> > I think the GEMPAK problems are all fixed, and I'm close on 
> > McIDAS, but I probably do have one more e-mail in store for 
> > Tom.
> > 
> > The ldm keeps dying on me only a few minutes after being 
> > started. This seemed to start after I upgraded memory in 
> > the system and added a couple of hard disks yesterday. It 
> > was hard to tell because the network was down yesterday due 
> > to a power outage on another part of campus. 
> > (I should mention that neither of the added disks is used 
> > for the OS, the LDM or for data storage, although I 
> > eventually intend to do that. The /, /usr and /data 
> > partitions remain unchanged.)
> > 
> > This has happened several times over the last day. I have 
> > checked the pqact.conf syntax. I have deleted and recreated 
> > the product queues. I have rebooted the system. I can't 
> > figure out the problem. I have attached a copy of the log 
> > file (in verbose mode, I believe). I can't tell from the 
> > log file what is happening. If you would like, take a look 
> > yourself.
> 
> Tom,
> 
> I logged into your machine for a while today.
> 
> The good news is that I found the problem: there's some type of bad entry
> in the pqact.conf file.  It is syntactically correct, but some product is
> causing pqact to crash, bringing down the LDM. The bad news is that I don't
> know which pqact entry/product is causing the problem. I would suggest
> doing some trial and error on some of the entries.  Since your pqact.conf
> is large, I would run the ldm in verbose mode and look at the log entries
> for the products right before the crash.  I know this isn't a
> straightforward solution, but I don't know the products. In your log file,
> these are the entries I would be looking at.  Which pqact entries do these
> products match?
> 
> Jan 18 19:52:37 cacimbo pluto[488]:      428 20000118185904.358 IDS|DDPLUS 834 SAHW01 PHNL 181855
> Jan 18 19:52:37 cacimbo pluto[488]:     7530 20000118185904.374 IDS|DDPLUS 835 FGUS56 KPQR 181857 /pRVFWW
> Jan 18 19:52:37 cacimbo pluto[488]:      152 20000118185904.382 IDS|DDPLUS 836 FTMX63 MMMX 181600 AAA
> Jan 18 19:52:37 cacimbo pluto[488]:      135 20000118185904.394 IDS|DDPLUS 837 SAIN31 VABB 181740 RRC
> 
> 
> Another plan of attack would be to use only parts of the pqact.conf file to
> see which section is crashing the ldm.  Another user had this problem: the
> pqact entry was correct, but the action line caused some system
> corruption.  I couldn't find the message in the archive for the details.
> Keep me informed so I can add this to the archives.
> 
> Robb...
> 
> 
> 
> > 
> > 
> > Tom
> > 
> > 
> > 
> > **********************************************************
> > Thomas L. Mote                               address@hidden
> > Associate Professor of Geography         ph:  706-542-2906
> > University of Georgia                    lab: 706-542-6060
> > Athens, GA 30602-2502 USA                fax: 706-542-2388
> >