Re: 20010202: ldm gremlins loose (cont.)



"Jennie L. Moody" wrote:
> 
> Anne,
> 
> I noticed that things were messed up again on windfall,
> but you were on by the time I came into work and I
> had to deal with some other stuff.  I just looked at
> the fact that you had restarted the ldm, and it looks
> good so far.  Seems that the issue was again related to
> PennState dropping me, the system doing a failover,
> then failing back to navier, and having problems.
> 

Hi Jennie,

Yes, I figured out the problem and made a temporary fix.  When the
failover occurs via cron, the ldm is restarted in a different
environment.  In particular, the PATH variable isn't set correctly,
so it can't find the programs it needs.  The not-so-satisfactory fix
I made is to include your entire path in the ldmfail script.  The
problem with this is that you'll need to modify the script every time
you upgrade.  Instead, I would prefer to change how cron invokes the
script, so that your upgrades will be straightforward.  But I'll have
to do some research on how to accomplish this.
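To illustrate the diagnosis: many cron implementations start jobs with a minimal PATH such as /usr/bin:/bin, so a program that an interactive shell finds may not be found when the same script runs from cron. The Python sketch below is only an illustration under stated assumptions: the tool name is hypothetical, and a temporary directory stands in for the ldm bin directory. It shows the lookup failing under the minimal PATH and succeeding once that directory is prepended, which is the kind of adjustment a cron wrapper could make instead of hard-coding the path inside ldmfail.

```python
import os
import shutil
import tempfile

# Assumption: a minimal PATH like the one many cron implementations provide.
CRON_DEFAULT_PATH = "/usr/bin:/bin"


def make_fake_tool(directory: str, name: str) -> str:
    """Create an executable placeholder file standing in for an ldm program."""
    path = os.path.join(directory, name)
    with open(path, "w") as f:
        f.write("#!/bin/sh\nexit 0\n")
    os.chmod(path, 0o755)  # owner rwx, group/other r-x
    return path


def env_with_prepended_path(base_path: str, extra_dirs: list[str]) -> dict[str, str]:
    """Return a copy of os.environ whose PATH puts extra_dirs ahead of base_path."""
    env = dict(os.environ)
    env["PATH"] = os.pathsep.join(extra_dirs + [base_path])
    return env


if __name__ == "__main__":
    # Hypothetical stand-in for the ldm user's bin directory.
    with tempfile.TemporaryDirectory() as ldm_bin:
        tool = "hypothetical_ldm_tool"
        make_fake_tool(ldm_bin, tool)

        # Under the minimal cron-style PATH the tool is not found.
        print(shutil.which(tool, path=CRON_DEFAULT_PATH))

        # With the ldm bin directory prepended, the lookup succeeds.
        fixed = env_with_prepended_path(CRON_DEFAULT_PATH, [ldm_bin])
        print(shutil.which(tool, path=fixed["PATH"]) is not None)
```

In crontab terms, the same idea would be setting PATH on its own line near the top of the crontab (many cron implementations accept environment settings there) or having the cron entry source the ldm user's profile before calling ldmfail; which approach works depends on the local cron.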


> I should point out, I don't believe that automatic failovers
> were ever working under the old account (ldma); in fact,
> I thought this was one of the new features of this version
> of the ldm?
> 

This is interesting!  I had been trying to understand why it worked
before, but that was just an assumption on my part.

I wouldn't say ldmfail is a new feature of this version.  It's been
around since at least 5.0.9, and probably before.  But this version is
different from the old one you have.  The old one was written by Mitch,
and the latest one was rewritten by Robb.

> I found a few problems with files that get written by
> some of my product scripts.  These were gif files that
> were still owned by ldma, and while they were in the
> same group as ldma (it looks like you left ldma
> in a group with ldm and mcidas), several files had
> only read permission for the group.  So ldm-fired
> batches that tried to write out gif files were not
> completing successfully.  Anyway, I changed all these
> (they were all in our /home/mcuser/webpage directory).
> This doesn't impact anything else, but it explains to
> me why some products on our website were updating
> while others were not.
> 

Great!  I'm glad you figured this out.  It's difficult for me to fully
understand what a user is trying to do with their data without spending
a lot of time on it.
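For the permissions problem Jennie describes, one quick check is to look for files whose group-write bit is not set. The Python sketch below is a minimal illustration, assuming the files are meant to be group-writable and that whoever runs it owns them (only the owner or root can change a file's mode). The helper names are hypothetical, and the directory in the example is the one mentioned above.

```python
import os
import stat


def files_missing_group_write(directory: str) -> list[str]:
    """Return regular files under directory whose group-write bit is not set."""
    missing = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            mode = os.lstat(path).st_mode  # lstat: do not follow symlinks
            if stat.S_ISREG(mode) and not mode & stat.S_IWGRP:
                missing.append(path)
    return sorted(missing)


def add_group_write(paths: list[str]) -> None:
    """Equivalent of `chmod g+w` on each path; requires ownership or root."""
    for path in paths:
        mode = stat.S_IMODE(os.lstat(path).st_mode)
        os.chmod(path, mode | stat.S_IWGRP)


if __name__ == "__main__":
    # Report only; call add_group_write(targets) once the list looks right.
    targets = files_missing_group_write("/home/mcuser/webpage")
    for p in targets:
        print(p)
```

Reporting before changing anything keeps the check separate from the fix, which makes it easier to confirm that only the intended files (such as the gif outputs) are affected.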

> I have to head home now, but I hope that whatever the
> failover gremlin is, you don't have to work too hard to
> find him.
> 

I feel better having at least diagnosed the problem and put in a patch,
although it's not the greatest solution.  I hope we're on the home
stretch.

> As always, I appreciate that you are taking note and
> assisting!
> 
> Jennie


It's my pleasure to be of assistance!

Anne
-- 
***************************************************
Anne Wilson                     UCAR Unidata Program            
address@hidden                 P.O. Box 3000
                                  Boulder, CO  80307
----------------------------------------------------
Unidata WWW server       http://www.unidata.ucar.edu/
****************************************************