
Re: 20000102: ldm-5.1.2



Luis Cano wrote:
> 
> Hello Anne,
> 
> Thanks for the reply.
> 
> The servers are new machines that will be brought into operation soon. A
> couple of weeks ago, we started with an older version of the ldm and had
> similar problems. Since then, the kernel on one of the servers was upgraded
> to 2.2.16, and the ldm was upgraded to 5.1.2 on both servers. I did not
> install the LDM, but I am fairly certain that it was a binary install.
> 
> Also, this problem does not happen right away. The LDM will only run for a
> couple of days, less than a week. To recover, we have to delete the queue
> and recreate it. So we have deleted the queue a couple of times since
> upgrading the ldm.
> 
> I poked around in the Unidata mail archives and saw that this type of
> problem was reported previously. I did not save the email, but the thread
> basically considered this problem a kernel issue -- which could very well
> be the problem. In that thread, the kernel was upgraded to 2.2.16 and the
> problem was reported fixed. However, since the problem does not happen
> immediately and the exact nature of the problem is not known, I would think
> that the ldm would need to run for a number of weeks before being
> considered fixed.
> 
> I'm in the process of upgrading the drivers, in an attempt to eliminate
> driver issues that may be impacting the kernel. Also, I am going to
> recompile the ldm with the -g switch so I can analyze a dump if the ldm
> cores.
> 
> Do you have any other suggestions?
> 
> Thanks, and I appreciate the help.
> 
> Lou
> 

Hi Lou,

Well, I'm kinda stumped on this one.  Now I'm wondering about the RAID
disk after all.  In an earlier message I said I didn't know of any
problems with a RAID disk.  We have used a RAID disk successfully under
Solaris, although we did not keep the queue there - only the data.  

I saw the email in the archives about the kernel upgrade
(http://www.unidata.ucar.edu/cgi-bin/mfs/65/3878?96#mfs).  I agree -
it's not clear whether the problem was fixed for good or not, but since
we did not hear back from them it may have fixed the problem.  Thus, it
seems to make sense to upgrade the OS and the drivers.  What has
happened since you upgraded the one server to 2.2.16?

From your previous email, I gather that both of your machines have only
RAID disks, is that right?  Thus, the product queue must be on a RAID
disk.  Something else to try would be to add a non-RAID disk and put the
queue there, if possible. 

I wish I could be more helpful.  Let me know what you find out.  In the
meantime I'll make some inquiries and let you know if I get any more
ideas.

Anne
-- 
***************************************************
Anne Wilson                     UCAR Unidata Program            
address@hidden                 P.O. Box 3000
                                  Boulder, CO  80307
----------------------------------------------------
Unidata WWW server       http://www.unidata.ucar.edu/
****************************************************