
Re: 20010531: ldm 5.1.3 with RH 7.1 thrashing



Hi Art,

"Arthur A. Person" wrote:
> 
> On Thu, 31 May 2001, anne wrote:
> 
> 
> Actually, it's just a Pentium III 833mhz with embedded SCSI and NIC.  It's
> fairly vanilla as PC's go...  I just loaded RedHat on it with the default
> full install and started running it (security hardened, of course).  I
> think any similar Pentium III PC you might have should behave similarly
> unless SCSI was an issue.  So, I would be running 32-bit, right?  I'm not
> familiar with 32 vs. 64 bit installs.
> 

Yes, you would be running a 32bit version.  But this makes things
easier.

> > We can build 64bit versions of the LDM on our SPARCv9 or IRIX64
> > machines.  If I hear from you that you're running a 64bit version, I
> > will do this and request WSI data and see what happens.
> 
> It does seem that when I kill off the wsi rpc's that the system becomes
> more responsive, but it still thrashes.  I just stopped and restarted the
> ldm without rebooting and it actually worked, and the data seem to be
> slowly catching up, except NEXRAD seems to be lagging still.  I also made
> a new queue of 600mb to hopefully prevent the problem for overnight.
> 

When you say "it still thrashes", do you mean that products aren't being
received in a timely manner?  Right now products on ldm.meteo appear to
be arriving pretty quickly.  And, 'top' is showing a low load average,
the machine appears to be responsive, and there's a reasonable number of
rpc.ldmds...  Is this all with your 600Mb queue?



> I guess I'm not sure yet where to point the finger at this problem...
> maybe it's not the wsi connection, maybe the wsi connection is a symptom
> of slowness and it times out and reconnects.
> 

But it shouldn't leave processes lying around.   I don't yet know where
to point the finger either.

At least a few sites are running 7.1 without any apparent problems.  I
know at least one site to ask - Gilbert's running 7.1 and I think he's
also getting data from WSI, but I'm not positive...

> I think maybe I'd better run this with a small (600m) queue for now and
> see if the problem recurs since I leave for vacation next Friday and I
> don't want to leave an unstable system behind.  Will you do any testing on
> this or will we wait until I return mid-month?
> 
>                                   Art.

Let me know how it goes with your 600Mb queue - I'll be really
interested to know if that made a difference. 

I will try to do some testing.  I will ask WSI to feed our RH7.1 pc for
a while for debugging purposes.  I don't know if they'll agree or
not...  For that matter, I don't know how quickly they will even
respond.  

I will also be away for a week starting next Wednesday.  I hope you can
find a way to get by - this looks like it may take a while.   If I can't
duplicate the problem...  it could take a LONG while.

I'll keep you posted.

Anne 
-- 
***************************************************
Anne Wilson                     UCAR Unidata Program            
address@hidden                 P.O. Box 3000
                                  Boulder, CO  80307
----------------------------------------------------
Unidata WWW server       http://www.unidata.ucar.edu/
****************************************************