
Re: CONDUIT latencies



On Fri, 16 Jul 1999, David Wojtowicz wrote:

> On Fri, 16 Jul 1999, David J. Knight wrote:
> > 
> > Hi David,
> >      It has been a problem for some time. The feed has
> > never been particularly reliable, but I never bothered
> > to follow up on it since we already get much of the
> > data we need via ftp. I was just looking into the
> > possibility of phasing the ftp out, so for the last few
> > weeks I have been looking into the NMC2 feed more
> > closely. It was not a high priority, but I would
> > like to get it sorted out. I'm not sure if there is
> > anything you can do at your end, but perhaps I am
> > wrong. 
> > Thanks
> > David
> 
> 
> I've been very frustrated here.   We used to just run our sole LDM
> on an ancient HP720, and although it had its share of problems
> it was OK most of the time.  Usually problems were something
> not directly related to LDM... disk full, network out, etc.
> 
> But since it was increasingly having trouble keeping up with the
> larger data streams we decided that we had better finally go
> and replace it.  Having quite successfully used Linux boxes for other
> purposes, and hearing reports of other IDD sites successfully using it for
> LDM (not to mention being attracted by the cheap, powerful hardware),
> we purchased new Linux boxes for IDD service.
> 
> Since then we've had nothing but trouble.  The increase in LDM
> performance was nowhere close to what was anticipated given
> the increase in hardware power (granted, there are other factors
> that don't change...like networking, but still, disappointing).
> 
> I believe I traced the problem to a memory management issue
> that seems to occur when using LDM on Linux with a very
> large product queue....it has seriously impacted performance.
> Some others have noticed this too...though not everyone...I
> believe some of the end nodes running smaller product queues
> were below the threshold where it becomes a significant problem.
> 
> I believe I've come up with a solution through some changes to the code.
> It really does make the symptoms of the initial problem go away instantly,
> but I've been concerned that it might be causing some other problem, so I
> have been testing it for a while now before declaring that it is a good
> solution.
> 
> WRT the NMC2 feed, this problem I mention was certainly a cause of
> delays in times past.  Now, though, since it runs longer without crashing,
> I've been experiencing a new problem.  I've noticed very recently that
> I often end up with two dozen GEMPAK dcgrib processes running even
> though I only have about a half dozen entries in the pqact.conf file.
> It seems that they are not going away, and they run the load very high,
> causing very high latencies.    This was the case in the last 24
> hours.
> 
> To be honest, I haven't paid too close attention to the NMC2 feed
> in the last while as we mainly use it for interactive analysis
> and nobody much is around to do that this time of year.  Since
> I'm taking care of everything in our department myself right now
> I don't have time to check on each of the 40 or so larger machines
> in my care individually every day so I have to depend on others
> complaining that something's not right.  And since you didn't
> complain and I don't have someone actively using the data this
> month, I haven't been working very diligently on this.
> 
> Since you have complained I would like to get to the bottom
> of these problems once and for all and get something that
> works.   
> 
> For a test, I have turned off pqact on flood....so for now
> there should be no runaway dcgrib processes.  All the machine
> has to do is relay NMC2.  It has no other responsibilities.
> It is a 400 MHz PII with 256 MB RAM, 80 GB disk and 100 Mb/s networking

David,

There are Linux machines here at UCAR receiving the NMC2 feed, NEXRAD, etc.
We have found out that it's NOT possible to receive the above feeds on one
box because of the mmap memory management problems. These machines have a
gigabyte of RAM and couldn't handle it. My suggestion is to put at least
750 megabytes of RAM on the box receiving the NMC2 feed. In my opinion,
RAM is cheap compared to your time. Some day the Linux programmers will
fix the problem; until then, use more RAM.

Robb...



> so it certainly should be capable of this task.  
> 
> Please watch the latencies over the next day or two.  I will take
> the time to watch them closely here.   Since I remade the product
> queue just before starting this test, the latencies were
> initially high, but now, after a short while, they are pretty
> reasonable.  Will have to wait for the next 12 hour batch
> of stuff to know more realistically.
> 
> Sorry for the trouble.
> 
> --------------------------------------------------------
>  David Wojtowicz, Research Programmer/Systems Manager
>  Department of Atmospheric Sciences Computer Services
>  University of Illinois at Urbana-Champaign
>  email: address@hidden  phone: (217)333-8390
> --------------------------------------------------------
> 
> 
> 
> 

===============================================================================
Robb Kambic                                Unidata Program Center
Software Engineer III                      Univ. Corp for Atmospheric Research
address@hidden             WWW: http://www.unidata.ucar.edu/
===============================================================================