
Re: 19990520: Linux LDM memory problem



On Thu, 20 May 1999, Unidata Support wrote:

> 
> ------- Forwarded Message
> 
> >To: address@hidden
> >From: David Wojtowicz <address@hidden>
> >Subject: Linux LDM memory problem
> >Organization: .
> >Keywords: 199905202242.QAA15926
> 
Hiya,

David, thank you for the detailed report. This problem has been noted before,
but never with this much detail about the system behaviour.

> 
> 
> 
> Hi,
> 
> Having moved to larger Linux systems with bigger memory and using bigger 
> LDM product queues I've noticed a problematic behaviour.
> 
> Linux appears to be buffering all the changed pages of LDM's memory
> mapped product queue file in memory and seems to avoid writing them
> to disk until it absolutely has to.   While this is typical of
> Linux, it seems to cause trouble with LDM.

On Linux, the RAM needs to be nearly a 1-to-1 match for the size
of the LDM queue. I know that this is a big request, but Linux memory
management has problems unless this requirement is met. I'm not a
Linux expert; maybe I'll post this question to the Linux newsgroup
to see whether the memory management can be configured differently. On
the other hand, Solaris x86 doesn't have this problem: we have a system
with 128MB of RAM and a 550MB LDM queue and it's a stable machine.

> 
> Suppose LDM has been running for a while but currently few new products
> are arriving.  There is little disk activity as one would expect.  However
> if you run "ldmadmin stop", then suddenly the disk activity light locks on
> solid and the disk grinds away writing data for a minute or more. 
> The LDM processes (even though a stop request has been issued) persist
> during this time, eventually going away one by one, the disappearance
> of the last coinciding with the end of disk activity.   
> 
> It seems that all the pages of the product queue that have been modified
> are just sitting there in memory (many MB of them) and don't get written
> out until either the memory is needed for something else or the processes
> that own the pages eventually go away or call munmap.  This problem only
> gets worse the more memory the system has, because more unwritten
> pages can be kept around.
> 
> Watching xosview shows that most of the free memory is being used
> for cache.  When ldmadmin stop is issued, the cache slowly
> shrinks while Page IO write peaks.

I agree with your above observations.
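
For what it's worth, the behaviour can be reproduced outside of the LDM
with a few lines of C. This is only a sketch, not LDM code; the file name
and the 200MB size are placeholders:

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#define QSIZE (200UL * 1024 * 1024)     /* pretend 200MB queue */

int main(void)
{
    int    fd = open("/tmp/fakequeue", O_RDWR | O_CREAT, 0644);
    char  *p;
    size_t off;

    if (fd < 0 || ftruncate(fd, QSIZE) < 0) {
        perror("open/ftruncate");
        return 1;
    }
    p = mmap(NULL, QSIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* Dirty every page; on Linux they just pile up in the cache. */
    for (off = 0; off < QSIZE; off += 4096)
        p[off] = 1;

    puts("pages dirtied -- watch xosview: little or no disk activity");
    sleep(30);

    /* On the kernels you describe, the big write-back appears to
     * happen here, when the mapping (and then the process) goes away. */
    munmap(p, QSIZE);
    close(fd);
    return 0;
}

Watching xosview while that runs should show the same pattern you
describe: the dirty pages sit in the cache until the mapping goes away.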


> 
> This would only be a minor annoyance, except that...
> 
> Because the processes persist for a minute or more after a stop is
> requested, "ldmadmin restart" doesn't work: it tries to restart too
> soon and fails, complaining that LDM processes already exist.  This
> also causes problems when trying to shut down or reboot the system
> if one does not first stop ldm and wait for it to complete.

There might be a bug in ldmadmin restart. I rewrote ldmadmin stop to wait
up to 30 minutes for the LDM to finish up before it reports that the LDM
is stopped. On the main machines, e.g. thelma, I always check that the LDM
processes are gone and that port 388 is not in use. Some feeder processes
try to complete sending a product before giving up and dying.
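
The check that I do by hand could be scripted. Here is a rough sketch (not
part of ldmadmin) that simply tries to bind() to port 388 and reports
whether anything is still holding it; it has to run as root, since 388 is
a privileged port:

#include <errno.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int s = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr;

    if (s < 0) {
        perror("socket");
        return 1;
    }
    memset(&addr, 0, sizeof(addr));
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port        = htons(388);        /* the LDM port */

    if (bind(s, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        printf("port 388 still busy: %s\n", strerror(errno));
        close(s);
        return 1;   /* an LDM process is probably still shutting down */
    }
    puts("port 388 is free");
    close(s);
    return 0;
}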


> 
> Also, because all memory not allocated to a process or the kernel
> is being used as cache, whenever a new process
> is started that needs some memory, it has to wait until
> enough of the cache has been written out and cleared before
> it can get the memory it needs.  This can seriously impact interactive
> performance of a Linux machine running LDM.    Sometimes just
> typing "ls" results in a several second delay, even though
> the LDM processes are currently idle and nothing else is running
> on the system.
> 
> I realize that this is more of a problem with Linux's virtual memory
> management than with LDM, but in any case, it has a significant
> negative impact on LDM's usability under Linux. 
> 
> I was wondering if Mr. Kambic or anyone else had better insight into this
> problem and whether anything could be done to help alleviate it (cause the pq
> to call msync occasionally, etc.).  I have so far been unsuccessful in
> finding information on tuning Linux's memory management behaviour.
> 
> I have experienced this problem with the following combinations:
> 
>  Linux 5.3  pqsize=200MB  ram=128MB  LDM 5.0.6   Feeds=WMO|UNIDATA|DIFAX|WSI
>  Linux 6.0  pqsize=800MB  ram=80MB   LDM 5.0.8   Feeds=NMC2
>  Linux 6.0  pqsize=800MB  ram=256MB  LDM 5.0.8   Feeds=NMC2
> 
As noted above, the machine would run better with a 1-to-1 ratio of RAM to
LDM queue size. Also, I believe your site has the first Linux machines with
queue sizes over 250MB, so I hope the above theory holds. If you try some
more tests, please let me know the results so I can inform other Linux
users.
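
Your idea of having the pq code call msync() occasionally is probably
worth experimenting with. I haven't tried patching the pq code, but the
idea would look something like the sketch below; the queue_base and
queue_size names are made up for the illustration:

#include <stddef.h>
#include <sys/mman.h>

/* These would be set wherever the product queue is mmap()ed. */
static void  *queue_base;
static size_t queue_size;

/*
 * Call this every so often (every N products, or from an alarm) to
 * trickle dirty pages out to disk instead of letting them pile up.
 */
static void
pq_trickle_sync(void)
{
    if (queue_base != NULL)
        (void)msync(queue_base, queue_size, MS_ASYNC);
}

MS_SYNC would block until the pages hit the disk; MS_ASYNC just hands
them to the kernel, which in theory spreads the I/O out instead of
saving it all up for the time the processes exit.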

> 
> Running a similar setup to the first line on a FreeBSD box
> results in no such problems.  Interactive performance is
> great and there is practically no delay when running ldmadmin stop.
> I'd use it, except that porting software (including LDM) is a pain.

My suggestion: use Solaris x86.  Almost all packages compile without any
problems. The only difference is that Linux is a little faster than Solaris
x86, but the latter has far fewer problems.

Robb...

> 
> 
> Thanks for any info!
> 
> --------------------------------------------------------
>  David Wojtowicz, Research Programmer/Systems Manager
>  Department of Atmospheric Sciences Computer Services
>  University of Illinois at Urbana-Champaign
>  email: address@hidden  phone: (217)333-8390
> --------------------------------------------------------
> 
> 
> 
> 
> ------- End of Forwarded Message
> 

===============================================================================
Robb Kambic                                Unidata Program Center
Software Engineer III                      Univ. Corp for Atmospheric Research
address@hidden             WWW: http://www.unidata.ucar.edu/
===============================================================================