Linux LDM memory problem





Hi,

Having moved to larger Linux systems with more memory and bigger
LDM product queues, I've noticed a problematic behaviour.

Linux appears to be buffering all the changed pages of LDM's memory
mapped product queue file in memory and seems to avoid writing them
to disk until it absolutely has to.  While this is typical of
Linux, it seems to cause trouble for LDM.

Suppose LDM has been running for a while but few new products are
currently arriving.  There is little disk activity, as one would expect.
However, if you run "ldmadmin stop", the disk activity light suddenly
locks on solid and the disk grinds away writing data for a minute or more.
The LDM processes (even though a stop request has been issued) persist
during this time, eventually going away one by one, the disappearance
of the last coinciding with the end of disk activity.

It seems that all the pages of the product queue that have been modified
just sit there in memory (many MB of them) and don't get written out
until either the memory is needed for something else or the processes
that own the pages eventually exit or call munmap.  This problem only
gets worse the more memory the system has, because the system can
keep more unwritten pages around.

Watching xosview shows that most of the free memory is being used
for cache.  When ldmadmin stop is issued, the cache slowly
shrinks while Page IO write peaks.

This would only be a minor annoyance, except that...

Because the processes persist for a minute or more after a stop is
requested, "ldmadmin restart" doesn't work: it tries to restart too
soon and fails, complaining that LDM processes already exist.  This
also causes problems when shutting down or rebooting the system
if one does not first stop the LDM and wait for it to complete.

Also, because all memory not allocated to a process or the kernel
is being used as cache, whenever a newly started process needs some
memory, it has to wait until enough of the cache has been written
out and freed before it can get the memory it needs.  This can
seriously impact the interactive performance of a Linux machine
running LDM.  Sometimes just typing "ls" results in a delay of
several seconds, even though the LDM processes are currently idle
and nothing else is running on the system.

I realize that this is more a problem with Linux's virtual memory
management than with LDM, but in any case it has a significant
negative impact on LDM's usability under Linux.

I was wondering whether Mr. Kambic or anyone else has better insight
into this problem and whether anything can be done to help alleviate it
(cause the pq code to call msync occasionally, etc.).  So far I have
been unable to find information on tuning Linux's memory management
behaviour.

I have experienced this problem with the following combinations:

 Linux 5.3  pqsize=200MB  ram=128MB  LDM 5.0.6   Feeds=WMO|UNIDATA|DIFAX|WSI
 Linux 6.0  pqsize=800MB  ram=80MB   LDM 5.0.8   Feeds=NMC2
 Linux 6.0  pqsize=800MB  ram=256MB  LDM 5.0.8   Feeds=NMC2


Running a similar setup to the first line on a FreeBSD box
results in no such problems.  Interactive performance is
great and there is practically no delay when running ldmadmin stop.
I'd use it, except that porting software (including LDM) is a pain.


Thanks for any info!

--------------------------------------------------------
 David Wojtowicz, Research Programmer/Systems Manager
 Department of Atmospheric Sciences Computer Services
 University of Illinois at Urbana-Champaign
 email: address@hidden  phone: (217)333-8390
--------------------------------------------------------