
20010516: 20010514: 20010507: 20010507: oabsnd and swap space



Chris,

I logged in and took a look at your system. I couldn't run "sar" there,
and I couldn't find "top" either, so I copied one over:

----
load averages:  1.50,  0.77,  0.50                14:36:58
95 processes:  90 sleeping, 3 running, 1 zombie, 1 on cpu
CPU states: 39.4% idle, 36.5% user,  9.2% kernel, 14.9% iowait,  0.0% swap
Memory: 512M real, 13M free, 250M swap in use, 125M swap free
---

It appears that a lack of system resources really is what prevents the
process from forking. Some of the problem could possibly be alleviated
with fewer httpd processes. You could probably stand to add some swap
space, since you only have about 375MB (250MB in use plus 125MB free)
against 512MB of RAM.

In GEMPAK 5.4, the maximum grid size was 100,000 grid points. I increased this
size to 400,000 in GEMPAK 5.6 to enable the use of several of
NCEP's larger grids. 

Since oabsnd allocates about 90M on launch, it probably uses up the
available free memory, and the fork of gplt then fails.

We have 3 options:

1) add more swap and see if this helps. You can do this by creating a file
with "mkfile" and adding it to swap with "swap -a". If you have the disk
space, adding 128MB would get you closer to a 1 to 1 ratio of swap to RAM.

2) recompile GEMPAK 5.6 with the smaller grid sizes defined so that you have the
same array sizes as when you were happy with GEMPAK 5.4.

3) I was able to get the upperair.csh script to somewhat work around the problem
by forcing gplt to be launched before running oabsnd. This worked better,
but still ran into problems when McIDAS was doing imgremap.k.
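
A rough sketch of option 1, using the figures from the "top" output
above. It only prints the arithmetic and the Solaris commands rather than
running them. The swap file path /export/swapfile and the helper name
swap_plan are made up for illustration; use any filesystem with room.

```sh
#!/bin/sh
# Sketch only: prints a plan, executes nothing.
# Figures (MB) are taken from the "top" output earlier in this message.
ram_mb=512
swap_mb=$((250 + 125))        # swap in use + swap free
add_mb=128                    # size suggested in option 1
swapfile=/export/swapfile     # hypothetical path, not from the original

swap_plan() {
    new_mb=$((swap_mb + add_mb))
    echo "swap now: ${swap_mb}MB, after adding ${add_mb}MB: ${new_mb}MB, RAM: ${ram_mb}MB"
    # Solaris commands to run as root; echoed here instead of executed
    echo "mkfile ${add_mb}m ${swapfile}"
    echo "swap -a ${swapfile}"
    echo "swap -l"
}

swap_plan
```

Swap added this way does not survive a reboot unless the file is also
listed in /etc/vfstab.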


Here is what I tried:

I created the directory $GEMDATA/tmp/upperair.chiz and copied your
upperair.csh script there for tinkering. I changed some paths
in the script so as not to overwrite your grids in $HOME, and to
use the upperair.chiz directory.

Following the gddelt section you have, I added:
# Let's just launch a program to get gplt fired up. Use this one
# gplt for the entire script.
echo " "
echo "get gplt launched....."
gpmap << GPLT

   e
GPLT



Launching gpmap gets gplt running. Now you won't have to worry about
forking gplt in oabsnd, since it is already running. You may still see
"Killed" when trying to launch oabsnd if you don't have enough memory.


I also removed the individual gpend commands you had in the script,
since you only need gpend after you are finished with all your oabsnd
invocations. Since it takes time to fork the gplt process, you are better
off only having to start it up once.

I left the upperair.chiz directory (with "top" there too) for you.

One other thing: in your .cshrc, you source Gemenviron and then set
your path. Since Gemenviron adds GEMEXE and SCRIPTS_EXE to your path,
it is better to set your path first (without hardcoding the GEMPAK
binary directory in it) and then source Gemenviron. Because the path you
set afterwards did not include the SCRIPTS_EXE directory, you were
overriding the PATH and preventing scripts like "cleanup" from working.
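
As a sketch of that ordering, a .cshrc fragment might look like the
following. The directories in the path and the location of Gemenviron are
placeholders, not taken from your system.

```csh
# .cshrc sketch; directory names and the Gemenviron location are placeholders

# 1) set the base path first, with no GEMPAK binary directory hardcoded
set path = ( /usr/bin /usr/local/bin $HOME/bin )

# 2) then source Gemenviron so it can add $GEMEXE and $SCRIPTS_EXE itself
source /path/to/Gemenviron
```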

Running "cleanup -c" will remove the message queues and kill off the
gplt and parent processes for a user.
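
To check whether a run left message queues behind (as in the ipcs
listing quoted below), something like the following could count them
before and after "cleanup -c". It is a sketch and not part of GEMPAK:
count_queues is a made-up helper name, and it assumes the Solaris ipcs
column layout shown in this thread, where queue lines start with "q" and
the owner is the fifth field.

```sh
#!/bin/sh
# Sketch: count message queues owned by a user from ipcs-style output.
# Assumes Solaris column order: T ID KEY MODE OWNER GROUP ...
count_queues() {
    owner=$1
    n=0
    # Read each line into fields; only "q" lines with a matching owner count
    while read -r type id key mode qowner rest; do
        if [ "$type" = "q" ] && [ "$qowner" = "$owner" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Usage (not executed here):
#   ipcs -q | count_queues gempak
```

A count that keeps growing between runs would suggest that programs are
exiting without gpend or cleanup being run.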

Steve Chiswell
Unidata User Support






 
>From: address@hidden (Chris Hennon)
>Organization: UCAR/Unidata
>Keywords: 200105161652.f4GGqdp13789

>Steve -
>
>I apologize for taking up so much of your time.  I'll understand if you
>have other things to take care of.
>
>I've been working with one specific script which I am running by itself.
>Hopefully, this specific example will yield some useful information.  I
>was wondering if you could login to my machine and take a look.  The
>script is located in:
>
>/usr/local/gempak/scripts/upperair/upperair.csh
>
>Basically, it runs oabsnd multiple times to create upperair grids, then
>calls a variety of other scripts that produce upperair plots.  When this
>script completes, it leaves behind several message queues:
>
>ipcs -pt
>IPC status from <running system> as of Wed May 16 11:15:48 EDT 2001
>T     ID      KEY      MODE         OWNER   GROUP LSPID LRPID STIME
>STIME    RTIME    CTIME
>Message Queues:
>q   2951   0x4b3fb75  --rw-rw-rw-   gempak   ldm  4181   0  23:30:03
>q    102   0x4b3fbe4  --rw-rw-rw-   gempak   ldm  4292   0  23:30:11
>q    103   0x4b3fc28  --rw-rw-rw-   gempak   ldm  4360   0  23:30:20
>q    104   0x4b3fca4  --rw-rw-rw-   gempak   ldm  4484   0  23:30:28
>q    105   0x4b3fd0d  --rw-rw-rw-   gempak   ldm  4589   0  23:30:36
>q    106   0x4b3fd81  --rw-rw-rw-   gempak   ldm  4705   0  23:30:44
>q    107   0x4b3fdf1  --rw-rw-rw-   gempak   ldm  481    0  23:30:52
>q   1108   0x4b402d8  --rw-rw-rw-   gempak   ldm  6072   0  23:35:20
>q    109   0x4b402f1  --rw-rw-rw-   gempak   ldm  6097   0  23:35:39
>T         ID      KEY        MODE        OWNER    GROUP  CPID  LPID
>ATIME    DTIME    CTIME
>Shared Memory:
>m        202   0          --rw-rw-rw-   gempak      ldm 19174   294
>17:01:19 17:01:19 17:01:19
>T         ID      KEY        MODE        OWNER    GROUP   OTIME    CTIME
>Semaphores:
>twister:[/home/chennon/output/gifs/sat/1998]%
>
>but no gplt processes.  There is a log file from the last time I tried to 
>run the script in:
>
>/usr/local/gempak/logs/upperair.log
>
>I ran it just after a reboot, so the system should have been clean.  One
>other thing that happened after the rebuild that shouldn't have an impact
>but I thought I would mention - we turned off a bunch of system processes
>due to security concerns - the ones that are no longer active are in
>/etc/rc2.d/turnedoff.  I don't see any that would have an impact on gempak
>programs but I thought I would mention it. 
>
>I appreciate your efforts.  Thanks.
>
>Chris
>  
>================================================
>| Chris Hennon        Ohio State University   |
>| Tropical Meteorology      address@hidden   |
>|                                              |
>| Dept of Geography   Office: 1155 Derby Hall  |
>| 1036 Derby Hall     Phone : (614) 292-2704   |
>| Columbus, OH 43210  Fax   : (614) 292-6213   |
>================================================
>
>On Mon, 14 May 2001, Unidata Support wrote:
>
>> 
>> Chris,
>> I'm not saying that you can't run more than 1 GEMPAK program at the same
>> time. What I can say is:
>> 1) if you have a program that frequently exits abnormally, and leaves behind
>>    a gplt, or other process, then the likelihood is that system resources
>>    will start to run short.
>> 
>> 2) If 2 processes ask for a gplt at the same time, it is possible for both
>>    programs to be issued the same message queue ID by the system. This
>>    happens because until the program actually gets the gplt process running,
>>    the system will keep handing out the same available message queue. Using
>>    _gf programs where the gplt and gf processes are linked to the application
>>    reduces the total number of processes running on your system at any one
>>    time, and avoids the use of message queues, thereby avoiding the possible
>>    conflict above.
>> 
>> 3) If multiple programs are running at the same time, you should have ntl
>>    running on the display so that all processes use the shared color map so
>>    you don't run out of colors on the display (you can run ntl on a screen:1
>>    as well). Or, use the gif device driver that doesn't require an X display
>>    to be running (you'll have to use message queues for the gif driver,
>>    except with the radmap_sw program which I do have linked with gif instead
>>    of gf).
>> 
>> Nothing has changed in the underlying message queue system between 5.4 and
>> 5.6, or the shared color system, so that isn't a cause for differences.
>> 
>> When you say that models take 2-3 hours to run, are you saying that the time
>> over which the data arrives is 2-3 hours, or are you saying that the GEMPAK
>> programs take that long to run?
>> I can help you organize actions to kick off when the LDM receives necessary
>> grids, or determine when all the pieces of data exist so that you don't have
>> to run programs multiple times to recreate plots as more data arrives. Let me
>> know if I can help you.
>> 
>> Steve Chiswell
>> Unidata User Support
>> 
>> 
>> 
>> 
>> 
>> 
>> >From: address@hidden (Chris Hennon)
>> >Organization: UCAR/Unidata
>> >Keywords: 200105142213.f4EMDfp11331
>> 
>> >Steve -
>> >
>> >This issue seems to have been resolved after a reboot, though I am not
>> >sure why.
>> >
>> >Just to clarify,
>> >are you saying that two or more gempak programs cannot be running at the
>> >same time?  When I was using 5.4 and before the rebuild, I sometimes had 4
>> >or 5 scripts cranking along at the same time with no problem.  I've
>> >followed your suggestions, using the _gf programs where possible and using
>> >master scripts for large jobs.  But there are still issues with
>> >overlapping jobs - for example, surface fields get plotted every hour, but
>> >to run the NGM,ETA, and AVN models takes at least 2-3 hours to run.
>> >
>> >Thanks ahead.
>> >
>> >Chris
>> >
>> >
>> >On Mon, 7 May 2001, Unidata Support wrote:
>> >
>> >> 
>> >> Chris,
>> >> 
>> >> I was actually referring to the grid dimensions, can you send me the
>> >> GDINFO for your grid file?
>> >> 
>> >> Steve Chiswell
>> >> Unidata User Support
>> >> 
>> >> 
>> >> 
>> >> >From: address@hidden (Chris Hennon)
>> >> >Organization: UCAR/Unidata
>> >> >Keywords: 200105071933.f47JXqp00391
>> >> 
>> >> >Steve -
>> >> >
>> >> >The upperstr.grd file is pretty big:
>> >> >
>> >> >twister:[/usr/local/gempak/grids]% ls -l
>> >> >-rw-r--r--   1 gempak   ldm      2575360 Apr 12 23:30 upperstr.grd
>> >> >
>> >> >oabsnd is version 5.6.a, as is dcuair.  
>> >> >
>> >> >Chris
>> >> >
>> >> >
>> >> >On Mon, 7 May 2001, Unidata Support wrote:
>> >> >
>> >> >> 
>> >> >> Chris,
>> >> >> 
>> >> >> What is the size of the $HOME/grids/upperstr.grd file?
>> >> >> What version of GEMPAK are you running (eg 5.6, 5.6.C)?
>> >> >> Are you running a different version of the dcuair decoder?
>> >> >> 
>> >> >> For example:
>> >> >>  GEMPAK-OABSND>version
>> >> >> 
>> >> >>  GEMPAK Version 5.6.c.1
>> >> >> 
>> >> >> % dcuair -help
>> >> >> ....
>> >> >> >Version 5.6.c.1<
>> >> >> 
>> >> >> 
>> >> >> Steve Chiswell
>> >> >> Unidata User Support
>> >> >> 
>> >> >> 
>> >> >> >From: address@hidden (Chris Hennon)
>> >> >> >Organization: UCAR/Unidata
>> >> >> >Keywords: 200105071647.f47Gltp15071
>> >> >> 
>> >> >> >Steve -
>> >> >> >
>> >> >> >I double checked and all looks well there:
>> >> >> >
>> >> >> >twister:[/usr/local/gempak/scripts/upperair]% cd $GEMEXE
>> >> >> >twister:[/usr/local/gempak/bin/sol]% ls -l gplt
>> >> >> >-rwxr-xr-x   1 gempak   ldm       496276 Apr 23 13:45 gplt*
>> >> >> >twister:[/usr/local/gempak/bin/sol]% cd ../../scripts/upperair
>> >> >> >twister:[/usr/local/gempak/scripts/upperair]% oabsnd
>> >> >> > SNFILE    Sounding data file                $RAW_UPA/20010507_upa.gem
>> >> >> > GDFILE    Grid file                         $HOME/grids/upperstr.grd
>> >> >> > SNPARM    Sounding parameter list           tmpc
>> >> >> > STNDEX    Stability indices                  
>> >> >> > LEVELS    Vertical levels                   925
>> >> >> > VCOORD    Vertical coordinate type          PRES
>> >> >> > DATTIM    Date/time                         12
>> >> >> > DTAAREA   Data area for OA                   
>> >> >> > GUESS     Guess file*time                    
>> >> >> > GAMMA     Convergence parameter             0.3
>> >> >> > SEARCH    Search radius/Extrapolation       20/EX
>> >> >> > NPASS     Number of passes                  2
>> >> >> > QCNTL     Quality control threshold          
>> >> >> > Parameters requested: SNFILE,GDFILE,SNPARM,STNDEX,LEVELS,VCOORD,DATTIM,
>> >> >> > DTAAREA,GUESS,GAMMA,SEARCH,NPASS,QCNTL.
>> >> >> > GEMPAK-OABSND>r
>> >> >> >Could not fork
>> >> >> > [GEMPLT -101]  NOPROC   - Nonexistent executable.
>> >> >> > [OABSND -3]  Fatal error initializing GEMPLT.
>> >> >> >twister:[/usr/local/gempak/scripts/upperair]%
>> >> >> >
>> >> >> >Chris
>> >> >> >
>> >> >> >
>> >> >> >On Mon, 7 May 2001, Unidata Support wrote:
>> >> >> >
>> >> >> >> 
>> >> >> >> Chris,
>> >> >> >> 
>> >> >> >> OABSFC requires that "gplt" be found. The non-existent
>> >> >> >> executable seems to indicate that $GEMEXE/gplt is either
>> >> >> >> not being found, that you don't have permission to execute it, 
>> >> >> >> or that for some reason the system is not able to execute gplt.
>> >> >> >> 
>> >> >> >> Since it says non-existent, it sounds like the program is
>> >> >> >> not being found. See if there is any problem with your $GEMEXE
>> >> >> >> environmental variable (which is set when you sourced Gemenviron),
>> >> >> >> and double check that gplt is executable as well.
>> >> >> >> 
>> >> >> >> The attempt to execute gplt occurs when you run the analysis,
>> >> >> >> eg, not when you first start up oabxxx.
>> >> >> >> 
>> >> >> >> Steve Chiswell
>> >> >> >> Unidata User Support
>> >> >> >> 
>> >> >> >> 
>> >> >> >> 
>> >> >> >> >From: address@hidden (Chris Hennon)
>> >> >> >> >Organization: UCAR/Unidata
>> >> >> >> >Keywords: 200105071618.f47GIbp13844
>> >> >> >> 
>> >> >> >> >Steve -
>> >> >> >> >
>> >> >> >> >I've run into a curious problem.  I'm trying to run "oabsnd" for just one
>> >> >> >> >level and one variable and the program exits with a NOPROC - Nonexistent
>> >> >> >> >executable and "Could not fork" errors.  I think I have plenty of swap
>> >> >> >> >space:
>> >> >> >> >
>> >> >> >> >swap -s
>> >> >> >> >total: 67792k bytes allocated + 167728k reserved = 235520k used, 159608k
>> >> >> >> >available
>> >> >> >> >
>> >> >> >> >There are no rogue processes around that I can see.  There are no dead
>> >> >> >> >message queues.  In the past, I have run oabsnd under the same conditions
>> >> >> >> >without a problem, even with more levels and more variables.  The support
>> >> >> >> >archives all seem to indicate a problem with either swap space or orphaned
>> >> >> >> >processes but it doesn't appear that I have those issues.  Any ideas?
>> >> >> >> >Thanks.
>> >> >> >> >
>> >> >> >> >Chris    
>> >> >> >> >
>> >> >> >> >
>> >> >> >> 
>> >> >> >> **********************************************************************
>> >> >> >> Unidata User Support                              UCAR Unidata Program
>> >> >> >> (303)497-8644                                            P.O. Box 3000
>> >> >> >> address@hidden                                       Boulder, CO 80307
>> >> >> >> ----------------------------------------------------------------------
>> >> >> >> Unidata WWW Service                       http://www.unidata.ucar.edu/
>> >> >> >> **********************************************************************
>> >> >> >> 
>> >> >> >
>> >> >> 
>> >> >> 
>> >> >
>> >> 
>> >> 
>> >
>> 
>> 
>