
20020513: scouring McIDAS data files (cont.)



>From: Gilbert Sebenste <address@hidden>
>Organization: NIU
>Keywords: 200205130500.g4D50Ta13471 McIDAS mcscour.sh

Gilbert,

re: scouring setup

>You already had this set up to do this.

OK, and it was working.

re: typical location for mcscour.sh

>Yep. It's not working. It returns with a command prompt immediately with 
>no errors!

This indicates a lack of the system resource needed to run the McIDAS
commands.

>> <login as 'mcidas'>
>> cd workdata
>> decinfo.k SET DMGRID INACTIVE
>
>I get an error message that says "decinfo.k: Cannot create negative UC

This confirms the lack of the system resource needed to run a McIDAS
program.

>No dice, Tom. Any other hints? Disk space still zero, and I need my 
>machine back!

Given the "Cannot create negative UC" message, I decided to log in
to see what was happening.  What I found was that your system had
run out of interprocess communication (IPC) resources.  This was caused by
_lots_ of shared memory segments ( > 150) being allocated to and owned
by 'ldm'.  There were also a few ( < 10) allocated to and owned by
'mcidas'.  Given that the system had no resources to run new McIDAS
programs, it is no wonder that disk scouring stopped working.

The fix for all of this was:

o stop the LDM

o remove all shared memory segments allocated by 'ldm' for McIDAS
  related activities (XCD decoding)

o remove all of the temporary directories used by 'ldm' when
  McIDAS programs are executed:

  cd ~ldm/.mctmp
  rm -rf *

o remove all of the GRID files created by XCD:

  cd /data/mcidas
  rm -f GRID*

  (many of these files had zero length)

o check on available disk space (32 GB after the above cleanup)

o restart the LDM
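Of the steps above, removing the shared memory segments is the only one that takes more than a couple of commands, because each segment has to be removed individually by its shmid. Below is a minimal sketch of one way to do that. It assumes the Linux `ipcs -m` column layout shown later in this message (shmid in the second column, owner in the third). The function `list_shm_removals` is a hypothetical helper, not part of McIDAS or the LDM. It only prints `ipcrm -m` commands so they can be reviewed first. Note that `ipcs` cannot tell which of the segments owned by 'ldm' were created for McIDAS, so the list covers every segment 'ldm' owns.

```shell
#!/bin/sh
# Hypothetical helper; not part of McIDAS or the LDM.
# Reads `ipcs -m` output on stdin and PRINTS (does not run) an
# `ipcrm -m <shmid>` command for every segment owned by user $1
# (default 'ldm'). Assumes the Linux column layout:
#   key  shmid  owner  perms  bytes  nattch  status
list_shm_removals() {
  awk -v user="${1:-ldm}" '$3 == user { print "ipcrm -m " $2 }'
}

# Dry run; review the output before executing any of it:
# ipcs -m | list_shm_removals ldm
```

Run it only after the LDM has been stopped. Execute the printed commands (for example, by piping them to `sh`) only once you are satisfied that the list is right.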

As I write this, McIDAS-XCD is once again decoding data, and there is plenty
of disk space:

weather2-niu ldm-48> df -k
Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/hda2             38314668   4188776  32179620  12% /
/dev/hda1                46636      8727     35501  20% /boot

Why your system initially went into a tailspin I cannot say.  I can say
that after the resources needed to run McIDAS programs were exhausted,
it was pretty much inevitable that you would run out of disk space.
The reason is that the McIDAS-XCD GRID decoder, which was already
running, never exits, so it keeps writing GRID files.  The LDM processes
(rpc.ldmd) also never exit (they shouldn't, that is), so data keep
flowing to the decoder.  Scouring, on the other hand, has to start new
McIDAS programs, and those could not run.  The result was that GRID
data files kept getting written but stopped getting scoured.

When things are running correctly, you should see a small number
of subdirectories of ~ldm/.mctmp, and a small number of IPCs allocated
to 'ldm' for McIDAS activities.  Here is how things look at this
moment:

weather2-niu ldm-52> ls -alt ~ldm/.mctmp
ls: unparsable value for LS_COLORS environment variable
total 20
drwx------    5 ldm      users        4096 May 13 12:02 ./
drwx------    2 ldm      users        4096 May 13 11:49 50888713/
drwx------    2 ldm      users        4096 May 13 11:49 50954251/
drwx------    2 ldm      users        4096 May 13 11:49 50987020/
drwxr-xr-x   24 ldm      users        4096 May 13 11:49 ../

weather2-niu ldm-51> ipcs

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status      
0x00000000 294914     ldm       777        196608     2          dest         
0x00000000 983043     ldm       777        196608     2          dest         
0x00000000 1048580    ldm       777        196608     2          dest         
0x00000000 1179653    ldm       777        196608     2          dest         
0x00000000 1245190    ldm       777        196608     2          dest         
0x00000000 3244039    ldm       777        196608     2          dest         
0x00000000 3309576    ldm       777        196608     2          dest         
0x00000000 50888713   ldm       600        384300     7                       
0x00000000 50921482   ldm       600        512000     0                       
0x00000000 50954251   ldm       600        384300     2                       
0x00000000 50987020   ldm       600        384300     2                       
0x00000000 51019789   ldm       600        512000     0                       
0x00000000 51052558   ldm       600        512000     0                       

The number of subdirectories of ~ldm/.mctmp will grow and shrink as
McIDAS PostProcess BATCH files are run upon receipt of Unidata-Wisconsin
(LDM feedtype MCIDAS) imagery.  The same is true for IPCs.

Since this problem has now occurred once, I recommend that you
occasionally do a quick check of IPC use by 'ldm' (run the same
commands I included above).  If you see lots of shared memory
segments (entries under the 'shmid' column above), or lots of
subdirectories under ~ldm/.mctmp, run through the cleanup procedure
illustrated above.
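The periodic check recommended above could be automated along the lines below. This is a minimal sketch, not a McIDAS or LDM utility. The function names (`count_shm`, `count_subdirs`, `check_mcidas_ipc`) are hypothetical. The default threshold of 25 is an illustrative assumption, chosen only because it sits well above the handful of entries seen on a healthy system and well below the more than 150 segments seen during this incident. The sketch also assumes the Linux `ipcs -m` output format shown above.

```shell
#!/bin/sh
# Hypothetical monitoring helpers; not part of McIDAS or the LDM.
# Assumes Linux `ipcs -m` output, where the owner is the third column.

# Print how many `ipcs -m` lines (read from stdin) are owned by user $1.
count_shm() {
  awk -v user="$1" '$3 == user { n++ } END { print n + 0 }'
}

# Print how many subdirectories directory $1 contains (0 if there are
# none or if $1 does not exist). Hidden subdirectories are not counted.
count_subdirs() {
  n=0
  for d in "$1"/*/; do
    if [ -d "$d" ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}

# check_mcidas_ipc USER SCRATCH_DIR [MAX_SHM] [MAX_DIRS]
# Prints both counts; warns on stderr and returns 1 if either count is
# above its threshold. The default thresholds (25) are assumptions.
check_mcidas_ipc() {
  nshm=$(ipcs -m | count_shm "$1")
  ndirs=$(count_subdirs "$2")
  echo "$1: $nshm shared memory segments, $ndirs subdirectories in $2"
  if [ "$nshm" -gt "${3:-25}" ] || [ "$ndirs" -gt "${4:-25}" ]; then
    echo "WARNING: consider running the cleanup procedure" >&2
    return 1
  fi
  return 0
}

# Example, run as 'ldm':
# check_mcidas_ipc ldm "$HOME/.mctmp"
```

When run as 'ldm' (for example, from cron), `check_mcidas_ipc ldm "$HOME/.mctmp"` prints the two counts. It returns a nonzero status when either count exceeds its threshold, which is the cue to run through the cleanup procedure.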

>From address@hidden Mon May 13 09:26:57 2002
>Subject: Re: 20020513: scouring McIDAS data files

>Grid 5002 is 1.5 GB in size. It appears as though it is scouring,

It wasn't.

>BUT something is not right. I don't know what it is.

weather2 ran out of IPCs.

Tom

>From address@hidden Mon May 13 11:14:08 2002
>Subject: Re: 20020513: scouring McIDAS data files (cont.)

Hi Tom,

re: you now have disk space aplenty
> weather2-niu ldm-48> df -k
> Filesystem           1k-blocks      Used Available Use% Mounted on
> /dev/hda2             38314668   4188776  32179620  12% /
> /dev/hda1                46636      8727     35501  20% /boot

Excellent.
 
re: why your system went into a tailspin

>It is weird. I checked weather.admin; the same tailspin didn't occur 
>there. Then again, I do have grid processing turned off.
 
re: The number of subdirectories of ~ldm/.mctmp will grow and shrink ...

>OK.
 
re: recommend keeping an eye on things

>Gotcha. Thanks much again for the help, and now I understand! Never had 
>this problem before, and I didn't have a clue why it was filling up.

>Take care, I'll keep an eye on this!

*******************************************************************************
Gilbert Sebenste                                                     ********
Internet: address@hidden    (My opinions only!)                     ******
Staff Meteorologist, Northern Illinois University                      ****
E-mail: address@hidden                                 ***
web: http://weather.admin.niu.edu                                      **
Work phone: 815-753-5492                                                *
*******************************************************************************