
Re: [awipsldm] Re: LDM Observations and Comments (fwd)




===============================================================================
Robb Kambic                                Unidata Program Center
Software Engineer III                      Univ. Corp for Atmospheric Research
address@hidden             WWW: http://www.unidata.ucar.edu/
===============================================================================

---------- Forwarded message ----------
Date: Mon, 07 Feb 2000 14:45:17 -0700
From: Russ Rew <address@hidden>
To: Ken Waters <address@hidden>
     address@hidden, address@hidden
Subject: Re: [awipsldm] Re: LDM Observations and Comments 

>To: address@hidden
>From: Ken Waters <address@hidden>
>Subject: Re: 20000207: [awipsldm] Re: LDM Observations and Comments
>Organization: NWS Southern Region
>Keywords: PIPE action, decoder processes

Hi Ken,

> If you don't mind, I'd like to pursue the matter of my Perl script
> that kicks off other system calls.  I have attached my script for
> your reference.
>      
> Basically, I write out the stdin to a temp file because there are a
> series of actions that can be done to a file.  Maybe there is a
> better way.  I know you suggested not writing temp files, but either
> way I'm forced to use system calls to write the file out, right?

If you want to do a series of actions to a product, the usual way to
handle this is have multiple pqact.conf pattern-action entries match
the product, with each specifying the same pattern but a different
action.  But you must already know this, because you are already doing
4 things with each product, according to your pqact.conf entries:

  # Test script for ALL products
  AFOS  ^(...)(...)(...|..)
          PIPE  -strip /home/ldm/process \1 \2 \3

  # Rotate all versions
  AFOS  ^(...)(...)(...|..)
          PIPE  -strip /home/ldm/version.csh \1 \2 \3

  AFOS  ^(...)(...)(...|..)
          FILE  -strip -overwrite /home/ldm/data/\1/\2/\1\2\3.1.txt

  # Append products
  AFOS  ^(...)(...)(...|..)
          FILE  -strip /home/ldm/data/\1/\2/\1\2\3.txt

The above is generating a *lot* of processes, two for each AFOS
product for starters ("process" and "version.csh"), with these
generating even more processes as explained below.  The LDM is
designed to permit you to start up a process once and keep it running
to handle multiple products, instead of starting a process for each
product.  That's the way our perl decoders work, and maybe you could
use the same pattern to have your perl scripts each handle multiple
products.

There's a brief description of this in the Site Manager's Guide:

    The PIPE command permits execution of an arbitrary process (an
    executable program or a shell script, for example) with a data
    product as standard input to the process. The program should read
    from standard input and exit upon reading end of file. It should
    also time out if no input is read for some time. 

    Like files, pipelines to child processes are cached for use in
    processing subsequent products. The pipeline will remain open until
    the LDM server needs to recycle the output descriptor or until the
    child process exits of its own accord. The -close option forces the
    pipe to close after one product has been received.
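
For example, here is a pair of hypothetical entries (the script path
is made up): the first keeps one copy of the script running and feeds
it every matching product, while the second, because of -close, starts
and tears down a fresh copy of the script for each product:

    # one long-running copy of the script handles all matching products
    AFOS  ^(...)(...)(...|..)
            PIPE  -strip /home/ldm/mydecoder

    # -close: a separate copy of the script is run for every product
    AFOS  ^(...)(...)(...|..)
            PIPE  -close -strip /home/ldm/mydecoder

(The first entry only gets the reuse because its command line is
identical for every product; more on that below.)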

The LDM pqact program maintains a list of 32 (currently) open file
descriptors corresponding to files it is writing data to and pipes to
which it is writing data to be read by running "decoder" processes.
When a new product comes along, if pqact needs to invoke a "PIPE"
action on it, pqact checks its list of open file descriptors to see if
the invocation of the program and arguments are the same as for any of
the open pipes on its list.  If so, it just writes the data down that
pipe and it doesn't have to start up a new process.
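
In other words, the bookkeeping amounts to something like the
following (just a sketch of the idea in perl, not pqact's actual
code):

    # sketch of the idea only -- not pqact's actual code
    my %open_pipes;                    # "program + arguments"  =>  open pipe
    sub send_product {
        my ($cmdline, $product) = @_;
        if (!$open_pipes{$cmdline}) {
            # nothing open for this exact command line: start a new child
            open(my $pipe, "| $cmdline") or die "can't start $cmdline: $!";
            $open_pipes{$cmdline} = $pipe;
            # (the real pqact would also close the least-recently-used
            #  descriptor here once all 32 slots are in use)
        }
        print { $open_pipes{$cmdline} } $product;
    }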

In order to work with this model, processes that will handle multiple
products need to be able to detect the delimiters that separate
products, parse the product header if necessary, and block when
there's no input until more input is available on the pipe they're
reading from.  The only time such a process would exit would be if it
detected that the pipe had been closed on the other end (by pqact,
because it needed another descriptor and this one hadn't been used for
the longest time of any of the open descriptors).
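
A decoder that fits this model has a main loop along these lines (a
bare skeleton; the ETX delimiter is just an example, use whatever
actually separates your products):

    #!/usr/bin/perl
    # skeleton of a long-running decoder fed by pqact through a pipe
    $/ = "\cC";                       # assume products end in ASCII ETX (example)
    while (my $product = <STDIN>) {   # blocks until the next product arrives
        $product =~ s/\cC$//;         # drop the trailing delimiter
        # ... parse the product header and process the product here ...
    }
    # <STDIN> returned end of file: pqact closed its end of the pipe, so exit
    exit 0;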

So pqact typically starts up a perl decoder for upper-air products,
for example, and that one decoder keeps running, decoding every
upper-air report that it reads from its stdin connected to the pipe
from pqact.  If we started up a new instance of a decoder for every
product, it would probably bog down the server, though it still would
be feasible on modern workstations if there weren't too many products.

But you are starting up a couple of perl scripts for every product,
and the first of these, "process", is starting up lots of other
processes using the perl "system" function to mv and cp files.  I
think it would be OK if you were only invoking 1 or 2 processes per
product, but it looks like you're starting up more like a dozen.

In order to use the same process for multiple products, you have to
make sure you don't use unique arguments for each process invocation,
but instead let the process parse some of the information that is
unique to each product.  That way, pqact will see the same process and
the same argument string, and know to use the existing running process
without starting up a new one.  For example, the entry that invokes
our upper air decoder passes it only the year and month derived from
the product header, so theoretically that one decoder could stay
running for a month:

    # upper air perl decoder
    DDS|IDS     ^U[ABDEFGHIJKLMPQRSTXZ].... .... ([0-3][0-9])
            PIPE        /usr/local/ldm/decoders/ua2nc
                    etc/ua.cdl
                    data/decoded
                    (\1:yy)(\1:mm)
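
By contrast, a hypothetical entry like the one below, which passes the
full day-hour-minute from the product header as an extra argument,
would give pqact a different command line for almost every product, so
a new decoder would be started nearly every time:

    # anti-example (hypothetical): per-product arguments defeat reuse
    DDS|IDS     ^U[ABDEFGHIJKLMPQRSTXZ].... .... ([0-3][0-9])([0-2][0-9])([0-5][0-9])
            PIPE        /usr/local/ldm/decoders/ua2nc
                    etc/ua.cdl
                    data/decoded
                    (\1:yy)(\1:mm) \1\2\3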

> Anyway, what's going on is that my process script is getting hung up
> on all the "system("cat > $temp/$filenm.tmp");" lines.  At any one
> time, I find about 10-15 open jobs to the process script and each
> one has another job which is writing out this temp file.  What is
> puzzling to me is that the other uses of "cat" later in the script
> don't seem to be a problem.  I think what's going on is the various
> different instances of the script are 'falling all over each other'
> trying to read from stdin.  Some of these jobs are taking up to 5
> minutes to run!  This is, in some cases, leading to errors in the
> proper storage of the data files.

Since pqact starts a new instance of the "process" script for each
product and never reuses any of them, and since it keeps only 32 file
descriptors open, each new product opens another descriptor and forces
pqact to close the least-recently-used one, so the process on the
other end of that descriptor sees its pipe closed.  I don't know how
your perl script reacts to its stdin being closed, but that may be
causing some of the problems you are seeing.

Also, since you are doing most of the work in the perl scripts in
"system" calls, lots of extra processes are getting started.  If
there's some way you can limit the number of "system" calls from the
perl script and do some of these another way, it would probably help
cut down on all the CPU overhead involved in creating and destroying
processes.
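
For example, the "cat", "cp", and "mv" work can all be done with
perl's own file operations, so no extra process is created at all.
Something along these lines (untested, and the $from/$to names are
just placeholders):

    use File::Copy;                        # provides copy()

    # instead of system("cat > $temp/$filenm.tmp") to capture stdin:
    open(TMP, "> $temp/$filenm.tmp") or die "can't write $temp/$filenm.tmp: $!";
    print TMP $_ while <STDIN>;            # copy the product straight from stdin
    close(TMP);

    # instead of system("cp $from $to") and system("mv $from $to"):
    copy($from, $to)   or warn "cp failed: $!";
    rename($from, $to) or warn "mv failed: $!";   # works within one filesystem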

> It's important that I get this worked out as our data requirements
> are increasing and I want to ensure this data feed works as well as
> possible.
>      
> For your reference, I have enclosed a copy of (1) my pqact.conf, (2)
> the process script, and (3) a sample of jobs running [ps -ef].

Thanks for sending such a complete description of the problem.  I
hope this helps explain what is going on.

--Russ