NOTE: The decoders mailing list is no longer active. The list archives are made available for historical reasons.
==============================================================================
Robb Kambic                                Unidata Program Center
Software Engineer III                      Univ. Corp for Atmospheric Research
rkambic@xxxxxxxxxxxxxxxx                   WWW: http://www.unidata.ucar.edu/
==============================================================================

---------- Forwarded message ----------
Date: Mon, 07 Feb 2000 14:45:17 -0700
From: Russ Rew <russ@xxxxxxxxxxxxxxxx>
To: Ken Waters <Ken.Waters@xxxxxxxx>
    rkambic@xxxxxxxxxxxxxxxx, lmiller@xxxxxxxxxxxxxxxx
Subject: Re: [awipsldm] Re: LDM Observations and Comments
> To: russ@xxxxxxxxxxxxxxxx
> From: Ken Waters <Ken.Waters@xxxxxxxx>
> Subject: Re: 20000207: [awipsldm] Re: LDM Observations and Comments
> Organization: NWS Southern Region
> Keywords: PIPE action, decoder processes
Hi Ken,
> If you don't mind, I'd like to pursue the matter of my Perl script that kicks off other system calls. I have attached my script for your reference. Basically, I write out the stdin to a temp file because there are a series of actions that can be done to a file. Maybe there is a better way. I know you suggested not writing temp files, but either way I'm forced to use system calls to write the file out, right?
If you want to do a series of actions to a product, the usual way to handle this is to have multiple pqact.conf pattern-action entries match the product, each specifying the same pattern but a different action. But you must already know this, because you are already doing 4 things with each product, according to your pqact.conf entries:

    # Test script for ALL products
    AFOS ^(...)(...)(...|..) PIPE -strip /home/ldm/process \1 \2 \3
    # Rotate all versions
    AFOS ^(...)(...)(...|..) PIPE -strip /home/ldm/version.csh \1 \2 \3
    AFOS ^(...)(...)(...|..) FILE -strip -overwrite /home/ldm/data/\1/\2/\1\2\3.1.txt
    # Append products
    AFOS ^(...)(...)(...|..) FILE -strip /home/ldm/data/\1/\2/\1\2\3.txt

The above is generating a *lot* of processes, two for each AFOS product for starters ("process" and "version.csh"), with these generating even more processes as explained below.

The LDM is designed to permit you to start up a process once and keep it running to handle multiple products, instead of starting a process for each product. That's the way our perl decoders work, and maybe you could use the same pattern to have your perl scripts each handle multiple products.

There's a brief description of this in the Site Manager's Guide:

    The PIPE command permits execution of an arbitrary process (an executable program or a shell script, for example) with a data product as standard input to the process. The program should read from standard input and exit upon reading end of file. It should also time out if no input is read for some time.
    Like files, pipelines to child processes are cached for use in processing subsequent products. The pipeline will remain open until the LDM server needs to recycle the output descriptor or until the child process exits of its own accord. The -close option forces the pipe to close after one product has been received.

The LDM pqact program maintains a list of 32 (currently) open file descriptors, corresponding to the files it is writing data to and the pipes it is writing data to for running "decoder" processes to read. When a new product comes along, if pqact needs to invoke a "PIPE" action on it, pqact checks its list of open file descriptors to see if the invocation of the program and arguments are the same as for any of the open pipes on its list. If so, it just writes the data down that pipe and doesn't have to start up a new process.

In order to work with this model, processes that will handle multiple products need to be able to detect the delimiters that separate products, parse the product header if necessary, and block when there's no input until more input is available on the pipe they're reading from. The only time such a process would exit would be if it detected that the pipe had been closed on the other end (by pqact, because it needed another descriptor and this one was the least recently used of the open descriptors).

So pqact typically starts up a perl decoder for upper-air products, for example, and that one decoder keeps running, decoding every upper-air report that it reads from its stdin connected to the pipe from pqact.

If we started up a new instance of a decoder for every product, it would probably bog down the server, though it still would be feasible on modern workstations if there weren't too many products. But you are starting up a couple of perl scripts for every product, and the first of these, "process", is starting up lots of other processes using the perl "system" function to mv and cp files.
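The decoder model described above (read from standard input, split the stream into products, block while idle, exit at end of file) can be sketched roughly as follows. This is only an illustration in Python rather than Perl, not one of the Unidata decoders. It assumes each product arrives terminated by an end-of-text byte (0x03); if your entries strip control characters, you would need some other way to find product boundaries. The names iter_products and ETX are made up for this sketch.

```python
import io

ETX = b"\x03"  # ASSUMPTION: each product ends with an end-of-text byte


def iter_products(stream, delimiter=ETX, chunk_size=4096):
    """Yield each complete product read from a binary stream.

    read1() returns as soon as some data is available, so the loop
    blocks while the pipe is idle and ends only at end of file,
    i.e. when the writer has closed its end of the pipe.
    """
    buffer = b""
    while True:
        chunk = stream.read1(chunk_size)
        if not chunk:  # end of file: the other end closed the pipe
            break
        buffer += chunk
        while True:
            head, sep, tail = buffer.partition(delimiter)
            if not sep:  # no complete product in the buffer yet
                break
            yield head
            buffer = tail
    if buffer.strip():
        yield buffer  # trailing data that never got a delimiter


# Example with an in-memory stream standing in for the pipe:
sample = io.BytesIO(b"first product\x03second product\x03")
print(list(iter_products(sample)))
```

In a real decoder the loop would run over sys.stdin.buffer instead of the in-memory sample, and would parse each product's header to decide where the output goes. The sketch leaves out the idle timeout that the Site Manager's Guide recommends.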
I think it would be OK if you were only invoking 1 or 2 processes per product, but it looks like you're starting up more like a dozen.

In order to use the same process for multiple products, you have to make sure you don't use unique arguments for each process invocation, but instead let the process parse some of the information that is unique to each product. That way, pqact will see the same process and the same argument string, and know to use the existing running process without starting up a new one. For example, our upper air decoder gets only the year and month as product-specific arguments, taken from the product header by the entry that invokes it, so theoretically, that one decoder could stay running for a month:

    # upper air perl decoder
    DDS|IDS ^U[ABDEFGHIJKLMPQRSTXZ].... .... ([0-3][0-9]) PIPE /usr/local/ldm/decoders/ua2nc etc/ua.cdl data/decoded (\1:yy)(\1:mm)
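To see why the argument string matters, here is a toy model of a least-recently-used list of open pipes, keyed on the full command line. It is a simplified illustration of the behavior described in this message, not pqact's actual code, and it ignores that files share the same list of descriptors. The class name and the command lines in the usage note are made up.

```python
from collections import OrderedDict


class PipeCache:
    """Toy least-recently-used cache of open pipes, keyed by command line."""

    def __init__(self, capacity=32):
        self.capacity = capacity
        self.open_pipes = OrderedDict()  # command line -> products written
        self.spawned = 0  # processes started
        self.closed = 0   # pipes closed to free a descriptor

    def send(self, command_line):
        if command_line in self.open_pipes:
            # Same program and arguments: reuse the running process.
            self.open_pipes.move_to_end(command_line)
        else:
            if len(self.open_pipes) >= self.capacity:
                # Close the least recently used pipe to free a slot.
                self.open_pipes.popitem(last=False)
                self.closed += 1
            self.open_pipes[command_line] = 0
            self.spawned += 1
        self.open_pipes[command_line] += 1


def simulate(command_lines, capacity=32):
    """Return (processes spawned, pipes closed) for a run of products."""
    cache = PipeCache(capacity)
    for line in command_lines:
        cache.send(line)
    return cache.spawned, cache.closed
```

With 100 products that each produce a different argument string, for example simulate(["process %03d" % i for i in range(100)]), the model starts 100 processes and closes 68 pipes. With one constant argument string, simulate(["ua2nc etc/ua.cdl data/decoded 0002"] * 100), it starts a single process and closes none.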
> Anyway, what's going on is that my process script is getting hung up on all the "system("cat > $temp/$filenm.tmp");" lines. At any one time, I find about 10-15 open jobs to the process script and each one has another job which is writing out this temp file. What is puzzling to me is that the other uses of "cat" later in the script don't seem to be a problem. I think what's going on is the various different instances of the script are 'falling all over each other' trying to read from stdin. Some of these jobs are taking up to 5 minutes to run! This is, in some cases, leading to errors in the proper storage of the data files.
Since pqact is starting up a new instance of the "process" script for each product, and not reusing any of these, and it only keeps 32 file descriptors open, each product opens a new file descriptor and forces pqact to close the least-recently used one, so the process on that descriptor sees its pipe closed on the other end. I don't know how your perl script reacts to its stdin being closed, but it may be causing some of the problems you are seeing.

Also, since you are doing most of the work in the perl scripts in "system" calls, lots of extra processes are getting started. If there's some way you can limit the number of "system" calls from the perl script and do some of these another way, it would probably help cut down on all the CPU overhead involved in creating and destroying processes.
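As one way to cut down on "system" calls, the script itself can do the work of cat, mv and cp through its language's file routines instead of starting a child process for each. Below is a minimal sketch in Python; in Perl, the standard File::Copy module provides comparable copy and move functions. The function names and directory layout are made up for illustration and are not taken from the attached script.

```python
import os
import shutil
import tempfile


def save_product(stream, dest_path):
    """Write everything readable from `stream` to dest_path in-process.

    Replaces a `cat > file` child process. The data goes to a temporary
    file in the destination directory first and is then renamed into
    place, so a reader never sees a partly written file.
    """
    dest_dir = os.path.dirname(dest_path) or "."
    os.makedirs(dest_dir, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            shutil.copyfileobj(stream, tmp)
        os.replace(tmp_path, dest_path)  # rename, as `mv` would
    except BaseException:
        os.unlink(tmp_path)  # clean up the temporary file on failure
        raise
    return dest_path


def copy_product(src_path, dest_path):
    """Copy a saved product to a second location, as `cp` would."""
    os.makedirs(os.path.dirname(dest_path) or ".", exist_ok=True)
    shutil.copyfile(src_path, dest_path)
    return dest_path
```

A script using these would call save_product(sys.stdin.buffer, path) once per product and copy_product for each additional destination, with no child processes involved.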
> It's important that I get this worked out as our data requirements are increasing and I want to ensure this data feed works as well as possible. For your reference, I have enclosed a copy of (1) my pqact.conf, (2) the process script, and (3) a sample of jobs running [ps -ef].
Thanks for sending such a complete description of the problem. I hope this helps explain what is going on.

--Russ