
20040702: potential LDM/pqact problem on OSF/1



David,

>Date: Thu, 1 Jul 2004 12:15:18 -0700
>From: David Ovens <address@hidden>
>Organization: University of Washington
>To: Steve Emmerson <address@hidden>
>Subject: Re: 20040630: potential LDM/pqact problem on OSF/1
>Keywords: 200406241954.i5OJsCWb010248 LDM PIPE Perl

The above message contained the following:

> I made both of the changes you suggested about changing the
> pqact_glacier.conf file to send more selective files to zlib2gif.pl
> and to write STDIN to /dev/null for those products which still get
> through the filter and yet are not wanted.  
> 
> We are still seeing the random truncated PERL-produced files, however.
> 
> -rw-r--r--   1 ldm      ldm      1148165 Jul  1 08:14 n0r_20040701_1506
> -rw-r--r--   1 ldm      ldm       262144 Jul  1 08:14 n0r_20040701_1506.1
> -rw-r--r--   1 ldm      ldm      1148165 Jul  1 08:14 n0r_20040701_1506.2
> 
> We did go 24-hours without seeing the Bourne-shell produced truncated
> files.... 

We believe we've discovered the cause of your problem and we have a
solution.

The files are being truncated because your Perl decoder script is
interacting with an aspect of OSF/1 that doesn't conform to the UNIX
standard.  The pqact(1) program writes data to a UNIX pipe and also
receives SIGCHLD signals from decoders that have terminated.  In your
case, while the pqact(1) process is writing data to a Perl decoder, the
write is interrupted by a SIGCHLD from the previous decoder.  Normally,
the write to the pipe would be transparently restarted (the pqact(1)
program contains code that asks the operating system to do this), but
under OSF/1 such restarts do not occur, in violation of the UNIX
standard.

Information on this aspect of the UNIX standard can be found at

    http://www.opengroup.org/onlinepubs/007908799/xsh/sigaction.html

Search for "SA_RESTART".

The solution is to modify the pqact(1) program to work around this
deficiency in OSF/1.  You will find a modified pqact(1) program and
sources in directory /usr/local/ldm/pqact on Glacier.

Due to unusual file protection modes on Glacier, we cannot install the
new pqact(1) program where it should go.  If, however, you copy the file

    /usr/local/ldm/pqact/pqact

to the home directory of the LDM, then the new program should be used
instead of the old one when the LDM system is restarted.  Try it and see.
If it doesn't work, then stop the LDM system, delete the new pqact(1)
program, and restart the LDM system to go back to the way things are
currently running.

Please keep us apprised.

Regards,
Steve Emmerson