20000414: file system full on Lyndon Debian Linux (cont.)



>From: Mark Tucker <address@hidden>
>Organization: Lyndon State College
>Keywords: 200004051409.IAA17096 McIDAS-XCD LDM startxcd xcd_run

re: disk filling at Lyndon State
>We've been having this disk space issue recur frequently over the past
>week or two now.  This has been coupled with several other problems with
>our ldm's McIDAS processing.
>
>Last week, our disk space problem was caused by a problem with
>redirection.

If the problem has only been occurring for the past couple of weeks or
so, then it might be related to changes made in the model output content
of the HRS stream in the IDD.  I cut an addendum last night that had
several changes to the McIDAS-XCD grid decoding that you may need to
grab and install.

>For some reason (I still do not know why) the ldm/mcidas
>configuration lost all of its redirections and ADDE definitions.

McIDAS-XCD gets its McIDAS configuration information through environment
variable settings in the script xcd_run.  xcd_run is run:

o at LDM startup through an 'exec "xcd_run MONITOR"' in ldmd.conf

o when textual data is sent to XCD processes through a pqact.conf entry:

  DDPLUS|IDS      ^.*     PIPE
        xcd_run DDS

o when binary data is sent to XCD processes through a pqact.conf entry:

  HRS     ^.*     PIPE
        xcd_run HRS

xcd_run must be located in a directory that is in the PATH of the user
running the LDM.  The environment variables defined in it that McIDAS
needs are:

MCHOME
MCDATA
MCLOG
MCPATH
PATH
LD_LIBRARY_PATH
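
A quick way to see whether all six are set is a check along these lines.
This is only a sketch in Python and is not part of McIDAS or the LDM; the
function name missing_mcidas_vars is made up for illustration.  It
inspects whatever environment it is run in, so to test the xcd_run
settings it would have to be started from an environment set up the same
way:

```python
import os

# The variables listed above that xcd_run is expected to define.
REQUIRED_VARS = ["MCHOME", "MCDATA", "MCLOG", "MCPATH", "PATH", "LD_LIBRARY_PATH"]


def missing_mcidas_vars(env=None):
    """Return the names from REQUIRED_VARS that are unset or empty in env."""
    if env is None:
        env = os.environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_mcidas_vars()
    if missing:
        print("Not set:", ", ".join(missing))
    else:
        print("All six variables are set")
```

An empty list only says that the variables have values; it does not say
that those values point at the right directories.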

REDIRECTions used by McIDAS should be found in the $MCDATA directory
in a file named LWPATH.NAM.  This file should be readable and writable
by the user mcidas and the user running the LDM.  However, there is
nothing in the Unidata setup that would cause REDIRECTions to be written,
so LWPATH.NAM should stay owned by mcidas.
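
To spot files whose ownership has changed, something like the following
could be run against the McIDAS data directories.  It is a rough sketch
in Python, not a McIDAS utility, and the function name
files_not_owned_by is hypothetical:

```python
import os


def files_not_owned_by(root, expected_uid):
    """Return sorted paths under root whose owner uid differs from expected_uid."""
    unexpected = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.lstat(path).st_uid != expected_uid:
                    unexpected.append(path)
            except OSError:
                # The file vanished or cannot be examined; skip it.
                continue
    return sorted(unexpected)
```

The uid for the user mcidas can be looked up with
pwd.getpwnam("mcidas").pw_uid, and the directory to pass as root would be
wherever $MCDATA points on your system.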

>The ldm
>user had ownership of many files in the ~mcidas/data and ~mcidas/workdata
>directories, including LWPATH.NAM and RESOLV.SRV.

The only way that this should happen is if someone logged on as ldm
and then ran REDIRECT and DSSERVE commands.  There is nothing in
the way I recommend things be set up that would cause this.  Are
there local Lyndon processes that manipulate REDIRECTions and DSSERVEs?

>I am generally very
>careful about making changes to our ldm and have not made any recent
>changes that would have caused this - at least not as far as I can tell.
>I think this is beyond the scope of a misconfigured ldm but I thought I'd
>ask in case you knew of some way that this could have happened.

I would have to take a close look at everything that is running as ldm
and mcidas on your system to fully understand what could be going on.

>The other occurrences of full disks occur when the dmsyn.k process goes
>awry.

Interesting.  I have never had this problem.

>This fills our /usr directory while dmsyn.k perpetually sits at the
>top of "top".

Since I don't know which file systems are used for what on your system,
telling me that /usr is filling up means nothing to me.

>I have not been able to find where
>the increased disk usage is occurring even after logging the output from
>"du" onto another partition as root.  This seems to happen between 21 and
>23:00 EDT.

So, there are one or more phantom files that are somehow open but
do not show up in a 'du' listing?
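
One case that fits this description is a file that has been deleted while
a process still holds it open: its blocks stay allocated, so 'df' counts
them, but 'du' walks directory entries and no longer finds the file.  On
Linux, such files appear under /proc/<pid>/fd as links whose target ends
in " (deleted)".  The sketch below, in Python, looks for them.  It is an
illustration only; the function name deleted_open_files is made up, and
it needs to be run as root to see other users' processes:

```python
import os


def deleted_open_files(proc_root="/proc"):
    """Return (pid, link target) pairs for open descriptors whose file was unlinked."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric entries are process directories
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission to look
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor closed between listdir and readlink
            if target.endswith(" (deleted)"):
                found.append((int(entry), target))
    return found


if __name__ == "__main__":
    for pid, target in deleted_open_files():
        print(pid, target)
```

If dmsyn.k shows up in the output while the disk is filling, that would
point at the file it is writing.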

re: size of XCD_START.LOG
>Our XCD_START.LOG is actually quite small.

OK, this is how it should be.

>I have not yet looked at it
>when the disk is full but I will next time we have a problem.

OK.

>Could the
>du command not be reading this file if it is opened for writing? 

Only if you have a process that is trying to scour (delete) it while it
is still open.

>Another problem we have been having is with our schema, particularly with  
>MDXX* files.  Our SFCHOURLY seems to be messed up as a result of this:
>
>
>PTLIST RTPTSRC/SFCHOURLY FORM=FILE ALL                                 
>Pos      Description                        Schema  NRows NCols  Date  
>------   --------------------------------   ------  ----- ----- -------
>     3   SAO/METAR data for   12 APR 2000   ISFC       72  4500 2000103
>     4                                              ***** ***** *******
>     5                                              ***** ***** *******
>PTLIST: Done     
>
>Also, in our log files there are many entries about the NLDN schema not
>being registered:
>
>Apr 13 21:12:49 cirrus nldn2md[20269]: Schema NLDN not registered 
>Apr 13 21:12:49 cirrus nldn2md[20269]: ERROR creating MD file: 74 
>Apr 13 21:12:49 cirrus pqact[10239]: pbuf_flush (9) write: Broken pipe 
>Apr 13 21:12:50 cirrus nids2area[20271]: NIDS2AREA -- BEGIN 
>Apr 13 21:12:50 cirrus nids2area[20271]: NIDS2AREA -- DONE AREA 9017 
>Apr 13 21:12:51 cirrus nids2area[20272]: NIDS2AREA -- BEGIN 
>Apr 13 21:12:51 cirrus nids2area[20272]: PRODUCT CODE=RG          104 211000
>Apr 13 21:12:51 cirrus nids2area[20272]: NIDS2AREA -- DONE AREA 933 
>...

This is telling us that the copy of SCHEMA that should be present in the
directory in which the XCD MDXX and ldm-mcidas MDXX files are created
is either missing or is zero length.  Do you have LDM scouring set up
for that/those directories?  If so, the LDM scouring probably deleted
that copy of SCHEMA.  This might well cause the problems you are seeing
with ISFC and NLDN files.
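
A simple test for the two conditions mentioned above (missing or zero
length) could look like this.  It is a minimal sketch in Python, not a
McIDAS tool; the function name schema_status is hypothetical, and the
directory passed in is whichever one your MD files are written to:

```python
import os


def schema_status(directory, name="SCHEMA"):
    """Classify the SCHEMA copy in directory as 'missing', 'empty', or 'present'."""
    path = os.path.join(directory, name)
    if not os.path.isfile(path):
        return "missing"
    if os.path.getsize(path) == 0:
        return "empty"
    return "present"
```

A result of 'present' only means that the file exists and has some
content.  It does not verify that the schemas registered in it are the
right ones.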

By the way, if SCHEMA is missing/bad in the directory, you will need
to:

o stop the LDM

o install a new, correct SCHEMA in the directory

o delete MD files that are bad, as they won't get repaired when new data
  comes in

o clean up as much as possible

o restart the LDM

>I have checked and the proper schema are registered.
>
>cirrus:~/data> lsche.k ALL ALL |grep NAME
>NAME: ASTA  VERSION:  1  DATE: 1982215  TEXTID: "AREA STATISTICS RESULTS
>NAME: BPRO  VERSION:  1  DATE: 1991071  TEXTID: "WIND PROFILER BEAM DATA
>NAME: FO14  VERSION:  2  DATE: 1998231  TEXTID: "NGM MOS FORECASTS
>NAME: GRET  VERSION:  1  DATE: 1998285  TEXTID: "GOES I/M QUANTITATIVE RETRIEVAL
>NAME: IHGT  VERSION:  2  DATE: 1998055  TEXTID: "UPPER AIR DATA HEIGHT INTERP.
>NAME: IRAB  VERSION:  4  DATE: 1998231  TEXTID: "INTL. RADIOSONDE OBS (UPPER AIR
>NAME: IRSG  VERSION:  2  DATE: 1998231  TEXTID: "INTL. RADIOSONDE OBS--SIG LEVEL
>NAME: ISEN  VERSION:  3  DATE: 1998258  TEXTID: "ISENTROPIC SURFACE DATA
>NAME: ISFC  VERSION:  7  DATE: 1999085  TEXTID: "SURFACE HOURLY OBSERVATIONS
>NAME: ISHP  VERSION:  4  DATE: 1998230  TEXTID: "SHIP/BUOY/C-MAN OBSERVATIONS
>NAME: NLDN  VERSION:  2  DATE: 1998337  TEXTID: "NLDN DATA OBS UNIDATA FORMAT
>NAME: PIRP  VERSION:  3  DATE: 1998233  TEXTID: "PIREP/AIREP/ACARS DATA
>NAME: RAOB  VERSION:  3  DATE: 1989328  TEXTID: "RADIOSONDE OBS (UPPER AIR DATA)
>NAME: RSIG  VERSION:  3  DATE: 1983018  TEXTID: "RADIOSONDE OBS--SIG LEVELS
>NAME: RAWI  VERSION:  1  DATE: 1990213  TEXTID: "RAWINSONDE OBS
>NAME: SVCA  VERSION:  5  DATE: 1982158  TEXTID: "SURFACE HOURLY OBSERVATIONS
>NAME: SYN   VERSION:  5  DATE: 1999085  TEXTID: "SURFACE SYNOPTICS
>NAME: WPR6  VERSION:  2  DATE: 1998338  TEXTID: "6-MIN WIND PROFILER UPPER AIR D
>NAME: WPRO  VERSION:  3  DATE: 1998338  TEXTID: "WIND PROFILER UPPER AIR DATA
>NAME: IHYD  VERSION:  1  DATE: 1995094  TEXTID: "HYDROLOGIC DAILY OBSERVATIONS
>cirrus:~/data> 

Where is the copy of SCHEMA that is being checked by this lsche.k invocation?
You can find this out by:

<logging in as 'mcidas'>
cd workdata
dmap.k SCHEMA

It could be that you are doing just this and looking at a copy of SCHEMA
that is in the ~mcidas/data directory.  If so, then copy it to the directory
where the MD files are to be made.

>After the redirections and adde definitions were lost I created
>a new schema file using SCHEMA.BAT in an attempt to fix these problems
>but they persist.

If SCHEMA is missing/bad, XCD processes must be stopped while it is
recreated.  After that the LDM can be restarted.

>Feel free to login on Cirrus and look around if necessary.

I don't have the login information for ldm and mcidas.  If I did, I would
log in to take a look.

Tom