
[LDM #WDB-706909]: Restarting LDM



Hi Martha,

re:
> I think part of the problem is due to the THREDDS server that we are
> running on NOAAPXCD.  When I restarted it I reclaimed a lot of disk space.
> I did notice that there were a lot of files marked "deleted" when I ran the
> UNIX command "lsof" on our data file system.

Interesting...

> The lsof command shows files that are opened by the OS and what process
> has the files open; "java" which is attached to THREDDS had a lot of files
> in our /data/pub area opened, many of which had been deleted by a regularly
> scheduled cron job cleanup.

Ah Ha!  The smoking gun!
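
For reference, here is a minimal Python sketch of why files that are deleted
while a process still has them open keep consuming disk space.  It assumes a
local POSIX file system; the function name is made up for illustration and
has nothing to do with THREDDS or the LDM.  Removing the name drops the link
count to zero, but the data stay allocated until the last open descriptor is
closed.

```python
import os
import tempfile


def unlinked_but_open_size(payload):
    """Write payload to a temp file, delete its name while it is still open,
    and return (link_count, size_in_bytes) as seen through the descriptor."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        os.unlink(path)      # what a cleanup job does: remove the file name
        info = os.fstat(fd)  # the inode is still alive while fd is open
        return info.st_nlink, info.st_size
    finally:
        os.close(fd)         # only now can the file system reclaim the blocks


if __name__ == "__main__":
    links, size = unlinked_but_open_size(b"x" * 4096)
    print("link count:", links, "bytes still held:", size)
```

This would be consistent with the disk space coming back only after the
process holding the files was restarted.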

> I don't know how often THREDDS refreshes itself;
> certainly IDV instructs it to refresh but how it refreshes locally is
> something I'll have to research.

I am turning the THREDDS-related portion of this inquiry into a new
netcdf-java inquiry.  Someone from the THREDDS group will get back to
you with specific questions/observations regarding your scouring of
files being used by THREDDS.

Question:

- did you generate (or can you generate) a list of the files that were
  still open even though they had supposedly been scoured?
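
If it helps, one way to produce such a list on Linux is sketched below in
Python.  This is only an illustration under stated assumptions: it relies on
the Linux /proc file system, where the kernel appends " (deleted)" to the
target of a descriptor whose file name has been removed, and it must be run
as the owner of the process (or as root).  The function name is hypothetical.
If your version of lsof supports it, 'lsof +L1' should give a similar list.

```python
import os


def deleted_open_files(pid, proc_root="/proc"):
    """Return paths that process `pid` still holds open after they were unlinked.

    Linux-only: reads the /proc/<pid>/fd symlinks, whose targets carry a
    " (deleted)" suffix once the file name has been removed.
    """
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    suffix = " (deleted)"
    found = []
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # descriptor was closed between listdir() and readlink()
        if target.endswith(suffix):
            found.append(target[: -len(suffix)])
    return sorted(found)
```

Calling deleted_open_files() with the process ID of the "java" process and
keeping only the entries under /data/pub would give the list asked about
above.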

> Re the GRID files, I had not noticed that the number attached to each
> was 5-digit rather than 4-digit, and the mcscour was missing a decimal place
> so I added GRI* to scour.conf. I have since removed GRI* from the latter and
> am letting mcscour handle GRI*.

Very good.  These files should not have had anything to do with the
phantom disk usage you were seeing _unless_ you were somehow trying
to delete files that were being actively written to.

> Otherwise, this is our setup for file cleanup:
> 
> mcscour.sh     - MD, GRID, TXT

Very good.

> ldmadmin scour - /data/pub (the area where the files are written
>                    for THREDDS)

This may be problematic, but we will learn more via your interactions
with THREDDS developers.

>                  /data/nexrad  (we aren't using UNIDATA but SSEC
>                    decoding, and I couldn't find how SSEC recommended
>                    cleaning up nexrad)

If you are using the SSEC approach, then the scouring occurs immediately
after a new file is written.  The number of files to keep is specified by
the KEEP= value in the NEXRID.CFG configuration file; this file should be
located in the $MCDATA directory of the user 'mcidas'.
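
To illustrate the idea of "keep the newest N files" scouring, here is a
minimal Python sketch.  It is not the SSEC implementation and does not read
NEXRID.CFG; the function name and the 'keep' parameter are assumptions that
simply stand in for the role the KEEP= value plays.

```python
import os


def scour_keep_newest(directory, keep):
    """Delete all but the `keep` most recently modified regular files in
    `directory`; return the sorted names of the files that were removed."""
    if keep < 0:
        raise ValueError("keep must be zero or greater")
    with os.scandir(directory) as it:
        entries = [e for e in it if e.is_file(follow_symlinks=False)]
    # Newest first; the name is a tie-breaker so the order is deterministic.
    entries.sort(key=lambda e: (e.stat().st_mtime, e.name), reverse=True)
    removed = []
    for entry in entries[keep:]:
        os.unlink(entry.path)
        removed.append(entry.name)
    return sorted(removed)
```

Running something like this right after each new file is written keeps the
directory at a fixed number of files without a separate cron job.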

> xcdadmin       - BUFR and GRIB

Very good.

> I also noted that in our pqact file for THREDDS, we have no "close"
> directive with FILE, just "flush".  Do you recommend changing that?  I thought
> that you had set up "flush" rather than "close" for some reason when you were
> helping us with CONDUIT.

The reason for not using the '-close' flag, especially for CONDUIT GRIB
filing, is that there are so many products that the overhead of re-opening
an existing file (open and then seek) for each newly received GRIB message
could be very high.
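
The trade-off can be sketched in Python as follows.  This is only an
illustration of the two filing patterns, not how pqact is implemented; the
function names and the stand-in payloads are made up for the example.

```python
import os
import tempfile
import time


def file_with_flush(path, messages):
    """Open once, append every message, and flush after each one."""
    with open(path, "ab") as out:
        for msg in messages:
            out.write(msg)
            out.flush()  # data reach the OS; the descriptor stays open


def file_with_close(path, messages):
    """Re-open in append mode, write, and close for every single message."""
    for msg in messages:
        with open(path, "ab") as out:  # open + position at end, every time
            out.write(msg)


if __name__ == "__main__":
    msgs = [b"GRIB" + bytes(60)] * 5000  # stand-in payloads, not real GRIB
    for func in (file_with_flush, file_with_close):
        fd, path = tempfile.mkstemp()
        os.close(fd)
        start = time.perf_counter()
        func(path, msgs)
        print(func.__name__, "%.3fs" % (time.perf_counter() - start))
        os.unlink(path)
```

Both functions leave identical file contents; the difference is the number
of open/close calls, which grows with the number of messages in the second
pattern.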

> Thanks for your help.

Question:

- are you really running on the ragged edge of disk space, or are you just
  being careful with disk resources?

Cheers,

Tom
--
****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                   Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage                       http://www.unidata.ucar.edu
****************************************************************************


Ticket Details
===================
Ticket ID: WDB-706909
Department: Support LDM
Priority: Normal
Status: Open