
[LDM #JGZ-326819]: LDM and RAID



Hi Angel,

re: scouring deep hierarchical directories under Linux
> I'm re-visiting this problem... An IDE disk was added to this system
> just to hold the LDM data and it is using reiserfs. This machine is used
> for teaching undergraduates meteorology and they wish to keep over a
> month's worth of data. The pqact is the Gempak one so it creates a ton of
> little files nested very deeply. Again, scour seems to take over 24
> hours and I'm about to let scour only run once a week.

I am very surprised that scouring the GEMPAK tree takes over
24 hours; this is not our experience.  I would guess that this is
related to your use of ReiserFS.  My testing with various file systems
under Linux showed that EXT3 was the fastest of the journaled file systems
and EXT2 was the fastest overall.  Given our view that data directories are
typically expendable (so giving up journaling is an acceptable trade), I
would recommend switching the FS back to EXT2.

> I looked at the scour script and I can't imagine how it could be speeded
> up.

I created a Tcl-based scour script to see if it could be made faster
than the C-shell scripts that Steve Chiswell wrote.  I found that
the Tcl version was a little faster, but not dramatically so.
The LDM utility 'scour' might be more efficient if it were recast in
a different language and no longer relied on 'find'.  I think that the
biggest inefficiency in 'scour' is its use of 'find'.
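
To illustrate what a scour that does not shell out to 'find' might look
like, here is a minimal sketch in Python.  It is not the LDM 'scour'
script and not my Tcl version; the function name 'scour' and its
parameters are made up for this example.  It assumes that "old" means a
file modification time more than max_age_days in the past, and that
emptied directories should be pruned.  Whether it would be any faster on
a given file system would have to be measured.

```python
import os
import time


def scour(root, max_age_days, now=None):
    """Delete regular files under root older than max_age_days.

    Directories left empty by the deletions are removed as well.
    Returns the number of files deleted. (Hypothetical helper.)
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    removed = 0

    # Read the whole directory listing first so that deletions below
    # do not interfere with the iteration.
    with os.scandir(root) as it:
        entries = list(it)

    for entry in entries:
        if entry.is_dir(follow_symlinks=False):
            removed += scour(entry.path, max_age_days, now)
            try:
                os.rmdir(entry.path)  # succeeds only if now empty
            except OSError:
                pass  # directory still holds newer files
        elif entry.is_file(follow_symlinks=False):
            if entry.stat(follow_symlinks=False).st_mtime < cutoff:
                os.unlink(entry.path)
                removed += 1
    return removed
```

A call such as scour("/data/gempak", 30) would then remove files older
than 30 days; the path is only an example.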

> Access to the gempak tree using du or find or recursive rm's
> literally takes forever. How are other sites dealing with such deep
> directory structures?

Sites running on OSes other than Linux are not reporting the kinds of
problems that sites running under Linux are.  Sites using Linux have had
some success in quicker scouring when the RAID being written to is
a true SCSI-based system.  Again, my experience in building RAIDs using
RAID interface cards is not good.  My comments are based on lots of
experimentation done here at the UPC and in working with community
members (TAMU, Universidad de Costa Rica, Caribbean Institute for
Meteorology and Hydrology).  The TAMU machine was most problematic
since they were keeping a 30-day rolling "archive" of NEXRAD Level II
data.  The scouring in that case consisted of removing the oldest
day of Level II data which is saved in a hierarchy.  I found no significant
difference between my Tcl-based scouring and a simple 'rm -rf' done from
the top-level directory under which all data lived.
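
For a rolling archive like that one, the "remove the oldest day" step can
be sketched as below.  This is only an illustration: it assumes a layout
that may not match the TAMU machine, namely that each day's data lives
in its own directory named YYYYMMDD directly under the top-level
directory.  The function name and the keep_days parameter are made up
for the example.

```python
import os
import shutil


def drop_oldest_days(top, keep_days=30):
    """Remove all but the newest keep_days day-directories under top.

    Assumes (hypothetically) that day-directories are named YYYYMMDD.
    Anything else under top is left alone. Returns the names removed.
    """
    days = sorted(
        name for name in os.listdir(top)
        if len(name) == 8 and name.isdigit()
        and os.path.isdir(os.path.join(top, name))
    )
    # YYYYMMDD names sort chronologically, so the oldest come first.
    doomed = days[:-keep_days] if keep_days > 0 else days
    for name in doomed:
        shutil.rmtree(os.path.join(top, name))  # equivalent of 'rm -rf'
    return doomed
```

Like 'rm -rf', shutil.rmtree still has to unlink every file in the tree
one by one, so this should not be expected to be faster than the shell
command.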

From my perspective, the real problem is that Linux does not have a
simple 'unlink' command that works only on the inode table.

One thing you may want to do is open this up for discussion in the 'ldm-users'
email list.  Before you can post to any Unidata-maintained email list, however,
you must subscribe to the list (I note that you are not subscribed to
the 'ldm-users' list).

You can (un)subscribe to any Unidata-maintained email list online
at:

http://www.unidata.ucar.edu/content/support/mailinglist/mailing-list-form.html

If you decide to subscribe in order to repost your message, please wait
until you have received notification of being added to the list before
you post.

Cheers,

Tom
****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                   Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage                       http://www.unidata.ucar.edu
****************************************************************************


Ticket Details
===================
Ticket ID: JGZ-326819
Department: Support LDM
Priority: Normal
Status: Closed