Hi Guys:
In our TDS, we are beginning to get some datasets with a very large
number of files being aggregated, and recently we have noticed some
dramatic slowdowns. We also made other changes, so the problem may
well be something we did (we are trying to determine this), but in
the meantime we had some questions relating to aggregation (in this
case through time) and the datascan and updates.
1. Is there an a priori reason to believe that, with a large number of
files, access/aggregation will be faster if the files are in
subdirectories?
2. If we had subdirectories, we had a question about how the datascan
works when it updates. Ideally we might want to have subdirectories
be something like year/month sets of files, with the most recent data
put into the top level of the directory. What this would mean is
that not only would files be added, but at the end of the month some
of the files will be moved to a new subdirectory. Will the update as
written deal with this properly, or are we likely to break it or get
weird results? (I hope I have said this clearly - the key point is
that at some time a new subdirectory would be added and files
moved out of the top level into that subdirectory, but the overall
aggregation would be the same.)
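To make the layout in question 2 concrete, here is a rough Python sketch of
the kind of month-end move we have in mind. Everything in it is an
assumption for illustration only: the function name, the ".nc" suffix, and
the YYYYMMDD date stamp in the file names are hypothetical, and it says
nothing about how the TDS itself scans or updates.

```python
import re
import shutil
from pathlib import Path

# Hypothetical file-name convention: a YYYYMMDD date stamp somewhere in the name.
DATE_RE = re.compile(r"(\d{4})(\d{2})\d{2}")


def archive_older_months(top_dir, current_year, current_month):
    """Move files from earlier months out of top_dir into top_dir/YYYY/MM."""
    top = Path(top_dir)
    moved = []
    # sorted() builds the full list first, so moving files mid-loop is safe.
    for path in sorted(top.glob("*.nc")):
        match = DATE_RE.search(path.name)
        if match is None:
            continue  # leave files we cannot date where they are
        year, month = int(match.group(1)), int(match.group(2))
        if (year, month) >= (current_year, current_month):
            continue  # the most recent data stays at the top level
        dest_dir = top / f"{year:04d}" / f"{month:02d}"
        dest_dir.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(dest_dir / path.name))
        moved.append(dest_dir / path.name)
    return moved
```

The set of files is identical before and after the call; only their paths
change, which is the situation we are asking about.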
Any insights you could provide would be appreciated.
-Roy M.
**********************
"The contents of this message do not reflect any position of the U.S.
Government or NOAA."
**********************
Roy Mendelssohn
Supervisory Operations Research Analyst
NOAA/NMFS
Environmental Research Division
Southwest Fisheries Science Center
1352 Lighthouse Avenue
Pacific Grove, CA 93950-2097
e-mail: Roy.Mendelssohn@xxxxxxxx (Note new e-mail address)
voice: (831)-648-9029
fax: (831)-648-8440
www: http://www.pfeg.noaa.gov/
"Old age and treachery will overcome youth and skill."