Hi Dan:
dan.swank wrote:
This will be a challenge for sure.
The NARR, for example, will be an aggregation of ~75000 grib files.
Stored in a basic ./YYYYMM/YYYYMMDD tree. The recursive datasetScan
tag added recently helps a ton with this. Some of our datasets have
forecast hours, some don't. Doing n forecast hour aggregation across
the 00hr will help tremendously with all of them, however.
While it works wonderfully for NetCDF, I cannot see the NcML agg.
working with this set of data,
mainly due to the changing reference times.
I think the FMRC will probably solve it. However, a 75,000 file
aggregation will be a challenge. I'm actually pretty sure we can solve it
(with enough server memory!) but it does worry me that with a single
DODS call, someone could make a request that requires opening 75,000
files to satisfy. OTOH, if that's the service you want to provide, it
sure is a lot better doing it on the server!!! Any thoughts?
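
To put a rough number on that worry, here is a minimal sketch of how many
files one time-range request would touch in a ./YYYYMM/YYYYMMDD tree. The
function names and the files-per-day value are hypothetical, not taken from
your archive or from the server code.

```python
from datetime import date, timedelta


def day_dirs(start, end):
    """Yield 'YYYYMM/YYYYMMDD' for each day from start to end, inclusive."""
    day = start
    while day <= end:
        yield day.strftime("%Y%m/%Y%m%d")
        day += timedelta(days=1)


def files_touched(start, end, files_per_day):
    """Estimate files opened for a request spanning start..end.

    files_per_day is an assumed constant; a real archive may have gaps.
    """
    return sum(1 for _ in day_dirs(start, end)) * files_per_day


if __name__ == "__main__":
    # One month of requests, assuming 8 files per day (an assumption).
    print(files_touched(date(2005, 1, 1), date(2005, 1, 31), 8))
```

A cap on this estimate would be one way to reject requests that span too
much of the archive.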
Looking at the NARR data:
- it looks like you have them divided by day, then all for the same month.
- it looks like all the time coordinates are either 0 or 3 hour
offsets from run time.
- what's the difference between narr_a and narr_b? Should they be
combined or kept separate?
- I assume new files are added now and then? How often? Ever deleted?
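
If I have the time coordinates right, the changing reference times matter
because each file's offsets only make sense relative to its own run time. A
minimal sketch of turning them into absolute valid times follows; the
function names are hypothetical, and the 3-hourly run spacing used in the
example is my assumption.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def valid_times(run_time, offsets_hours=(0, 3)):
    """Absolute valid times for one run: run time plus each offset."""
    return [run_time + timedelta(hours=h) for h in offsets_hours]


def runs_by_valid_time(run_times, offsets_hours=(0, 3)):
    """Map each valid time to the (run_time, offset) pairs that cover it."""
    covered = defaultdict(list)
    for run in run_times:
        for h in offsets_hours:
            covered[run + timedelta(hours=h)].append((run, h))
    return dict(covered)


if __name__ == "__main__":
    # Two runs three hours apart (assumed spacing).
    runs = [datetime(2005, 1, 1, 0), datetime(2005, 1, 1, 3)]
    for valid, sources in sorted(runs_by_valid_time(runs).items()):
        print(valid, sources)
```

With runs three hours apart, the 3 hour offset of one run lands on the same
valid time as the 0 hour offset of the next, so an aggregation has to pick
one of them for each valid time.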
According to NCEP, our NAM & GFS will soon be forced into GRIB2.
But NCDC-NOMADS NWP is currently entirely a GRIB-1 archive.
Only recently home-grown NCDC datasets are created in NetCDF.
For NAM & GFS, we have about 6 months online, which comes out to
about 700 files when stripped to a single forecast time
(say 00hr) aggregation. But there are 61 forecast times for GFS, and 21
for NAM.
Do you store each hour separately, or are all the forecast hours for a
run in the same file?
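
The answer changes the file count a lot. A rough sketch of the arithmetic,
using your numbers and assuming the ~700 figure applies to each model
separately (the names below are hypothetical):

```python
# Figures quoted above: ~700 runs online, 61 forecast times for GFS, 21 for NAM.
RUNS_ONLINE = 700  # assumed to apply to each model separately
FORECAST_TIMES = {"GFS": 61, "NAM": 21}


def file_count(runs, forecast_times, one_file_per_hour):
    """Files online for one model under either storage layout."""
    return runs * forecast_times if one_file_per_hour else runs


if __name__ == "__main__":
    for model, n_times in FORECAST_TIMES.items():
        separate = file_count(RUNS_ONLINE, n_times, one_file_per_hour=True)
        combined = file_count(RUNS_ONLINE, n_times, one_file_per_hour=False)
        print(model, "one file per hour:", separate, "one file per run:", combined)
```

With one file per forecast hour that is about 42,700 files for GFS and
14,700 for NAM; with one file per run it stays at about 700 each.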
-Dan
Especially for GRIB files, you likely need this new "Forecast Model
Run Collection Aggregation" capability. We have been working with our
IDD NCEP GRIB files, and there are some complications, especially
non-homogeneity due to missing records and variable time and vertical
dimensions, that can't really be solved by the current (index based)
aggregation.
Ethan and I will work closely with you guys to get this working. I'd
like to understand what you have in more detail, number and types of
files, how they are stored, etc. Can you or someone summarize?
John