John Caron wrote the following on 8/16/2006 3:39 PM:
> Hi Dan:
>
> dan.swank wrote:
>
>> This will be a challenge for sure.
>> The NARR, for example, will be an aggregation of ~75000 grib files.
>> Stored in a basic ./YYYYMM/YYYYMMDD tree. The recursive datasetScan
>> tag added recently helps a ton with this. Some of our datasets have
>> forecast hours, some don't. Doing an n-forecast-hour aggregation across
>> the 00hr will help tremendously with all of them, however.
>> While it works wonderfully for NetCDF, I cannot see the NcML
>> aggregation working with this set of data, mainly due to the
>> changing reference times.
>>
>>
> I think the FMRC will probably solve it. However, a 75,000 file
> aggregation will be a challenge. I'm actually pretty sure we can solve it
> (with enough server memory!) but it does worry me that with a single
> dods call, someone could make a request that requires opening 75,000
> files to satisfy. OTOH, if that's the service you want to provide, it
> sure is a lot better doing it on the server!!! Any thoughts?
Throttles... If the dev team could create an element to specify
the maximum size of a request in either bytes returned or
number of files accessed, that would be great.
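Something like the sketch below is what I have in mind. To be clear, the
requestLimits element and its attributes are purely hypothetical names for
illustration, not an existing TDS option (only datasetScan is real):

  <datasetScan name="NARR" path="narr" location="/data/narr/">
    <!-- hypothetical: cap any single request by bytes returned or files opened -->
    <requestLimits maxBytes="500000000" maxFilesOpened="1000"/>
  </datasetScan>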
>
> Looking at the NARR data:
> - it looks like you have them divided by day, then grouped by month.
> - it looks like all the time coordinates are either 0 or 3 hour offsets
> from run time.
The NARR is a reanalysis; it contains variables defined either at the
instantaneous initial time or as a 0 to 3 hour average, total, or
other operation.
> - what's the difference between narr_a and narr_b? Should they be
> combined or kept separate?
The differences are explained here:
http://nomads.ncdc.noaa.gov/data.php?name=narrdiffs
> - I assume new files are added now and then? How often? Ever deleted?
New NARR comes in from NCEP on an irregular basis, typically
once a month or less often. This archive is set to grow
indefinitely; the files are never deleted.
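If it helps, a scan-based FMRC aggregation along the lines you mention
could pick up those monthly additions automatically. A minimal NcML
sketch, where the location, file suffix, and recheck interval are all
guesses on my part:

  <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
    <!-- treat each reference time as a separate model run -->
    <aggregation dimName="run" type="forecastModelRunCollection">
      <!-- walk the ./YYYYMM/YYYYMMDD tree; recheck for new files monthly -->
      <scan location="/data/narr/" suffix=".grb" subdirs="true" recheckEvery="30 days"/>
    </aggregation>
  </netcdf>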
>
>> According to NCEP, our NAM & GFS will soon be forced into GRIB2.
>> But the NCDC-NOMADS NWP holdings are currently entirely a GRIB-1 archive.
>> Only recently have home-grown NCDC datasets been created in NetCDF.
>>
>> For NAM & GFS, we have about 6 months online, which comes out to
>> about 700 files when stripped down to a single forecast time
>> (say 00hr) aggregation. But there are 61 forecast times for GFS, and 21
>> for NAM.
>>
>>
> Do you store each hour separately, or are all the forecast hours for a
> run in the same file?
We store them as one file per forecast hour, each containing all
parameters and vertical levels for that forecast hour.
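In case a concrete picture helps, the per-forecast-hour files of a single
run could in principle be stitched along time with a plain joinExisting
aggregation; the file names below are made up:

  <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
    <!-- join the per-forecast-hour files of one run along the time dimension -->
    <aggregation dimName="time" type="joinExisting">
      <netcdf location="nam_20060816_00z_f00.grb"/>
      <netcdf location="nam_20060816_00z_f03.grb"/>
      <netcdf location="nam_20060816_00z_f06.grb"/>
    </aggregation>
  </netcdf>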
-Dan