Dealing with large archives
Hi guys,
Firstly:
I have "solved" the problem with the bad characters. The problem is that
the NetCDF reader that thredds uses makes use itself of the "urlPath"
specification when coming back with the DDS and DAS. As such, if use the
"=" character (among others) in the urlPath (even if it's in the path
rather than the simple filename), it gets inserted into the DDS/DAS by
the NetCDF reader, which causes errors down the track in the parser.
I have worked around the problem by having a separate internalService
for each dataset. The "base" section can contain the illegal characters
without polluting the DDS/DAS of files read by the NetCDF reader. For
the moment this is fine, but it is less than ideal; I may return to it
after dealing with more pressing issues. In future I will look at
escaping the illegal characters or encoding them in some other way, but
it's tricky to be sure you've covered all of the cases with those
techniques.
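Something along the lines of standard percent-encoding is what I have in
mind, e.g. in Python (the path below is made up, and whether the parser
is actually happy with percent-escapes is exactly the part I would still
need to check):

    from urllib.parse import quote, unquote

    def escape_url_path(url_path):
        """Percent-encode awkward characters in a urlPath, keeping "/" readable."""
        return quote(url_path, safe="/")

    raw = "models/run=20080101_00/forecast.nc"   # made-up example path
    escaped = escape_url_path(raw)               # models/run%3D20080101_00/forecast.nc
    assert unquote(escaped) == raw               # reversible, so the original path can be recovered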
Maybe once everything goes XML the problem will simply disappear, and I
can just wait it out :)
Secondly:
I am trying to work out how to structure my data by date. I will have a
number of data sets (NWP models) that will be updated daily, or even
multiple times per day. Quite quickly I will reach the point where I
have hundreds of data sets published. Even a week's worth of data at two
runs per day across three sources is 42 data sets.
I have two tasks: one is to automate the updating of the configuration
files so that new data sets are incorporated as they become available,
and the other is to structure the data pages in a sensible way for users
to access.
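For the first task, I am leaning towards something like a cron job that
rescans the data directories and regenerates the dataset listing. A rough
Python sketch of what I mean (the directory layout and names are
placeholders, and this only emits bare <dataset> elements rather than a
complete catalog):

    import os
    from xml.sax.saxutils import quoteattr

    # Hypothetical layout: DATA_ROOT/<model>/<YYYYMMDD_HH>/<file>.nc (placeholder paths)
    DATA_ROOT = "/data/nwp"

    def dataset_entries(root):
        """Yield (model, run, filename) for every NetCDF file under root."""
        for model in sorted(os.listdir(root)):
            model_dir = os.path.join(root, model)
            if not os.path.isdir(model_dir):
                continue
            for run in sorted(os.listdir(model_dir)):
                run_dir = os.path.join(model_dir, run)
                if not os.path.isdir(run_dir):
                    continue
                for name in sorted(os.listdir(run_dir)):
                    if name.endswith(".nc"):
                        yield model, run, name

    def write_dataset_fragment(root, out_path):
        """Write one <dataset> element per file; just the listing, not a full catalog."""
        with open(out_path, "w") as out:
            for model, run, name in dataset_entries(root):
                out.write("<dataset name=%s urlPath=%s/>\n" % (
                    quoteattr("%s %s %s" % (model, run, name)),
                    quoteattr("/".join([model, run, name]))))

    if __name__ == "__main__":
        write_dataset_fragment(DATA_ROOT, "datasets.xml")

That handles getting new runs in as they arrive, but it says nothing
about presenting them sensibly, which brings me to the question below.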
I was wondering what practices people have adopted or found successful
in the past with regard to handling large amounts of data. Have people
typically arranged archive data as aggregations, or linked to archive
catalogs from the top-level catalog? What have people found works best?
Cheers,
-Tennessee