Dear All,

We have a dataset of gridded data with multiple parameters from in-situ measurements (temperature, salinity, oxygen, etc.). The issue with this dataset is that there are very few timesteps, if any, where we have all the parameters together. We want the main end product to be the entire dataset served as a THREDDS dataset, mostly for the OPeNDAP and WMS capabilities.

We're trying to decide how best to store these data, for both efficient storage and efficient retrieval. Would the most efficient layout, for reading in THREDDS, be one huge NetCDF file? That seems inefficient storage-wise, and it certainly is inefficient if we want to add new data (both "new" historical data and genuinely new data). Could we make retrieval efficient for THREDDS if we split the NetCDF file into smaller files, by parameter, by timestep, or both, and then omit variables from the timesteps where there is no data for them?

The optimal solution would be a union of joinExisting aggregations, where each file has only one variable and one timestep. From what we can see from the documentation and our testing, this seems hard. If we were using a featureCollection, it could be solved by setting the "correct" prototype, but featureCollection has no "grid" featureType. Why? :(

The FMRC featureType seems to process the files much more efficiently than an NcML aggregation. Using FMRC is not an option, however, as we cannot modify basic things like the name/summary/id of the "best" dataset or add other variables to it (and we would preferably remove the time_run variable). To me, it sounds easy enough to provide this functionality in TDS.

The first test files we've used here contain one timestep each, with all variables (even those whose values are all NaN; this seems superfluous, and it should be possible to omit them).
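To make the union-of-joinExisting idea concrete, here is a minimal NcML sketch of the layout we have in mind. It is only an illustration under assumptions: the per-parameter subdirectories "temperature/" and "salinity/" are hypothetical (our test files are not organised this way), and we have not verified that THREDDS serves such a nested aggregation efficiently.

```xml
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <!-- Outer union: combines the variables of the nested datasets -->
  <aggregation type="union">
    <!-- One joinExisting aggregation per parameter; directory names are hypothetical -->
    <netcdf>
      <aggregation dimName="time" type="joinExisting">
        <scan location="/vagrant/shared/thredds/Test-1/temperature/" suffix=".nc" subdirs="false"/>
      </aggregation>
    </netcdf>
    <netcdf>
      <aggregation dimName="time" type="joinExisting">
        <scan location="/vagrant/shared/thredds/Test-1/salinity/" suffix=".nc" subdirs="false"/>
      </aggregation>
    </netcdf>
  </aggregation>
</netcdf>
```

As far as we understand, a union skips any object whose name already exists in an earlier nested dataset, so the "time" coordinate would come from the first nested aggregation only. Since our parameters rarely share timesteps, each parameter would probably need its own, differently named time dimension for this to give correct results.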
We have these two test datasets:

    <dataset name="Aggregation_ncml" ID="aggr_ncml" serviceName="all" dataType="Grid" urlPath="aggr_ncml">
      <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
        <aggregation dimName="time" type="joinExisting" recheckEvery="15 min">
          <scan location="/vagrant/shared/thredds/Test-1/" suffix=".nc" subdirs="false"/>
        </aggregation>
      </netcdf>
    </dataset>

    <featureCollection name="test-fmrc" featureType="FMRC" harvest="true" path="fmrc/test">
      <collection spec="/vagrant/shared/thredds/Test-1/UL-5-#yyyy-MM-dd#.nc" recheckAfter="10 sec" olderThan="1 min"/>
      <update startup="true" rescan="0 5 3 * * ? *" />
      <protoDataset choice="Penultimate" change="0 2 3 * * ? *" />
      <fmrcConfig regularize="true" datasetTypes="TwoD Best Files Runs ConstantForecasts ConstantOffsets" />
    </featureCollection>

The NcML aggregation is SLOW (it takes 1-5 minutes to produce a single WMS layer in Godiva)! The FMRC collection, by contrast, is quite fast (e.g. "Best" takes under a minute to process a WMS animation at yearly resolution over 139 years). Individual WMS requests (outside of Godiva) are processed much faster (5-10 seconds for NcML), but the NcML aggregation still takes 10-100 times longer than FMRC. I assume this has to do with the way the two methods index the files/data. We also tried setting dateFormatMark="UL-5-#yyyy-MM-dd" on the NcML aggregation in the hope that it would improve the indexing, but the results were the same.

If anyone has any advice on how to optimize our datasets for THREDDS, that would be fantastic.

Many thanks,
Aleksander Vines