Re: [thredds] Aggregating Large NetCDF Datasets with Restricted Access

To: "Antonio S. Cofiño" <antonio.cofino@xxxxxxxxx>
Subject: Re: [thredds] Aggregating Large NetCDF Datasets with Restricted Access
From: Kevin Manross <manross@xxxxxxxx>
Date: Fri, 20 Dec 2013 13:47:41 -0700


Thanks Antonio!

I'll definitely give this idea a shot.

Is there any performance hit from listing several thousand files in the catalog (as opposed to scanning the directory)?


Thanks again!

-kevin.

On 12/20/13 1:34 PM, "Antonio S. Cofiño" wrote:

Kevin,
To improve the joinExisting aggregation, you can replace the inner scan element with an explicit list of the files you want to aggregate, and add the ncoords or the coordValue attribute to each netcdf element, as explained in the "Defining coordinates on a JoinExisting aggregation" section of the Aggregation document:
http://www.unidata.ucar.edu/software/thredds/current/netcdf-java/ncml/v2.2/Aggregation.html
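A minimal sketch of that substitution for one of the inner aggregations, not a tested catalog: the directory is the glw one from your message, but the file names and ncoords values below are placeholders and are not taken from your dataset.

```xml
<!-- Sketch only: file names and ncoords values are hypothetical placeholders -->
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <aggregation dimName="time" type="joinExisting">
    <!-- ncoords = number of time steps stored in that file -->
    <netcdf location="/glade/p/rda/data/ds601.0/RCPP/1995_2005/glw/FILE_1.nc" ncoords="248"/>
    <netcdf location="/glade/p/rda/data/ds601.0/RCPP/1995_2005/glw/FILE_2.nc" ncoords="240"/>
    <!-- ... one netcdf element per file, in time order ... -->
  </aggregation>
</netcdf>
```

With ncoords given for every file, the aggregation should not need to open each file just to size the time dimension. coordValue can be used instead when you prefer to state the coordinate values themselves.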
Also make sure the aggregation cache is configured in the TDS config:
http://www.unidata.ucar.edu/software/thredds/current/tds/tds4.3/reference/ThreddsConfigXMLFile.html#AggregationCache
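For orientation, the AggregationCache element in threddsConfig.xml looks roughly like the sketch below. The directory is a hypothetical path and the time values are only illustrative; please check the reference page above for the defaults in your TDS version.

```xml
<!-- Sketch of the AggregationCache element in threddsConfig.xml; values are illustrative -->
<AggregationCache>
  <!-- directory where the joinExisting cache files are written (hypothetical path) -->
  <dir>/data/thredds/cache/agg/</dir>
  <!-- how often the directory is checked for old cache files -->
  <scour>24 hours</scour>
  <!-- cache files older than this are deleted during a scour -->
  <maxAge>90 days</maxAge>
</AggregationCache>
```

The directory has to be writable by the user running Tomcat, otherwise the cache files cannot be saved between visits.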
I hope this helps.

Regards

Antonio



--
Antonio S. Cofiño
Grupo de Meteorología de Santander
Dep. de Matemática Aplicada y
        Ciencias de la Computación
Universidad de Cantabria
http://www.meteo.unican.es

On 20/12/2013 19:28, Kevin Manross wrote:
Seasons Greetings!
I really wish we didn't have these restrictions on data, but that's what I'm dealing with, so please bear with me.
We have some large (33 TB, 840 GB, etc.) netCDF datasets that I am trying to aggregate. Many are in a "time series" layout (i.e., a single-parameter grid spread out across many time steps [files], such as u10/u10_RCPP_2004_11.nc, u10/u10_RCPP_2004_12.nc, etc.).
I initially tried a large nested aggregation such as:

<dataset name="ds601.0-Agg"
         ID="ds601.0-AGG"
         urlPath="ds601.0/10/best"
         harvest="true">
  <metadata inherited="true">
    <serviceName>all</serviceName>
    <dataFormat>NetCDF</dataFormat>
    <dataType>GRID</dataType>
  </metadata>
  <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
    <aggregation type="Union">
      <netcdf>
        <aggregation dimName="time" type="joinExisting">
          <scan location="/glade/p/rda/data/ds601.0/RCPP/1995_2005/glw/" suffix=".nc" subdirs="true"/>
        </aggregation>
      </netcdf>
      <netcdf>
        <aggregation dimName="time" type="joinExisting">
          <scan location="/glade/p/rda/data/ds601.0/RCPP/1995_2005/graupel/" suffix=".nc" subdirs="true"/>
        </aggregation>
      </netcdf>
      <netcdf>
        <aggregation dimName="time" type="joinExisting">
          <scan location="/glade/p/rda/data/ds601.0/RCPP/1995_2005/olr/" suffix=".nc" subdirs="true"/>
        </aggregation>
      </netcdf>
      <netcdf>
        <aggregation dimName="time" type="joinExisting">
          <scan location="/glade/p/rda/data/ds601.0/RCPP/1995_2005/psfc/" suffix=".nc" subdirs="true"/>
        </aggregation>
      </netcdf>
      ...
      ...
      ...
    </aggregation>
  </netcdf>
</dataset>
This takes a long time to build the cache file, and upon each revisit it goes through the process of rebuilding the file. Honestly, it is unusable this way from a user standpoint. However, everything works with the restrictions I have set up via Tomcat DataSourceRealm and webapps/thredds/WEB-INF/web.xml.
Mike McDonald had a really slick way to aggregate and cache the parameter timeseries files, and then build the union on demand (see his response to the thread '"Too Many Open Files" Error. Dataset too big?' on 28 October 2013). So, using his example, I reformatted my catalog as such:
<dataset name="Full Aggregation of ds601.0"
         ID="ds601.0-AGG"
         urlPath="aggregations/ds601.0/10/best"
         harvest="true">
  <serviceName>all</serviceName>
  <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
    <aggregation type="Union">
      <netcdf location="dods://localhost:8080/thredds/dodsC/internal/ds601.0/101/glw"/>
      <netcdf location="dods://localhost:8080/thredds/dodsC/internal/ds601.0/102/graupel"/>
      <netcdf location="dods://localhost:8080/thredds/dodsC/internal/ds601.0/103/olr"/>
      <netcdf location="dods://localhost:8080/thredds/dodsC/internal/ds601.0/104/psfc"/>
      ...
      ...
      ...
    </aggregation>
  </netcdf>
</dataset>

<dataset name="internal/ds601.0 Aggregation (glw)"
         ID="internal/ds601.0/101/glw"
         urlPath="internal/ds601.0/101/glw"
         harvest="true">
  <serviceName>all</serviceName>
  <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
    <aggregation dimName="time" type="joinExisting">
      <scan location="/data/glade/p/rda/data/ds601.0/RCPP/1995_2005/glw/" suffix=".nc" subdirs="true"/>
    </aggregation>
  </netcdf>
</dataset>

<dataset name="internal/ds601.0 Aggregation (graupel)"
         ID="internal/ds601.0/102/graupel"
         urlPath="internal/ds601.0/102/graupel"
         harvest="true">
  <serviceName>all</serviceName>
  <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
    <aggregation dimName="time" type="joinExisting">
      <scan location="/data/glade/p/rda/data/ds601.0/RCPP/1995_2005/graupel/" suffix=".nc" subdirs="true"/>
    </aggregation>
  </netcdf>
</dataset>

<dataset name="internal/ds601.0 Aggregation (olr)"
         ID="internal/ds601.0/103/olr"
         urlPath="internal/ds601.0/103/olr"
         harvest="true">
  <serviceName>all</serviceName>
  <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
    <aggregation dimName="time" type="joinExisting">
      <scan location="/data/glade/p/rda/data/ds601.0/RCPP/1995_2005/olr/" suffix=".nc" subdirs="true"/>
    </aggregation>
  </netcdf>
</dataset>

<dataset name="internal/ds601.0 Aggregation (psfc)"
         ID="internal/ds601.0/104/psfc"
         urlPath="internal/ds601.0/104/psfc"
         harvest="true">
  <serviceName>all</serviceName>
  <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
    <aggregation dimName="time" type="joinExisting">
      <scan location="/data/glade/p/rda/data/ds601.0/RCPP/1995_2005/psfc/" suffix=".nc" subdirs="true"/>
    </aggregation>
  </netcdf>
</dataset>

...
...
...
This sped things up immensely and the server is very responsive; however, I can't seem to get the authorization to work with the internal Union aggregation.
I have attempted a number of things, such as:
+ Following option 2, "Restrict by Dataset using TDS Catalog", from https://www.unidata.ucar.edu/software/thredds/current/tds/reference/RestrictedAccess.html
    for each joinExisting aggregation
+ Adding a valid username/password to the URL in the netcdf location value of the Union call:
    <aggregation type="Union">
      <netcdf location="dods://USERNAME:PASSWORD@localhost:8080/thredds/dodsC/internal/ds601.0/101/glw"/>
+ Trying the above with an http:// protocol
The only thing that seems to work is to leave the joinExisting aggregations unrestricted, but keep the restriction on the Union aggregation.
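For illustration, that working arrangement looks roughly like the sketch below. It assumes the restriction is expressed with the restrictAccess dataset attribute described in the RestrictedAccess document, and "ds601Users" is a made-up role name; the same split would apply if the restriction lives in a web.xml security constraint instead.

```xml
<!-- Sketch only: "ds601Users" is a hypothetical role name -->
<!-- Union aggregation: restricted -->
<dataset name="Full Aggregation of ds601.0"
         ID="ds601.0-AGG"
         urlPath="aggregations/ds601.0/10/best"
         restrictAccess="ds601Users"
         harvest="true">
  <serviceName>all</serviceName>
  <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
    <aggregation type="Union">
      <netcdf location="dods://localhost:8080/thredds/dodsC/internal/ds601.0/101/glw"/>
      <!-- ... remaining internal joinExisting datasets ... -->
    </aggregation>
  </netcdf>
</dataset>

<!-- Internal joinExisting aggregation: no restrictAccess attribute, so it stays open -->
<dataset name="internal/ds601.0 Aggregation (glw)"
         ID="internal/ds601.0/101/glw"
         urlPath="internal/ds601.0/101/glw"
         harvest="true">
  <serviceName>all</serviceName>
  <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
    <aggregation dimName="time" type="joinExisting">
      <scan location="/data/glade/p/rda/data/ds601.0/RCPP/1995_2005/glw/" suffix=".nc" subdirs="true"/>
    </aggregation>
  </netcdf>
</dataset>
```

The drawback is that anyone who knows the internal urlPath can still reach the per-parameter timeseries directly, which is why I am asking about the options listed next.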
I would like to do any of the following:

1) Hide the joinExisting aggregations (links) from the web browser
2) Since the joinExisting aggregations are only needed to populate the Union aggregation "internally" to the TDS, somehow ease restrictions when called within the TDS on the localhost
3) Somehow authorize the joinExisting aggregations within the Union aggregation
4) Hear of an alternative way to efficiently aggregate the timeseries parameters and then combine those aggregated timeseries.
If this is simply not doable, that is also helpful information, and I'll leave the aggregated timeseries (joinExisting) unrestricted.
-kevin.

--
Kevin Manross
NCAR/CISL/Data Support Section
Phone: (303)-497-1218
Email: manross@xxxxxxxx
Web: http://rda.ucar.edu


