Re: [thredds] "Too Many Open Files" Error. Dataset too big?

To: Kevin Manross <manross@xxxxxxxx>
Subject: Re: [thredds] "Too Many Open Files" Error. Dataset too big?
From: Michael McDonald <mcdonald@xxxxxxxxxxxxx>
Date: Mon, 28 Oct 2013 13:48:19 -0400

Kevin,

> I have been triggering this initial scan by clicking on the services for the
> aggregated dataset.  Is there another way to perform the initial indexing of
> netCDF aggregations (like is done with GRIB Collections) besides clicking on
> a service link?

We trigger all of our initial catalog scans with recurring Nagios
(http://www.nagios.org/) checks that query the most frequently
accessed datasets. Really, only the datasets that change (i.e.,
forecast datasets) and the large aggregations need to be queried. We
set the Nagios checks to very high timeout values (5-10 minutes) and
then just let them run normally. We occasionally get false positives
from this during the daily reset/synchronization of the Tomcat server.
All of the other miscellaneous datasets are triggered by users when
requested, but these smaller datasets are usually quick to
scan/generate on the fly. All of your static datasets should have the
"recheckEvery" attribute *excluded* from their catalog entries. That
way, once the cache/agg file is created, it will only be removed when
the NetcdfFileCache scour value elapses. This is a tricky balance to
get right. We are still trying to fine-tune this on our servers.
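
As a minimal sketch of that distinction, the aggregation for a static
dataset simply leaves out recheckEvery. The path and suffix here are
hypothetical, made up for illustration:

```xml
<!-- Static dataset: no recheckEvery, so the cached aggregation is reused
     until it is scoured. Location and suffix are hypothetical. -->
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
        <aggregation dimName="MT" type="joinExisting">
                <scan location="/data/static/example/"
                        suffix=".nc" subdirs="false" />
        </aggregation>
</netcdf>
```

A dataset that changes would instead carry something like
recheckEvery="20 minutes" on the aggregation element, as in the
attached catalog.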


> Also, I assume that the scouring of NetcdfFileCache would not remove this
> index file from cache/agg, correct?  Otherwise users would be in for a long
> wait each time they click on an aggregated service.  According to
> http://www.unidata.ucar.edu/software/thredds/current/tds/reference/ThreddsConfigXMLFile.html,
> the cache/agg dir is only for joinExisting.  I'm trying to use Union right
> now.

Assume that anything in the cache/agg folder is fair game for
removal/scrubbing. Anything in cache/agg older than the scour value
will be deleted! We were testing a btsync between our two thredds
servers, and the Tomcat scour was deleting dot-files/folders
unrelated to thredds. So we now run our sync one directory level
higher, at "cache", and exclude all directories except the "agg" folder.

If your dataset does not change and you want it to stay cached for a
while (avoiding the initial scan), then you need to set the
NetcdfFileCache scour value to multiple days. Make sure you have
plenty of disk space for the cache/agg folder, since all other
datasets will now be cached for much longer. However, all of our
catalogs in cache/agg typically occupy less than 25MB of space. The
real cache consumer is NCSS (a separate scour value/schedule)!
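
For orientation, here is a minimal sketch of where scour values live in
threddsConfig.xml. It assumes the element names given on the reference
page linked above, and the values are illustrative rather than taken
from any real server. As far as that page describes it, AggregationCache
is the section covering the files written to cache/agg, while
NetcdfFileCache covers open file handles, so it is worth checking which
one applies on your TDS version.

```xml
<!-- Fragment of threddsConfig.xml (these elements sit inside the root
     threddsConfig element). Element names are assumed from the TDS
     reference page; all values are illustrative. -->
<AggregationCache>
        <scour>24 hours</scour>   <!-- how often the scour runs -->
        <maxAge>30 days</maxAge>  <!-- unchanged files older than this are deleted -->
</AggregationCache>

<!-- NCSS has its own, separate scour schedule -->
<NetcdfSubsetService>
        <allow>true</allow>
        <scour>15 min</scour>
        <maxAge>30 min</maxAge>
</NetcdfSubsetService>
```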

I don't think unions are stored in cache/agg. The best test is to look
in this folder for a file resembling the dataset name. Inspect the
file and note its size, timestamp, and contents. Nearly all of our
aggregations are nested: joinExisting (for like variables) underneath
and a union on top. I see all of the joinExisting cache files in this
cache/agg folder, but zero files with the "union" type.

Are you sure you should be performing a union on this dataset and not
using joinExisting (time series data) instead? What we do is many
small/manageable joinExisting scans of like data. Then we do a union
at the top level of these netcdf datasets. This way all of the
components get cached and then the top level union is simply a
combination of the cached data (see latest.xml attachment). This idea
was in one of the advanced thredds examples (or on the forum) and it
has helped significantly reduce our initial scan times.

/mike

<?xml version="1.0" encoding="UTF-8"?>
<catalog name="THREDDS Catalog"
         xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
         xmlns:xlink="http://www.w3.org/1999/xlink">

<service name="all" serviceType="Compound" base="">
        <service name="ncdods" 
                serviceType="OPeNDAP"
                base="http://tds.hycom.org/thredds/dodsC/"/>
        <service name="ncss" 
                serviceType="NetcdfSubset" 
                base="http://ncss.hycom.org/thredds/ncss/grid/"/>
        <service name="wms" 
                serviceType="WMS" 
                base="http://wms.hycom.org/thredds/wms/"/>
        <service name="wcs" 
                serviceType="WCS" 
                base="http://wcs.hycom.org/thredds/wcs/"/>
        <service name="ftp"
                serviceType="FTP"
                base="ftp://ftp.hycom.org/datasets/"/>
</service>

<dataset name="HYCOM + NCODA Global 1/12 Degree Analysis"> <!-- Description -->
<!-- 
##########
## BASE ## 
===============================================================================
##########
-->
<dataset name="GLBa0.08 (latest 10 days)"> <!-- Model/Experiment -->
        <metadata inherited="true">
        <authority>edu.ucar.unidata</authority>
        <dataType>Grid</dataType>
        <dataFormat>NetCDF</dataFormat>
        <documentation type="rights"> Freely available </documentation>
        <documentation xlink:href="http://www.hycom.org/dataserver/glb-analysis"
                       xlink:title="GLBa0.08 Documentation" />
        <creator>
                <name vocabulary="DIF">NRL</name>
                <contact url="http://www.hycom.org/" email="forum@xxxxxxxxx" />
        </creator>
        <publisher>
                <name vocabulary="DIF">HYCOM.org</name>
                <contact url="http://www.coaps.fsu.edu" email="forum@xxxxxxxxx" />
        </publisher>
        </metadata>
<!-- 
############
## LATEST ## 
=============================================================================
############
-->
<dataset name="All Latest Data at 00Z" 
        ID="GLBa0.08/latest" urlPath="GLBa0.08/latest">
        <serviceName>all</serviceName>
        <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
                <aggregation type="union">
                        <netcdf location="dods://localhost/thredds/dodsC/GLBa0.08/latest/2d"/>
                        <netcdf location="dods://localhost/thredds/dodsC/GLBa0.08/latest/salt"/>
                        <netcdf location="dods://localhost/thredds/dodsC/GLBa0.08/latest/temp"/>
                        <netcdf location="dods://localhost/thredds/dodsC/GLBa0.08/latest/uvel"/>
                        <netcdf location="dods://localhost/thredds/dodsC/GLBa0.08/latest/vvel"/>
                </aggregation>
        </netcdf>
</dataset>

<dataset name="Latest Data at 00Z (2d)" 
        ID="GLBa0.08/latest/2d" urlPath="GLBa0.08/latest/2d">
        <serviceName>all</serviceName>
        <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
                <variable name="ssh">
                        <remove type="attribute" name="valid_range"/>
                </variable>
                <aggregation dimName="MT" type="joinExisting" recheckEvery="20 minutes">
                        <scan location="/hycom/ftp/datasets/GLBa0.08/latest/data/2d/"
                                suffix=".????_???_00_2d.nc" subdirs="false" />
                </aggregation>
        </netcdf>
</dataset> <!-- variable -->

<dataset name="Latest Data at 00Z (salt)" 
        ID="GLBa0.08/latest/salt" urlPath="GLBa0.08/latest/salt">
        <serviceName>all</serviceName>
        <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
                <variable name="salinity">
                        <remove type="attribute" name="valid_range"/>
                </variable>
                <aggregation dimName="MT" type="joinExisting" recheckEvery="20 minutes">
                        <scan location="/hycom/ftp/datasets/GLBa0.08/latest/data/salt/"
                                suffix="00_3zs.nc" subdirs="false" />
                </aggregation>
        </netcdf>
</dataset> <!-- variable -->

<dataset name="Latest Data at 00Z (temp)" 
        ID="GLBa0.08/latest/temp" urlPath="GLBa0.08/latest/temp">
        <serviceName>all</serviceName>
        <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
                <variable name="temperature">
                        <remove type="attribute" name="valid_range"/>
                </variable>
                <aggregation dimName="MT" type="joinExisting" recheckEvery="20 minutes">
                        <scan location="/hycom/ftp/datasets/GLBa0.08/latest/data/temp/"
                                suffix="00_3zt.nc" subdirs="false" />
                </aggregation>
        </netcdf>
</dataset> <!-- variable -->

<dataset name="Latest Data at 00Z (uvel)" 
        ID="GLBa0.08/latest/uvel" urlPath="GLBa0.08/latest/uvel">
        <serviceName>all</serviceName>
        <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
                <variable name="u">
                        <remove type="attribute" name="valid_range"/>
                </variable>
                <aggregation dimName="MT" type="joinExisting" recheckEvery="20 minutes">
                        <scan location="/hycom/ftp/datasets/GLBa0.08/latest/data/uvel/"
                                suffix="00_3zu.nc" subdirs="false" />
                </aggregation>
        </netcdf>
</dataset> <!-- variable -->

<dataset name="Latest Data at 00Z (vvel)" 
        ID="GLBa0.08/latest/vvel" urlPath="GLBa0.08/latest/vvel">
        <serviceName>all</serviceName>
        <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
                <variable name="v">
                        <remove type="attribute" name="valid_range"/>
                </variable>
                <aggregation dimName="MT" type="joinExisting" recheckEvery="20 minutes">
                        <scan location="/hycom/ftp/datasets/GLBa0.08/latest/data/vvel/"
                                suffix="00_3zv.nc" subdirs="false" />
                </aggregation>
        </netcdf>
</dataset> <!-- variable -->

<!-- 
##########
## END ## 
===============================================================================
##########
-->
</dataset> <!-- Model/Description -->
</dataset> <!-- Description -->
</catalog>
