Hi Bob:
Bob Simons wrote:
*** Issue #1
Our copy of thredds uses netcdf-2.2.17.jar, and it crashed yesterday.
The logs show that it was throwing many instances (several per second)
of this error (note "getTypicalDataset"):
2006-10-31T04:46:44.073 -0800 [1799565807][ 3743344] ERROR - dods.servlet.DODSServlet - DODSServlet.anyExceptionHandler
java.lang.ArrayIndexOutOfBoundsException: -1
    at java.util.ArrayList.get(ArrayList.java:326)
    at ucar.nc2.ncml.Aggregation.getTypicalDataset(Aggregation.java:613)
    at ucar.nc2.ncml.Aggregation.aggExistingDimension(Aggregation.java:630)
    at ucar.nc2.ncml.Aggregation.finish(Aggregation.java:418)
    at ucar.nc2.ncml.NcMLReader.readNetcdf(NcMLReader.java:352)
    at thredds.servlet.DatasetHandler$NcmlFileFactory.open(DatasetHandler.java:146)
    at ucar.nc2.NetcdfFileCache.acquire(NetcdfFileCache.java:182)
    at thredds.servlet.DatasetHandler.getNcmlDataset(DatasetHandler.java:125)
    at thredds.servlet.DatasetHandler.getNetcdfFile(DatasetHandler.java:57)
    at dods.servers.netcdf.NcDODSServlet.getDataset(NcDODSServlet.java:355)
    at dods.servlet.DODSServlet.doGetDAS(DODSServlet.java:492)
    at dods.servlet.DODSServlet.doGet(DODSServlet.java:1451)
    at dods.servers.netcdf.NcDODSServlet.doGet(NcDODSServlet.java:274)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:237)
    ...
and then this series of errors:
2006-10-31T04:46:44.174 -0800 [1799565908][ 3743364] ERROR - dods.servlet.DODSServlet - DODSServlet.anyExceptionHandler
java.lang.NullPointerException
2006-10-31T04:46:44.176 -0800 [1799565910][ 3743365] ERROR - dods.servlet.DODSServlet - DODSServlet.anyExceptionHandler
java.lang.NullPointerException
2006-10-31T04:46:44.178 -0800 [1799565912][ 3743366] ERROR - dods.servlet.DODSServlet - DODSServlet.anyExceptionHandler
java.lang.NullPointerException
2006-10-31T04:46:44.180 -0800 [1799565914][ 3743368] ERROR - dods.servlet.DODSServlet - DODSServlet.anyExceptionHandler
java.lang.NullPointerException
2006-10-31T04:46:44.206 -0800 [1799565940][ 3743377] ERROR - dods.servlet.DODSServlet - DODSServlet.anyExceptionHandler
java.lang.NullPointerException
2006-10-31T04:46:44.207 -0800 [1799565941][ 3743367] ERROR - dods.servers.netcdf.GuardedDatasetImpl - GuardedDatasetImpl close
java.io.FileNotFoundException: /u00/sys/opt/jakarta-tomcat-5.0.28/content/thredds/cacheAged/gov.noaa.pfel.coastwatchsatellite-MY-chla-14day (Too many open files)
    at java.io.FileOutputStream.open(Native Method)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:179)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:131)
    at ucar.nc2.ncml.Aggregation.persistWrite(Aggregation.java:269)
    at ucar.nc2.ncml.Aggregation.persist(Aggregation.java:237)
    at ucar.nc2.dataset.NetcdfDataset.close(NetcdfDataset.java:483)
    at dods.servers.netcdf.GuardedDatasetImpl.close(GuardedDatasetImpl.java:68)
    at dods.servers.netcdf.GuardedDatasetImpl.release(GuardedDatasetImpl.java:63)
    at dods.servlet.DODSServlet.doGetDAS(DODSServlet.java:504)
    at dods.servlet.DODSServlet.doGet(DODSServlet.java:1451)
    at dods.servers.netcdf.NcDODSServlet.doGet(NcDODSServlet.java:274)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:237)
At that point, our thredds server became unstable and generally wasn't
working.
In tracking down the problem, I note that your code in netcdf-2.2.17.jar
for ucar.nc2.ncml.Aggregation has the getTypicalDataset method:
protected Dataset getTypicalDataset() throws IOException {
  //if (typical != null)
  //  return typical;
  int n = nestedDatasets.size();
  if (n == 0) return null;
  // pick a random one, but not the last
  int select = (n < 2) ? 0 : Math.abs(new Random().nextInt()) % (n-1);
  return (Dataset) nestedDatasets.get(select);
}
I don't know if that is the most recent code. It is from the copy of
netcdf-2.2.17.jar I downloaded from your website today. The line numbers
in the error messages don't line up with the line numbers in the .java
files.
We are working to allow SVN access soon; you will then be able to check out
the version that matches your jar.
The code for "int select =" looks like it should work: it should always
return 0 (when n < 2) or a value from 0 to n-2. But the error message at
the very top
(java.lang.ArrayIndexOutOfBoundsException: -1
at java.util.ArrayList.get(ArrayList.java:326)
at ucar.nc2.ncml.Aggregation.getTypicalDataset(Aggregation.java:613)
)
indicates that select is probably being set to -1.
I note that Math.abs(Integer.MIN_VALUE) returns a negative number
(-2147483648). So when new Random().nextInt() happens to return
Integer.MIN_VALUE, I think your code for 'select',
Math.abs(new Random().nextInt()) % (n-1)
generates a negative number. For example, if n = 4, the result is -2.
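To make the arithmetic concrete, here is a minimal, self-contained sketch that isolates the expression from the quoted method. The class name AbsModDemo and the helper select(r, n) are hypothetical; the random value is passed in as r so the Integer.MIN_VALUE case can be forced (Random.nextInt() produces that value on only about one call in 2^32). In Java, the result of % takes the sign of the dividend, so a negative Math.abs result yields a negative index.

```java
// Hypothetical demo class: reproduces the shape of the 'select' expression
// from the quoted getTypicalDataset() with the random value supplied by the caller.
public class AbsModDemo {

    // Same expression as the quoted code, but with nextInt() replaced by the parameter r.
    public static int select(int r, int n) {
        return (n < 2) ? 0 : Math.abs(r) % (n - 1);
    }

    public static void main(String[] args) {
        // Math.abs cannot represent +2147483648 as an int, so it returns MIN_VALUE unchanged.
        System.out.println(Math.abs(Integer.MIN_VALUE));   // -2147483648
        // -2147483648 % 3 == -2: the remainder keeps the dividend's (negative) sign.
        System.out.println(select(Integer.MIN_VALUE, 4));  // -2
        // An ordinary value behaves as intended: 7 % 3 == 1.
        System.out.println(select(7, 4));                  // 1
    }
}
```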
But I couldn't find any n that would generate a select value of -1, so I
can't explain the reference to -1 in the error message. So perhaps this
isn't the problem, but it is suspicious.
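A quick check supports that observation. For m = n - 1 > 1, Integer.MIN_VALUE % m equals -1 exactly when 2^31 leaves remainder 1 modulo m, i.e. when m divides 2^31 - 1. Since 2^31 - 1 (Integer.MAX_VALUE) is prime, the only such m is Integer.MAX_VALUE itself, which would need n = 2^31, larger than any ArrayList can hold. The sketch below (MinusOneSearch and firstDivisorGivingMinusOne are hypothetical names) scans a realistic range of m to confirm nothing smaller produces -1.

```java
// Hypothetical check: which divisors m = n - 1 make Integer.MIN_VALUE % m equal to -1?
public class MinusOneSearch {

    // Returns the smallest m in [1, limit] with Integer.MIN_VALUE % m == -1, or 0 if none exists.
    public static int firstDivisorGivingMinusOne(int limit) {
        for (int m = 1; m <= limit; m++) {
            if (Integer.MIN_VALUE % m == -1) {
                return m;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        // No divisor up to ten million yields -1.
        System.out.println(firstDivisorGivingMinusOne(10_000_000));   // 0
        // The one divisor that does: -2147483648 == -1 * 2147483647 - 1.
        System.out.println(Integer.MIN_VALUE % Integer.MAX_VALUE);     // -1
    }
}
```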
As a solution:
I think your use of
Math.abs(new Random().nextInt()) % (n-1)
is not recommended practice (see
http://java.sun.com/developer/JDCTechTips/2001/tt0925.html). Instead,
you can use
new Random().nextInt(n - 1)
which is simpler, never generates a negative number, and produces a more
uniformly distributed result.
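A minimal sketch of the suggested change, assuming a generic List stands in for the real nestedDatasets field (TypicalPicker and getTypical are hypothetical names, not the netCDF-Java API). Random.nextInt(int bound) returns a value in [0, bound) and throws IllegalArgumentException when bound is not positive, which is why the n < 2 guard is kept before calling nextInt(n - 1).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical stand-in for Aggregation.getTypicalDataset():
// picks a random element, but never the last one (when there are at least two).
public class TypicalPicker {
    private static final Random RANDOM = new Random();

    public static <T> T getTypical(List<T> nested) {
        int n = nested.size();
        if (n == 0) return null;
        // nextInt(n - 1) is in [0, n - 2]; the guard avoids nextInt(0), which would throw.
        int select = (n < 2) ? 0 : RANDOM.nextInt(n - 1);
        return nested.get(select);
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("a.nc", "b.nc", "c.nc", "d.nc");
        System.out.println(getTypical(files)); // one of a.nc, b.nc, c.nc
    }
}
```

This sketch reuses one Random instance instead of constructing a new one on every call; that is a design choice of the example, not something stated in the thread.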
That is definitely a bug, and one I wouldn't have found. Many thanks.
Apparently it returns values in the range [0, n), so I changed it to new Random().nextInt(n).
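For comparison, a sketch of the revised call described in the reply (RevisedSelect and selectIndex are hypothetical names). Because nextInt(n) draws from [0, n), the last index can now be chosen, so this drops the original "not the last" restriction; and since the n == 0 case has already returned null, n is at least 1 and nextInt(n) cannot throw.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch of the revised selection: any index in [0, n), including the last.
public class RevisedSelect {

    // Caller guarantees n >= 1 (the empty case is handled earlier), so nextInt(n) never throws.
    public static int selectIndex(Random rnd, int n) {
        return rnd.nextInt(n);
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        boolean[] seen = new boolean[4];
        for (int i = 0; i < 1000; i++) {
            seen[selectIndex(rnd, 4)] = true;   // record which indices were drawn
        }
        System.out.println(Arrays.toString(seen));
    }
}
```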
*** Issue #2
The exception which actually seems to cause thredds to fail is
"(Too many open files)" (see above), which follows right after the other errors.
I suspect, but can't prove, that this is related to the problem I
mentioned before about your code for NetcdfFile (and related File
classes) not using a finalize method to ensure that the underlying File
object is closed. I don't know if you have changed your code to use a
finalize method. Your last email on the subject (8/22/2006 3:35 PM)
didn't say if you were or weren't going to do it. Could you please do
it? It seems like good insurance.
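For illustration only, here is a minimal sketch of the "finalize as insurance" idea, using a hypothetical GuardedFile wrapper around java.io.RandomAccessFile rather than the actual NetcdfFile classes. close() is idempotent, so it is safe to call from both user code and finalize(). The JVM gives no guarantee about when, or whether, finalize() runs, so this only reduces leaks; callers still need to close files explicitly. (Object.finalize() was deprecated in Java 9, where java.lang.ref.Cleaner was added as the recommended alternative.)

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical wrapper, not the netCDF-Java NetcdfFile implementation.
public class GuardedFile implements AutoCloseable {
    private final RandomAccessFile raf;
    private boolean closed = false;

    public GuardedFile(File f) throws IOException {
        raf = new RandomAccessFile(f, "r");
    }

    public synchronized boolean isClosed() {
        return closed;
    }

    @Override
    public synchronized void close() throws IOException {
        if (closed) return;   // idempotent: a second call is a no-op
        closed = true;
        raf.close();          // releases the underlying OS file handle
    }

    // Safety net only: may run late or never, so it must not replace explicit close().
    @Override
    @SuppressWarnings({"deprecation", "removal"})
    protected void finalize() throws Throwable {
        try {
            close();
        } finally {
            super.finalize();
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("guarded", ".tmp");
        tmp.deleteOnExit();
        try (GuardedFile g = new GuardedFile(tmp)) {   // explicit close is still the primary path
            System.out.println("closed = " + g.isClosed());
        }
    }
}
```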
Yes, I've decided it is worth doing, and it will be in the next release.
There is probably a deeper problem as to where the files are not getting
closed, but I agree this is good insurance.
I'm going to try to fix a few more bugs; I'll send you an email when I have
a new version.
Thanks again.
*****
I could be wrong about these things, but that is my best guess.
Thanks for looking into this.
Sincerely,
Bob Simons
Satellite Data Product Manager
Environmental Research Division
NOAA Southwest Fisheries Science Center
1352 Lighthouse Ave
Pacific Grove, CA 93950-2079
(831)648-0272
bob.simons@xxxxxxxx
<>< <>< <>< <>< <>< <>< <>< <>< <><