Re: [thredds] ncWMS GetCapabilities return sometimes empty document

Hi Jon, Heiko,

Besides the threads, Tomcat also reuses the underlying response objects
(org.apache.catalina.connector.Response which wraps a
org.apache.coyote.Response). When I catch the request that will fail
before any TDS or ncWMS code is even reached, the associated response
object returns true on a call to isCommit(). My guess is that the
ClientAbortException comes at a time that somehow misses or interrupts
the Response.recycle() calls in a way that leaves it committed when it
is next used.

I've put digging into this on hold while I work on switching the TDS to
use the new ncWMS code. I'm planning on getting back to this problem
sometime next week. I'll revisit the Jetty side of things then as well.
My impression there was that it wasn't a cross-request issue. Though
perhaps it is related and will help illuminate the problem we are seeing
in Tomcat as well.

Ethan

On 3/29/2010 6:16 AM, Heiko Klein wrote:
> Hi Jon,
> 
> the invalid thread is just a guess, but it is the only thing I can think of.
> 
> a) I tried to read thousands of getCapabilites with a loop, all without
> a single failure.
> b) I tried to read hundreds of getMap with a loop, all without a single
> failure.
> c) When I abort a 'getMap' request, shortly after I get an Exception in
> a 'getCapabilities' request.
> 
> Ethan verified this, so the effect is reproducable.
> 
> Since http is stateless, a cross-request Exception should only be caused
> by the information in the session or in the servlet container. Sessions
> should be ruled out, since the 'GET' command doesn't send the jsessionid
> cookie.
> 
> So it must be something which is reused in the servlet container.
> Threads are reused? Ah, and one thing I didn't think about previously:
> beans stored in ServletContext?
> 
> 
> 
> Since jetty raises an exception immediately in the aborted 'getMap'
> request, it might be easier in this case to debug with jetty.
> 
> Heiko
> 
> 
> On 2010-03-26 16:24, Jonathan Blower wrote:
>> Hi Heiko,
>>
>> I'm still puzzled by this problem.  But also, I don't understand how
>> Tomcat manages threads etc internally.  Could you please explain why you
>> think that this is caused by a thread being left in an invalid state?
>>
>> Cheers, Jon
>>
>> -----Original Message-----
>> From: Heiko Klein [mailto:Heiko.Klein@xxxxxx] 
>> Sent: 19 March 2010 08:01
>> To: Ethan Davis
>> Cc: thredds@xxxxxxxxxxxxxxxx; Jonathan Blower
>> Subject: Re: [thredds] ncWMS GetCapabilities return sometimes empty
>> document
>>
>> Hi Ethan,
>>
>> I just tried the new snapshot.
>>
>> In jetty, the previous exception disappeared, but I get a strange new
>> one when aborting a GetMap request:
>>
>> 2010-03-19 08:43:07.328:WARN::header full:
>> java.lang.NullPointerException
>>
>> and no stack-trace here. Strange, because NullPointerExceptions are
>> normally SEVERE?
>>
>>
>> In tomcat, the situation hasn't changed. Still an empty GetCapabilities
>> document shortly after a GetMap request has been canceled. And the same
>> old 'SEVERE: Exception initializing page context' log.
>>
>> Heiko
>>
>> On 2010-03-19 04:03, Ethan Davis wrote:
>>> Hi Heiko,
>>>
>>> I believe this is a separate problem that happens when an exception is
>>> thrown after the GetMap response has been committed and the exception
>>> gets caught by a catch blocks that calls response.sendError(...)
>> without
>>> first checking if response.isCommitted().
>>>
>>> I've added a check to all the places we didn't have it. I made a
>>> snapshot release with those changes if you want to try it again. The
>>> snapshot is at
>>>
>>> ftp://ftp.unidata.ucar.edu/pub/thredds/4.1/Snapshot/thredds.war
>>>
>>> On the other hand, I'm surprised we haven't seen this before since, if
>>> I'm right, this could happen in Tomcat as well. Maybe the time after
>> the
>>> response is committed is very short so we just haven't hit it before.
>> Or
>>> we just haven't noticed the log messages. If you see it with the
>>> snapshot release, we'll know my interpretation is incorrect.
>>>
>>> Ethan
>>>
>>>
>>> On 3/18/2010 3:56 AM, Heiko Klein wrote:
>>>> Hi,
>>>>
>>>> I'm not familiar enough with the thredds or ncWMS code to be of help
>>>> there. But I just made another test which might be of help:
>>>>
>>>> I tried thredds4.1.3 on jetty 6.1.22. Here, the exception is thrown
>>>> directly during the client-abort. And it doesn't seem to be forwarded
>> to
>>>> GetCapabilities.
>>>>
>>>> Exception from the jetty log when aborting a GetMap request ( GET
>>>>
>> 'http://localhost:8080/thredds/wms/testAll/2004050412_eta_211.nc?REQUEST
>> =GetMap&LAYERS=Z_sfc&PALETTE=redblue&SERVICE=WMS&FORMAT=image/png&VERSIO
>> N=1.3.0&CRS=EPSG:4326&BBOX=-130,30,20,90&WIDTH=400&HEIGHT=400&STYLES=BOX
>> FILL/ncview'
>>>>> /dev/null ) :
>>>>
>>>> 2010-03-18 10:53:21.207:WARN::Committed before 500 null
>>>> 2010-03-18
>> 10:53:21.208:WARN::/thredds/wms/testAll/2004050412_eta_211.nc
>>>> java.lang.IllegalStateException: Committed
>>>>    at org.mortbay.jetty.Response.resetBuffer(Response.java:1024)
>>>>    at org.mortbay.jetty.Response.sendError(Response.java:240)
>>>>    at org.mortbay.jetty.Response.sendError(Response.java:342)
>>>>    at
>>>>
>> thredds.server.wms.WMSController.handleRequestInternal(WMSController.jav
>> a:304)
>>>>    at
>>>>
>> org.springframework.web.servlet.mvc.AbstractController.handleRequest(Abs
>> tractController.java:153)
>>>>    at
>>>>
>> org.springframework.web.servlet.mvc.SimpleControllerHandlerAdapter.handl
>> e(SimpleControllerHandlerAdapter.java:48)
>>>>    at
>>>>
>> org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherS
>> ervlet.java:875)
>>>>    at
>>>>
>> org.springframework.web.servlet.DispatcherServlet.doService(DispatcherSe
>> rvlet.java:809)
>>>>    at
>>>>
>> org.springframework.web.servlet.FrameworkServlet.processRequest(Framewor
>> kServlet.java:571)
>>>>    at
>>>>
>> org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.
>> java:501)
>>>>    at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
>>>>    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>>>>    at
>> org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
>>>>    at
>>>>
>> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHan
>> dler.java:1166)
>>>>    at
>>>>
>> thredds.servlet.filter.RequestPathFilter.doFilter(RequestPathFilter.java
>> :105)
>>>>    at
>>>>
>> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHan
>> dler.java:1157)
>>>>    at
>> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388)
>>>>    at
>>>>
>> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:2
>> 16)
>>>>    at
>> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>>>>    at
>> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
>>>>    at
>> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
>>>>    at
>>>>
>> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandler
>> Collection.java:230)
>>>>    at
>>>>
>> org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.jav
>> a:114)
>>>>    at
>> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>>>>    at org.mortbay.jetty.Server.handle(Server.java:326)
>>>>    at
>> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>>>>    at
>>>>
>> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConne
>> ction.java:923)
>>>>    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:547)
>>>>    at
>> org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
>>>>    at
>> org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
>>>>    at
>>>>
>> org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:
>> 409)
>>>>    at
>>>>
>> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java
>> :582)
>>>>
>>>>
>>>>
>>>>
>>>> Jetty was freshly started, I made only one request to the jetty. I
>>>> aborted the request before it was finished.
>>>>
>>>> Best regards,
>>>>
>>>> Heiko
>>>>
>>>>
>>>>
>>>> On 2010-03-18 06:46, Ethan Davis wrote:
>>>>> Hi again,
>>>>>
>>>>> OK, I can reproduce the problem reliably. It wasn't very reliable
>> when I
>>>>> was doing it by hand. Now I'm running a script that continually
>> makes
>>>>> GetCapabilities requests one after the other. While that script is
>>>>> running I make GetMap requests and abort them with a ctrl-C. I also
>> have
>>>>> Tomcat configured to just 3 threads. (Thanks Heiko for figuring all
>> that
>>>>> out! Works like a charm.)
>>>>>
>>>>> I'm catching the GetCapabilities request that is going to fail with
>> a
>>>>> conditional breakpoint ("response.isCommitted() == true") at the
>>>>> beginning of the WMSController.handleRequestInternal() method. As we
>>>>> might expect, it looks like the response is committed from the time
>> the
>>>>> org.apache.coyote.http11.Http11Processor creates the request and
>>>>> response objects.
>>>>>
>>>>> I haven't figured out how to reliably catch the problem during the
>> abort
>>>>> of the GetMap request. Seems like a Tomcat issue rather than a
>> TDS/WMS
>>>>> issue. I would think Tomcat should handle things even if we are
>> dealing
>>>>> with an exception poorly.
>>>>>
>>>>> I'll dig around a bit more tomorrow. See if I can figure out why the
>>>>> response doesn't get reset properly.
>>>>>
>>>>> Ethan
>>>>>
>>>>>
>>>>> On 3/17/2010 11:03 AM, Ethan Davis wrote:
>>>>>> Hi Jon, Heiko,
>>>>>>
>>>>>> Jon, forgot to mention, we have code in the TDS WMS controller for
>>>>>> catching SocketException-s and ClientAbortException/IOException-s
>>>>>> similar to the change you reference.
>>>>>>
>>>>>> Have either of you seen this with ncWMS? Or is it so far just
>> showing up
>>>>>> in TDS?
>>>>>>
>>>>>> Most of the references to this exception that I've found deal with
>>>>>> exceptions interrupting JSP processing and causing a switch to a
>> JSP
>>>>>> error page that then can't create a session object. Since the
>> problem
>>>>>> happens in one of our main JSP pages (capabilities_xml.jsp) and the
>>>>>> ClientAbortException-s are happening in a different thread, I'm
>> leaning
>>>>>> towards the "thread in an invalid state" idea as well.
>>>>>>
>>>>>> One thing we could try is to set our JSP pages to not create
>> session
>>>>>> objects: "<%@ page session='false' ...%>". But since we don't know
>>>>>> if/how the threads might be invalid, I'm afraid that would just
>> make
>>>>>> this problem even harder to see.
>>>>>>
>>>>>> I'm going to try for a bit to get things setup so I can debug into
>>>>>> Tomcat code and see if I can catch this in action.
>>>>>>
>>>>>> Ethan
>>>>>>
>>>>>> On 3/17/2010 1:24 AM, Ethan Davis wrote:
>>>>>>> Hi Jon, Heiko,
>>>>>>>
>>>>>>> I'm having a hard time reproducing the problem reliably. I've only
>> seen
>>>>>>> it twice so far. And haven't been able to catch it in action. I
>> did get
>>>>>>> the IllegalStateException from the Jasper JspFactoryImpl both
>> times:
>>>>>>>
>>>>>>>   "java.lang.IllegalStateException: Cannot create a session
>>>>>>>   after the response has been committed"
>>>>>>>
>>>>>>> Googling on the above exception message brings up a fair number of
>> hits
>>>>>>> but none of them seem totally relevant. Or when they sound
>> somewhat
>>>>>>> relevant, there isn't a solution.
>>>>>>>
>>>>>>> More tomorrow.
>>>>>>>
>>>>>>> Ethan
>>>>>>>
>>>>>>> On 3/16/2010 9:05 AM, Jonathan Blower wrote:
>>>>>>>> Hi Heiko, Ethan,
>>>>>>>>
>>>>>>>> I've been trying to figure out what might be going on here -
>> thanks for
>>>>>>>> your great diagnosis Heiko.  I'm not sure what can be done in the
>>>>>>>> THREDDS/WMS code to prevent this, but I've modified some code in
>> the
>>>>>>>> ncWMS WmsController class to attempt to catch exceptions that are
>> thrown
>>>>>>>> when clients disconnect:
>>>>>>>>
>>>>>>>>
>> http://www.resc.rdg.ac.uk/trac/ncWMS/browser/branches/tds-refactor-2010-
>>>>>>>>
>> 01-09/src/java/uk/ac/rdg/resc/ncwms/controller/WmsController.java#L250
>>>>>>>>
>>>>>>>> Heiko - you mentioned that it's possible that poor exception
>> handling
>>>>>>>> could cause a tomcat thread to be left in an invalid state.  I
>> can't
>>>>>>>> reproduce your problem reliably so I can't test this myself, but
>> I
>>>>>>>> wonder if you could find out whether this is happening in your
>> setup?
>>>>>>>> The VisualVM tool
>>>>>>>> (http://java.sun.com/javase/6/docs/technotes/guides/visualvm/)
>> can give
>>>>>>>> a thread dump from your running Tomcat system - perhaps you could
>> run
>>>>>>>> this when an error occurs and see if anything is suspicious?
>>>>>>>>
>>>>>>>> Ethan - are you able to reproduce Heiko's problem?
>>>>>>>>
>>>>>>>> Cheers, Jon
>>>>>>>>
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Ethan Davis [mailto:edavis@xxxxxxxxxxxxxxxx] 
>>>>>>>> Sent: 05 March 2010 23:32
>>>>>>>> To: Heiko Klein
>>>>>>>> Cc: Jonathan Blower; thredds@xxxxxxxxxxxxxxxx
>>>>>>>> Subject: Re: [thredds] ncWMS GetCapabilities return sometimes
>> empty
>>>>>>>> document
>>>>>>>>
>>>>>>>> Hi Heiko,
>>>>>>>>
>>>>>>>> That is great. Thanks.
>>>>>>>>
>>>>>>>> I'm a bit swamped through next week but I will jump on this the
>>>>>>>> following week before I get to some other WMS work.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Ethan
>>>>>>>>
>>>>>>>> On 3/5/2010 4:44 AM, Heiko Klein wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> finally, I managed to generate a simple case how to reproduce
>> this
>>>>>>>> error:
>>>>>>>>>
>>>>>>>>> a) start tomcat
>>>>>>>>> b) abort fetching some maps (3-4 times on a tomcat with
>>>>>>>>> 10thredds/HTTP1.1 connector), i.e. call
>>>>>>>>> GET
>>>>>>>>>
>>>>>>>>
>> 'http://localhost:8081/thredds/wms/data/met.no/hirlam12/wam_nsea.fc.2009
>>>>>>>>
>> 0604.nc?REQUEST=GetMap&LAYERS=significant_wave_height&PALETTE=redblue&SE
>>>>>>>>
>> RVICE=WMS&FORMAT=image/png&VERSION=1.3.0&CRS=EPSG:4326&BBOX=-30,-60,30,9
>>>>>>>> 0&WIDTH=400&HEIGHT=400&STYLES=BOXFILL/ncview'
>>>>>>>>>> /dev/null
>>>>>>>>> and abort this call before it is finished. (Ctrl-C)
>>>>>>>>> c) get some 'getCapabilities' documents:
>>>>>>>>> while [ 1 ]; do GET
>>>>>>>>>
>>>>>>>>
>> 'http://localhost:8081/thredds/wms/data/met.no/hirlam12/wam_nsea.fc.2009
>>>>>>>> 0604.nc?service=WMS&version=1.3.0&request=GetCapabilities'
>>>>>>>>>> testCap.xml; ls -l testCap.xml; done
>>>>>>>>> ...
>>>>>>>>> -rw-r--r-- 1 heikok heikok 172126 2010-03-05 12:11 testCap.xml
>>>>>>>>> -rw-r--r-- 1 heikok heikok 172126 2010-03-05 12:11 testCap.xml
>>>>>>>>> -rw-r--r-- 1 heikok heikok 172126 2010-03-05 12:11 testCap.xml
>>>>>>>>> -rw-r--r-- 1 heikok heikok 0 2010-03-05 12:11 testCap.xml
>>>>>>>>> -rw-r--r-- 1 heikok heikok 172126 2010-03-05 12:11 testCap.xml
>>>>>>>>> -rw-r--r-- 1 heikok heikok 172126 2010-03-05 12:11 testCap.xml
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> When the capabilites document with 0 byte size is fetched, the
>> earlier
>>>>>>>>> mentioned exception is thrown.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Since step b) is the crucial step to reproduce the 0 byte
>> document, I
>>>>>>>>> guess there is somewhere a badly caught
>>>>>>>> ClientAbortException/IOException
>>>>>>>>> when writing to the responses OutputStream, leaving a
>> tomcat-thread in
>>>>>>>>> an invalid state.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> We can solve this problem currently by adding a proxy in front
>> of
>>>>>>>>> thredds (apache mod_proxy) which seems to swallow all
>>>>>>>>> client-connection-aborts before they reach tomcat.
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>>
>>>>>>>>> Heiko
>>>>>>>>>
>>>>>>>>> On 2010-02-19 16:54, Heiko Klein wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I'm still trying to track down this problem, so here a short
>> update:
>>>>>>>>>>
>>>>>>>>>> The ncWMS/TDS crashes seems to be connected to concurrency: The
>>>>>>>> problem
>>>>>>>>>> occurs when reloading the wms-client (openlayers in firefox),
>> that
>>>>>>>> means
>>>>>>>>>> GetCapabilites and GetMap are run in parallel. We're
>> downloading
>>>>>>>> tiles,
>>>>>>>>>> so reloading will fetch one capability doc + approx 12 maps.
>>>>>>>>>>
>>>>>>>>>> * If I configure tomcat to only use one thread, and use the
>> default
>>>>>>>>>> 'HTTP/1.1' connector, the problem disappears (but
>> response-times are
>>>>>>>>>> very slow).
>>>>>>>>>> * Using 4 threads + 'HTTP/1.1', the problem appears very fast.
>> (one
>>>>>>>> or
>>>>>>>>>> two reloads)
>>>>>>>>>> * Using the 'Nio' connector and 4 threads, the problem appears,
>> but
>>>>>>>> not
>>>>>>>>>> so fast (3-4 reloads)
>>>>>>>>>> * Using the 'Nio' connector with only one thread, is about the
>> same
>>>>>>>> as
>>>>>>>>>> Nio with several threads.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Best regards,
>>>>>>>>>>
>>>>>>>>>> Heiko
>>>>>>>>>>
>>
> 



  • 2010 messages navigation, sorted by:
    1. Thread
    2. Subject
    3. Author
    4. Date
    5. ↑ Table Of Contents
  • Search the thredds archives: