Tennessee Leeuwenburg wrote:
I turned the Java heap size up to 1024m, and it was able to handle my
580 MB file.
Here's another question about the internals:
As some of you know, I have written a servlet which serves up NetCDF
files via HTTP, retrieved and converted from a database. This works
great for small data sets (<60 MB) but is behaving strangely for the
large (580 MB) dataset.
Serving the file through Apache, it takes, oh, a few minutes to get
the DODS file onto my hard disk. Say 10 minutes, and that would be
plenty.
So if I understand, you have a client that requests the file via HTTP,
then just copies it to a file on disk?
The servlet seems to take much longer. In terms of raw throughput when
downloading from HTTP via Firefox, I get about 1.8Mb/s from Apache, vs
about 1.5 from my servlet. That's not a *huge* difference, and it's
probably related to window size or something.
When I connect THREDDS to Apache, there is a latency while the file is
downloaded from Apache, followed by throughput of about 1Mb/s and a
slight reduction in file size.
When I connect THREDDS to my servlet, the initial latency is at least
10 minutes (he says waiting for the download to start). I found this a
little weird, so I included some debugging in my servlet so I could
watch the contents of each packet. I'm serving the data in 8192-byte
chunks, possibly not the quickest way to go about it. What I see is a
generally increasing byte range being served, but occasionally, bytes
from earlier in the file are served. This seems a bit weird to me. I
guess thredds is "going back" and looking things up in order to
re-factor the data structure, but I want to make sure this is expected
behaviour and that nothing nuts is going on.
What do you mean by "connect THREDDS to Apache" or "my servlet"? The
THREDDS data viewer?
Generally a NetCDF client like the THREDDS data viewer will treat the
file as random access, and so may skip around in the file. If all you do
is read the file sequentially, HTTP is OK. But for random access it can
be really slow. OPeNDAP is much better in this case.
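For illustration: random access over plain HTTP is normally done with
Range request headers, so a servlet has to turn each header into a byte
span before it can seek. Below is a minimal Java sketch of that parsing
step, assuming single-range requests only. The ByteRange class and its
parse method are hypothetical names made up for this example; they are
not part of THREDDS or the servlet API.

```java
/** Hypothetical helper: one inclusive byte span parsed from a Range header. */
public final class ByteRange {
    public final long start;
    public final long end; // inclusive, as in HTTP

    private ByteRange(long start, long end) {
        this.start = start;
        this.end = end;
    }

    public long length() {
        return end - start + 1;
    }

    /**
     * Parses "bytes=a-b", "bytes=a-" or "bytes=-n" against a file length.
     * Returns null if the header is absent, malformed, lists several
     * ranges, or cannot be satisfied.
     */
    public static ByteRange parse(String header, long fileLength) {
        if (header == null || !header.startsWith("bytes=") || fileLength <= 0) {
            return null;
        }
        String spec = header.substring(6).trim();
        int dash = spec.indexOf('-');
        if (dash < 0 || spec.indexOf(',') >= 0) {
            return null; // multi-range requests are out of scope here
        }
        String first = spec.substring(0, dash).trim();
        String last = spec.substring(dash + 1).trim();
        try {
            long start;
            long end;
            if (first.isEmpty()) {
                // Suffix form: the last n bytes of the file.
                if (last.isEmpty()) {
                    return null;
                }
                long suffix = Long.parseLong(last);
                if (suffix <= 0) {
                    return null;
                }
                start = Math.max(0, fileLength - suffix);
                end = fileLength - 1;
            } else {
                start = Long.parseLong(first);
                end = last.isEmpty()
                        ? fileLength - 1
                        : Math.min(Long.parseLong(last), fileLength - 1);
            }
            if (start < 0 || start > end) {
                return null;
            }
            return new ByteRange(start, end);
        } catch (NumberFormatException e) {
            return null;
        }
    }
}
```

A servlet using something like this would answer a parsed range with
status 206 and a Content-Range header, seek to start, and write
length() bytes. Each skip around the file still costs one HTTP round
trip, which is why this access pattern can be slow.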
I am trying to work out how to redress the situation. One easy thing
to test is to vary the window size to a much larger number, say 500Kb
or even megabytes. I could possibly alter this on the basis of the
file size, or try to come up with some dynamic regime for altering the
window size.
Depends on your data access pattern.
Is there a "magic number" in thredds which is a best window size to
use? Would it "prefer" to get its data in any particular way? Thredds
is basically the only client for this servlet, so I will just tune it
for best performance.
What do you mean by "window size"?
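If "window size" here means the number of bytes the servlet writes per
call (the 8192-byte chunk mentioned earlier), one cheap experiment is to
make that size a parameter and time a few values. A rough Java sketch
follows; ChunkedCopy and copy are hypothetical names for this example,
and nothing here is a THREDDS setting.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical helper: stream copy with a caller-chosen chunk size. */
public final class ChunkedCopy {
    private ChunkedCopy() {}

    /** Copies everything from in to out and returns the byte count. */
    public static long copy(InputStream in, OutputStream out, int chunkSize)
            throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        byte[] buf = new byte[chunkSize];
        long total = 0;
        int n;
        // read() may return fewer bytes than requested, so write only n.
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }
}
```

A larger chunk mostly helps sequential downloads. If the client is
issuing many small random reads, the chunk size is unlikely to be the
bottleneck.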
Or maybe it's just some inefficiency in Java's random access - if it's
a separate request every time, maybe there's even a new instance
handling each request and I'm getting bogged down in object creation.
Now there's a thought! If that's the case, I'll have to implement some
kind of static object containing the currently open files to avoid
re-opening them...
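A rough Java sketch of that idea, assuming the converted files sit on
local disk and each request reads a byte span from one of them. The
OpenFileCache class, its methods and the MAX_OPEN limit are all
hypothetical names and values chosen for illustration. It keeps
FileChannels rather than RandomAccessFile objects because positional
reads on a FileChannel do not move a shared file pointer, so several
requests can use the same channel at once.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical static cache of open, read-only file channels. */
public final class OpenFileCache {
    private static final int MAX_OPEN = 16; // assumed limit, tune as needed

    // Access-ordered map: the least recently used channel is evicted first.
    private static final Map<Path, FileChannel> CHANNELS =
            new LinkedHashMap<Path, FileChannel>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<Path, FileChannel> eldest) {
                    if (size() <= MAX_OPEN) {
                        return false;
                    }
                    try {
                        eldest.getValue().close();
                    } catch (IOException ignored) {
                        // Best effort: the entry is dropped either way.
                    }
                    return true;
                }
            };

    private OpenFileCache() {}

    /** Returns the cached channel for a file, opening it on first use. */
    public static synchronized FileChannel get(Path file) throws IOException {
        FileChannel ch = CHANNELS.get(file);
        if (ch == null || !ch.isOpen()) {
            ch = FileChannel.open(file, StandardOpenOption.READ);
            CHANNELS.put(file, ch);
        }
        return ch;
    }

    /** Reads up to length bytes at offset; shorter if the file ends first. */
    public static byte[] read(Path file, long offset, int length) throws IOException {
        FileChannel ch = get(file);
        ByteBuffer buf = ByteBuffer.allocate(length);
        long pos = offset;
        while (buf.hasRemaining()) {
            int n = ch.read(buf, pos); // positional read, no shared seek
            if (n < 0) {
                break; // end of file
            }
            pos += n;
        }
        byte[] out = new byte[buf.position()];
        buf.flip();
        buf.get(out);
        return out;
    }
}
```

One caveat with this sketch: a channel evicted while another request is
still reading from it will make that read fail, so a real version would
need reference counting or a generous MAX_OPEN.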
Feedback welcome. Sorry to abuse the list for hare-brained developer
questions. Maybe one day I'll be able to do something useful for you...
Download still waiting...
Cheers,
-Tennessee