Hi Bill,
Well, how many files do you have? On my Linux CentOS 5.2 box I can have
more than 300,000 files open at once (cat /proc/sys/fs/file-max).
BTW, I pulled the open/close calls out of the inner loop in your code
and found that both the C and Java versions ran about four times as fast.
- Joe
Bill Moninger wrote:
Hi Joe,
good suggestions all, but they're probably not so useful in my
case. We get enough requests per day that they load down our web
server if the app that generates the soundings takes too much cpu
time--and a doubling of the cpu time for each access does indeed bring
web response to a crawl.
Keeping the file(s) open isn't a desirable option either, because
users are accessing many different forecasts at many different hours,
and each is stored in a different file. While there are certainly
caching options we could use, the bookkeeping probably wouldn't be
worth the effort--particularly since we have a working system (with C)
now.
But I hope you and others will continue providing ideas. Our sounding
generation software has to read netCDF, grib, and grib2, and at the
moment we have different code for each. netcdf-java would allow
unified code that would be much easier to maintain.
-Bill
On 5/7/2009 1:04 PM, Joe Sirott wrote:
Hi Bill,
One question I have: does it really matter if your Java Web
application is 2 or 3x slower than your C application? You mentioned
that your current application takes significantly less than one
second to produce a plot; even if your Java Web app takes, say, one
second to produce a plot that still would allow for ~100,000 plot
requests per day. And you could easily increase this capability by
implementing a caching scheme.
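As a rough check on that figure: a day has 86,400 seconds, so a server that produces plots strictly one at a time at one second each tops out near 86,400 requests per day, which is in line with the ~100,000 estimate. The sketch below only works through that arithmetic. The class and method names are hypothetical, and the `workers` parameter (requests handled in parallel) is an assumption that does not come from this thread.

```java
/** Back-of-the-envelope capacity estimate; all names here are illustrative. */
final class PlotCapacity {
    static final long SECONDS_PER_DAY = 24L * 60 * 60; // 86,400

    /**
     * Upper bound on plots per day if each plot takes secondsPerPlot and
     * `workers` requests can run in parallel (assumed, not measured).
     */
    static long requestsPerDay(double secondsPerPlot, int workers) {
        if (secondsPerPlot <= 0 || workers <= 0) {
            throw new IllegalArgumentException("arguments must be positive");
        }
        return (long) Math.floor(SECONDS_PER_DAY / secondsPerPlot * workers);
    }

    public static void main(String[] args) {
        System.out.println(requestsPerDay(1.0, 1)); // prints 86400
        System.out.println(requestsPerDay(0.5, 1)); // prints 172800
    }
}
```

This is a ceiling under steady load, not a prediction: it ignores request bursts and everything else the web server is doing.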
If you do need to speed up your application, I found when I profiled
the Java netCDF library a couple of years ago that it can be more
expensive to open a netCDF file than to read small amounts of data
(like 2D slices from a 4D variable) from the file. So one strategy
(at least in a Web application environment) is to keep the file open
so repeated reads of the file don't incur the overhead of reopening
the file. There are some issues with this -- the library isn't thread
safe, so you don't want to share the file object across threads, and
you might run into a problem with too many open files if you have a
lot of files, but there are strategies to work around this.
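One possible workaround, sketched here under assumptions, is a small least-recently-used cache that keeps at most a fixed number of files open and closes the oldest one when the limit is exceeded. Nothing below is part of netcdf-java: `OpenFileCache` and `Opener` are hypothetical names, and the handle type is any `java.io.Closeable`, standing in for whatever file object the application actually uses.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical LRU cache of open handles; not part of netcdf-java. */
final class OpenFileCache<H extends Closeable> {

    /** Opens a handle for a path, e.g. by calling the library's open method. */
    interface Opener<H> {
        H open(String path) throws IOException;
    }

    private final Opener<H> opener;
    private final LinkedHashMap<String, H> handles;

    OpenFileCache(final int maxOpen, Opener<H> opener) {
        this.opener = opener;
        // accessOrder=true keeps entries ordered least recently used first.
        this.handles = new LinkedHashMap<String, H>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, H> eldest) {
                if (size() <= maxOpen) {
                    return false;
                }
                closeQuietly(eldest.getValue()); // release the file descriptor
                return true;
            }
        };
    }

    /** Returns the cached handle for path, opening the file on a miss. */
    H get(String path) {
        H handle = handles.get(path); // also marks the entry as recently used
        if (handle == null) {
            try {
                handle = opener.open(path);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            handles.put(path, handle); // may evict and close the eldest entry
        }
        return handle;
    }

    int openCount() {
        return handles.size();
    }

    /** Closes every cached handle, e.g. at shutdown. */
    void closeAll() {
        for (H handle : handles.values()) {
            closeQuietly(handle);
        }
        handles.clear();
    }

    private static void closeQuietly(Closeable c) {
        try {
            c.close();
        } catch (IOException ignored) {
            // Nothing useful can be done if closing fails during eviction.
        }
    }
}
```

The class is deliberately not synchronized. Since the file object should not be shared across threads, the intent is one cache per thread (for example held in a `ThreadLocal`), with the per-thread limit chosen so that threads times limit stays well under the OS open-file limit. A handle returned by `get` should not be kept across later `get` calls, because a later call may evict and close it.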
- Joe
Bill Moninger wrote:
Hello John, Jon, and Bob,
Thanks for your useful questions and comments.
I was testing the timing from the command line, and I agree that
java startup time might have been a big issue.
So I took the lead from the modified program that John sent back,
which did a loop of opening the netcdf file, pulling a
hyperslab (a hyperline really) out of the file, then closing it.
I amended both the java and C programs (attached as a tar file) to
take the number of times through the loop as the sole argument and
got the following results when reading the netcdf file available at
http://ruc.noaa.gov/ruc_native_40.nc (53 M in size):
%> sounding.x 10000
C: elapsed time for 10000 reads is 16.630000 seconds
(varied between 13.9 and 16.6 secs)
%> java -server Tester2 10000
java: elapsed time for 10000 reads is 44.466998 seconds
(varied between 20.1 sec and 44.5 sec)
So, it looks like something other than the startup cost is causing
java to be slower than C by about 1.2 to 2.5x. But the java times
appear to be a lot more variable than the C times.
Perhaps I am using the libraries non-optimally; if so, I'll be very
grateful for any suggestions.
-Bill
On 5/6/2009 4:21 PM, John Caron wrote:
Hi Bill:
I made a few mods to your program (attached)
1) removed the print statements, which are notoriously slow.
2) did the whole open/read/close loop 100 times
3) added timing, and got:
that took 1248.659775 millisecs
which is about 13 msecs per call. When I get a chance I will try to
compare to the C code.
None of this is all that definitive; it's very hard to get accurate
timings on small programs. For one thing, Java compilation happens
at runtime, and it's somewhat nondeterministic, so running a program
once will very likely look very bad. If you are doing a CGI type
server, where the java application starts up for each request, that
will be very slow.
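To illustrate the warm-up effect, here is a minimal timing sketch that runs a task some number of times untimed before measuring, so that runtime compilation has a chance to happen first. `WarmupTimer` and `meanMillis` are hypothetical names, and the array-summing task in `main` is only a stand-in for the real open/read/close sequence. A dedicated harness such as JMH is a better choice for anything beyond a rough comparison.

```java
/** Minimal warm-up-then-measure timer; names are illustrative only. */
final class WarmupTimer {

    /**
     * Runs task `warmup` times untimed, then `measured` times timed,
     * and returns the mean wall-clock milliseconds per timed call.
     */
    static double meanMillis(Runnable task, int warmup, int measured) {
        if (warmup < 0 || measured <= 0) {
            throw new IllegalArgumentException("warmup >= 0 and measured > 0 required");
        }
        for (int i = 0; i < warmup; i++) {
            task.run(); // untimed: lets the JIT compile the hot paths
        }
        long start = System.nanoTime();
        for (int i = 0; i < measured; i++) {
            task.run();
        }
        long elapsedNanos = System.nanoTime() - start;
        return elapsedNanos / 1e6 / measured;
    }

    public static void main(String[] args) {
        final double[] data = new double[1_000_000];
        // Stand-in workload; a real test would open, read, and close a file.
        Runnable task = () -> {
            double sum = 0;
            for (double d : data) {
                sum += d;
            }
            if (sum != 0) {
                throw new IllegalStateException("unexpected sum");
            }
        };
        System.out.printf("cold: %.3f ms per call%n", meanMillis(task, 0, 1));
        System.out.printf("warm: %.3f ms per call%n", meanMillis(task, 1000, 100));
    }
}
```

The cold and warm numbers will vary from run to run and machine to machine, which is the point: a single cold run says little about steady-state performance in a long-running server.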
I can pretty much promise you that java performance is within a
factor of 2 of C code, and more likely within 20% of C code, in a
long-running server environment. There are certain things it can do
faster, like memory allocation and multithreading.
Anyway, I could look at your actual production code to see if there
are some ways to help speed it up. It is possible that for various
reasons, Java will be "several times slower" than C code, so you'll
have to decide if the increase in productivity is worth it.
Bill Moninger wrote:
_______________________________________________
netcdf-java mailing list
netcdf-java@xxxxxxxxxxxxxxxx <mailto:netcdf-java@xxxxxxxxxxxxxxxx>
For list information or to unsubscribe, visit:
http://www.unidata.ucar.edu/mailing_lists/