Hi Ben,
I've now implemented this in the ncWMS trunk. (Actually I took the same idea
as your implementation but did it in another way, which hopefully saves some
memory.) You might be able to build a new ncWMS.jar file or wait for the next
codebase sync.
I haven't implemented Xiangtan's fix to force the SCANLINE data-reading
strategy as sometimes this isn't the right thing to do (although it certainly
is when datasets get very large). I think this needs to be made configurable,
or perhaps the switch to SCANLINE could happen when grids get above a certain
size.
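A rough sketch of the size-based switch suggested above follows. This is not ncWMS code: the class, the stand-in enum and the threshold parameter are all hypothetical names chosen for illustration, and a real version would use ncWMS's own DataReadingStrategy and a configured limit.

```java
// Illustrative sketch only; every name here is a hypothetical stand-in.
class StrategyChooser {
    // Stand-in for the real data-reading strategy enum.
    enum Strategy { SCANLINE, DEFAULT }

    // Returns SCANLINE once the grid has more points than the given limit,
    // otherwise keeps whatever strategy was configured.
    static Strategy choose(long iSize, long jSize, long maxPoints, Strategy configured) {
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        long points = Math.multiplyExact(iSize, jSize);
        return points > maxPoints ? Strategy.SCANLINE : configured;
    }

    public static void main(String[] args) {
        // The limit used here (Integer.MAX_VALUE) is an arbitrary example value.
        System.out.println(choose(92255, 301081, Integer.MAX_VALUE, Strategy.DEFAULT));
        System.out.println(choose(100, 100, Integer.MAX_VALUE, Strategy.DEFAULT));
    }
}
```

Passing the limit in as a parameter keeps the choice configurable, so small grids can still use a strategy other than SCANLINE.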
I also haven't gotten around to implementing your unit test, but thanks very
much indeed for this, I'll try to do that soon.
Many thanks for your contribution! And you definitely hold the current record
for "largest file served through ncWMS" (at least to my knowledge). I'm
pleased the performance holds up OK.
Cheers, Jon
-----Original Message-----
From: Ben Caradoc-Davies [mailto:Ben.Caradoc-Davies@xxxxxxxx]
Sent: 05 September 2011 10:13
To: Jon Blower
Cc: thredds mailing list
Subject: Thredds WMS support for large source grids
Jon,
I tested WMS in thredds 4.2.6 with large NetCDF source grids and encountered an
integer overflow in ncwms PixelMap. (You foretold this in the comments!) The
attached patch fixes this defect at the cost of a small increase in memory use.
You might remember writing (in PixelMap):
// Calculate a single integer representing this grid point in the source grid
// TODO: watch out for overflows (would only happen with a very large grid!)
int sourceGridIndex = j * this.sourceGridISize + i;
The integer overflow appears when the source grid has more than 2**31-1 points.
For example, this limit is exceeded with a 26 GB NetCDF file with a single
ubyte variable on a 92255x301081 grid.
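The overflow can be reproduced in isolation. The following is a minimal sketch, not the ncWMS code: the class and method names are made up for illustration, and only the index expression mirrors the PixelMap line quoted above. It assumes 92255 is the i-axis size of the example grid.

```java
// Illustrative sketch only; GridIndexDemo and its methods are hypothetical names.
class GridIndexDemo {
    // Mirrors the quoted expression: evaluated entirely in 32-bit int
    // arithmetic, so the result wraps around for very large grids.
    static int intIndex(int i, int j, int sourceGridISize) {
        return j * sourceGridISize + i;
    }

    // Widening j to long first makes the whole expression 64-bit.
    static long longIndex(int i, int j, int sourceGridISize) {
        return (long) j * sourceGridISize + i;
    }

    public static void main(String[] args) {
        int iSize = 92255;   // assumed to be the i-axis size
        int jSize = 301081;  // assumed to be the j-axis size
        int i = iSize - 1;   // last column
        int j = jSize - 1;   // last row
        System.out.println("int index:  " + intIndex(i, j, iSize));
        System.out.println("long index: " + longIndex(i, j, iSize));
    }
}
```

For the last point of this grid the two methods disagree, because the true index is larger than Integer.MAX_VALUE; for small grids they return the same value.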
The attached patch includes Xiangtan Lin's CdmUtils fix to force
DataReadingStrategy.SCANLINE for HDF5:
http://mailman.unidata.ucar.edu/mailing_lists/archives/thredds/2011/msg00312.html
The PixelMap change replaces the single array, in which each source and target
grid offset pair was packed as two integers into one long, with two long arrays,
one for source and one for target. This costs extra memory but may, in addition
to supporting large grids, improve performance by avoiding packing and unpacking.
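The two storage layouts can be contrasted in a small sketch. This is an illustration under assumptions, not the patch itself: the class name, the bit layout of the packed long (source in the high 32 bits, target in the low 32 bits) and the growth policy are all hypothetical.

```java
import java.util.Arrays;

// Illustrative sketch only; names and bit layout are assumptions.
class PixelMapStorageDemo {
    // Packed layout: two 32-bit offsets share one long, so neither offset
    // can exceed the int range.
    static long pack(int sourceIndex, int targetIndex) {
        return ((long) sourceIndex << 32) | (targetIndex & 0xFFFFFFFFL);
    }

    static int unpackSource(long packed) {
        return (int) (packed >>> 32);
    }

    static int unpackTarget(long packed) {
        return (int) packed;
    }

    // Parallel layout: one long per offset, so source offsets beyond
    // 2**31-1 fit and no shifting or masking is needed on access.
    private long[] sourceIndices = new long[16];
    private long[] targetIndices = new long[16];
    private int size = 0;

    void put(long sourceIndex, long targetIndex) {
        if (size == sourceIndices.length) {
            // Double the capacity of both arrays together.
            sourceIndices = Arrays.copyOf(sourceIndices, size * 2);
            targetIndices = Arrays.copyOf(targetIndices, size * 2);
        }
        sourceIndices[size] = sourceIndex;
        targetIndices[size] = targetIndex;
        size++;
    }

    long source(int n) { return sourceIndices[n]; }
    long target(int n) { return targetIndices[n]; }
    int size() { return size; }
}
```

The parallel layout uses 16 bytes per entry instead of 8, which is the extra memory cost mentioned above.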
It also includes:
- a minor CdmUtils static initialiser change to appease ecj (the Eclipse
compiler)
- access changes in HorizontalCoordSys to support unit testing
- a fix for axis sizes needed when LatLonCoordSys is explicitly instantiated in
the unit test (otherwise they can never be set)
- a unit test in which only the small() test method passes before the patch is
applied (to ensure existing behaviour is preserved for small grids); all test
methods ensure the expected source grid offset monotonicity
The patch is against the ncwms-src.jar distributed with thredds 4.2.6 (I'm
guessing the ncwms tds4.2-20101102 branch).
With this patch applied and the replacement ncwms.jar installed in WEB-INF/lib,
thredds 4.2.6 can serve a test 647 GB NetCDF4/HDF5 file via WMS:
http://siss2.anu.edu.au/thredds/godiva2/godiva2.html?server=http://siss2.anu.edu.au/thredds/wms/ga/test/PRISM_UTM55_wgs84.nc
The test file has a single ubyte variable on a 461276x1505407 grid.
Performance is better than I expected; the aligned source and target grids plus
the nearest-point mapping from target to source seem to do the trick.
Kind regards,
--
Ben Caradoc-Davies <Ben.Caradoc-Davies@xxxxxxxx> Software Engineering Team
Leader CSIRO Earth Science and Resource Engineering Australian Resources
Research Centre