Hi Don,
Actually I think the former (a method to read raw data) is better than
the latter (not setting missing data metadata) because I still need a
method to do the unpacking etc. on my data points of interest. This
needs the invalidDataMissing, fillValueMissing attributes to be set,
but I choose when to apply them, rather than them being applied on
every single data point that is read.
Regards, Jon
On 27/10/06, Don Murray <dmurray@xxxxxxxxxxxxxxxx> wrote:
Hi Jon-
Thanks for the explanation. It sounds like a method to read
the raw data would be useful or, better yet, a constructor
for GeoGrid that would take a boolean for not setting missing
data (akin to all the setInvalidDataMissing(), setFillValueMissing()
methods), but still allow the coordinate system enhancements.
Don
Jon Blower wrote:
> Hi Don,
>
> The problem is caused by my use of the nj22 library. In my
> application I need to create an image from a NetCDF file as quickly as
> possible. The image will often be of much lower resolution than the
> source data, but will not necessarily be in the same coordinate
> reference system.
>
> If I want to create a 100x100 image, I need to read at least 10,000
> data points. However, reading 10,000 individual points appears to be
> very slow (especially for an NcML aggregation) so I am compromising by
> reading chunks of contiguous data at a time. This means that I often
> end up reading considerably more data than I need to make the image.
> I perform the necessary interpolation in my application and throw away
> the unwanted data.
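
The chunk-then-subsample approach described above can be sketched as follows. This is only an illustration, not the code used in the application: it assumes the chunk is a row-major array of packed shorts, that the image and the source grid share a coordinate reference system, and that nearest-neighbour sampling is acceptable. ChunkSampler, nearestIndices and sample are hypothetical names and are not part of nj22.

```java
final class ChunkSampler {

    /**
     * Maps each of targetSize pixels onto the index of the source cell
     * that contains the pixel centre (nearest-neighbour sampling).
     */
    static int[] nearestIndices(int sourceSize, int targetSize) {
        if (sourceSize <= 0 || targetSize <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        int[] idx = new int[targetSize];
        for (int i = 0; i < targetSize; i++) {
            // Centre of target pixel i, expressed in source index space
            double centre = (i + 0.5) * sourceSize / targetSize;
            // Guard against rounding past the last source cell
            idx[i] = Math.min(sourceSize - 1, (int) centre);
        }
        return idx;
    }

    /**
     * Picks targetH x targetW values out of a row-major chunk of
     * srcH x srcW packed values. Everything else in the chunk is ignored.
     */
    static short[] sample(short[] chunk, int srcH, int srcW,
                          int targetH, int targetW) {
        if (chunk.length != srcH * srcW) {
            throw new IllegalArgumentException("chunk size does not match srcH * srcW");
        }
        int[] rows = nearestIndices(srcH, targetH);
        int[] cols = nearestIndices(srcW, targetW);
        short[] out = new short[targetH * targetW];
        for (int j = 0; j < targetH; j++) {
            for (int i = 0; i < targetW; i++) {
                out[j * targetW + i] = chunk[rows[j] * srcW + cols[i]];
            }
        }
        return out;
    }
}
```

For a 100x100 image, sample(chunk, srcH, srcW, 100, 100) returns exactly 10,000 values regardless of how large the chunk read from disk was.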
>
> If I read packed data using an "enhanced" variable, then every single
> point is internally checked to see if it is a missing value, and every
> single point is unpacked (scale and offset applied). Through
> profiling, I established this to be an expensive operation because it
> is being applied to many more data points than I need. Therefore I
> employed a method whereby data are read in their packed form, without
> being checked for missing values. I then perform the check just for
> the 10,000 points that I need to plot in my image. This is
> considerably and demonstrably faster, although as with all
> optimisation problems, it's a compromise.
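
The deferred unpacking described above can be sketched without any nj22 classes. The sketch assumes the usual packing convention (unpacked = packed * scale_factor + add_offset) and a single fill value marking missing data; the real library handles more cases (valid ranges, several missing values). LazyUnpacker and its methods are hypothetical names used for illustration only.

```java
final class LazyUnpacker {
    private final double scaleFactor;
    private final double addOffset;
    private final short fillValue;

    LazyUnpacker(double scaleFactor, double addOffset, short fillValue) {
        this.scaleFactor = scaleFactor;
        this.addOffset = addOffset;
        this.fillValue = fillValue;
    }

    /** True if the packed value is the fill value, i.e. missing. */
    boolean isMissing(short packed) {
        return packed == fillValue;
    }

    /** Unpacks one value; missing data are returned as NaN. */
    double unpack(short packed) {
        return isMissing(packed) ? Double.NaN : packed * scaleFactor + addOffset;
    }

    /**
     * Unpacks only the points at the wanted indices of a packed chunk.
     * The cost is proportional to wanted.length, not to the chunk size.
     */
    double[] unpackSelected(short[] packedChunk, int[] wanted) {
        double[] out = new double[wanted.length];
        for (int i = 0; i < wanted.length; i++) {
            out[i] = unpack(packedChunk[wanted[i]]);
        }
        return out;
    }
}
```

The chunk itself stays in packed form; the missing-value check and the scale/offset arithmetic run only for the points that end up in the image.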
>
> Does this clear things up? As far as changes to the libraries go, it
> would be handy to have a method in GeoGrid for reading "raw" (packed)
> data as fast as possible, and giving the user the opportunity to
> unpack the data later.
>
> Best wishes,
> Jon
>
> On 27/10/06, Don Murray <dmurray@xxxxxxxxxxxxxxxx> wrote:
>> Jon and John-
>>
>> Why is it so much slower using the GeoGrid directly? Perhaps
>> there can be some performance tuning on the GeoGrid side to
>> avoid people having to jump through the hoops that Jon is?
>> Is it because the GeoGrid scales and offsets the entire grid
>> before subsetting, instead of subsetting and then scaling and
>> offsetting (which seems to be what Jon ends up doing)? Jon,
>> when you say you are scaling and offsetting only the individual
>> values, is this all the values in the subset or, if not, what
>> percentage of the subset are you doing this on?
>>
>> We've been doing some profiling of the netcdf-java reading
>> in the IDV and if this is an area where we could get some
>> performance enhancements, I'd like to implement something
>> in the IDV.
>>
>> Don
>>
>> Jon Blower wrote:
>> > Hi John (cc list),
>> >
>> > Thanks for your help - I found a solution that works well in my app.
>> > As you suggested, I opened the dataset without enhancement, then added
>> > the coordinate systems:
>> >
>> > nc = NetcdfDataset.openDataset(location, false, null);
>> > // Add the coordinate systems
>> > CoordSysBuilder.addCoordinateSystems(nc, null);
>> > GridDataset gd = new GridDataset(nc);
>> > GeoGrid geogrid = gd.findGridByName(varID);
>> >
>> > I then create an EnhanceScaleMissingImpl:
>> >
>> > EnhanceScaleMissingImpl enhanced =
>> >     new EnhanceScaleMissingImpl((VariableDS) geogrid.getVariable());
>> >
>> > (Unfortunately this class is package-private so I made a copy from the
>> > source code in my local directory. Could this class be made public
>> > please?)
>> >
>> > This means that when I read data using geogrid.subset() it does not
>> > check for missing values or unpack the data and is therefore quicker.
>> > I then do enhanced.convertScaleOffsetMissing() only on the individual
>> > values I need to work with. Seems to work well and is pretty quick.
>> > Is there anything dangerous in the above?
>> >
>> > Thanks again,
>> > Jon
>> >
>> >
>> > On 26/10/06, John Caron <caron@xxxxxxxxxxxxxxxx> wrote:
>> >> Hi Jon:
>> >>
>> >> Jon Blower wrote:
>> >> > Hi John,
>> >> >
>> >> > I need some of the functionality of a GridDataset to allow me to read
>> >> > coordinate system information. Also, I might be opening an NcML
>> >> > aggregation. Is it sensible to use NetcdfDataset.getReferencedFile()?
>> >> > In the case of an NcML aggregation, is it possible to get a handle to
>> >> > a specific NetcdfFile (given relevant information such as the
>> >> > timestep)?
>> >>
>> >> You are getting into the internals, so it's a bit dangerous.
>> >>
>> >> I think this will work:
>> >>
>> >> // don't enhance
>> >> NetcdfDataset ncd = NetcdfDataset.openDataset(location, false, null);
>> >> // add coord info
>> >> ucar.nc2.dataset.CoordSysBuilder.addCoordinateSystems(ncd, null);
>> >> // make into a grid
>> >> GridDataset gds = new GridDataset(ncd);
>> >>
>> >> BTW, you will want to switch to the new GridDataset in
>> >> ucar.nc2.dt.grid when you start using 2.2.17. It should be compatible,
>> >> let me know.
>> >>
>> >>
>> >> >
>> >> > On a related note, is it efficient to read data from a NetcdfFile (or
>> >> > NetcdfDataset) point-by-point? I have been assuming that reading
>> >> > contiguous chunks of data is more efficient than reading individual
>> >> > points, even if it means reading more data than I actually need, but
>> >> > perhaps this is not the case? Unfortunately I'm not at my usual
>> >> > computer so I can't do a quick check myself. If reading data
>> >> > point-by-point is efficient (enough) my problem goes away.
>> >>
>> >> It depends on data locality. If the points are close together on disk,
>> >> then they are likely to be in the random access file buffer already.
>> >> The bigger the buffer, the more likely this is. You can try different
>> >> buffer sizes with:
>> >>
>> >> NetcdfDataset openDataset(String location, boolean enhance, int
>> >> buffer_size, ucar.nc2.util.CancelTask cancelTask, Object spiObject);
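
The effect of locality and buffer size can be illustrated with a toy model. This is not how the library's random access file is implemented: it assumes a single read buffer that is refilled starting at the requested offset whenever a read misses. BufferLocality and countRefills are hypothetical names.

```java
final class BufferLocality {

    /**
     * Counts how many times a single buffer of bufferSize bytes has to be
     * refilled to serve reads at the given byte offsets, in the given order.
     * Each refill stands for one trip to the disk.
     */
    static int countRefills(long[] offsets, int bufferSize) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        int refills = 0;
        long bufStart = -1; // -1 marks an empty buffer
        for (long off : offsets) {
            boolean hit = bufStart >= 0
                    && off >= bufStart
                    && off < bufStart + bufferSize;
            if (!hit) {
                // Miss: reload the buffer starting at the requested offset
                bufStart = off;
                refills++;
            }
        }
        return refills;
    }
}
```

In this model, points that lie close together are served from one refill when the buffer is large enough to span them, while a small buffer or widely scattered points cost one refill per point.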
>> >>
>> >>
>> >>
>> >> >
>> >> > Thanks, Jon
>> >> >
>> >> > On 26/10/06, John Caron <caron@xxxxxxxxxxxxxxxx> wrote:
>> >> >
>> >> >> Hi Jon:
>> >> >>
>> >> >> One obvious thing would be to open it as a NetcdfFile, not a
>> >> >> GridDataset. Is that a possibility?
>> >> >>
>> >> >> Jon Blower wrote:
>> >> >> > Hi,
>> >> >> >
>> >> >> > I'm writing an application that reads data from NetCDF files and
>> >> >> > produces images. I've noticed (through profiling) that a slow point
>> >> >> > in the data reading process is the unpacking of packed data (i.e.
>> >> >> > applying scale and offset) and checking for missing values. I would
>> >> >> > like to minimize the use of these calls.
>> >> >> >
>> >> >> > To cut a long post short, I would like to find a low-level function
>> >> >> > that allows me to read the packed data, exactly as they appear in the
>> >> >> > file. I can then "manually" apply the unpacking and missing-value
>> >> >> > checks only to those data points that I need to display.
>> >> >> >
>> >> >> > I'm using nj22, version 2.2.16. I've tried reading data from
>> >> >> > GeoGrid.subset() but this (of course) performs the unpacking. I then
>> >> >> > tried getting the "unenhanced" variable object through
>> >> >> > GeoGrid.getVariable().getOriginalVariable(), but (unexpectedly) this
>> >> >> > also seems to perform unpacking and missing-value checks - I expected
>> >> >> > it to give raw data.
>> >> >> >
>> >> >> > Can anyone help me to find a function for reading raw (packed) data
>> >> >> > without performing missing-value checks?
>> >> >> >
>> >> >> > Thanks in advance,
>> >> >> > Jon
>> >> >> >
--
*************************************************************
Don Murray UCAR Unidata Program
dmurray@xxxxxxxxxxxxxxxx P.O. Box 3000
(303) 497-8628 Boulder, CO 80307
http://www.unidata.ucar.edu/staff/donm
*************************************************************
--
--------------------------------------------------------------
Dr Jon Blower Tel: +44 118 378 5213 (direct line)
Technical Director Tel: +44 118 378 8741 (ESSC)
Reading e-Science Centre Fax: +44 118 378 6413
ESSC Email: jdb@xxxxxxxxxxxxxxxxxxxx
University of Reading
3 Earley Gate
Reading RG6 6AL, UK
--------------------------------------------------------------
==============================================================================
To unsubscribe netcdf-java, visit:
http://www.unidata.ucar.edu/mailing-list-delete-form.html
==============================================================================