Re: [netcdf-java] Performance Issues and Buffering

Sorry, let me add some additional information.

I have been given a product specification covering several different files. The 
overall gist of each file is as follows (the data type varies from byte to long):
  <dimension name="columns" length="512"/>
  <dimension name="rows" length="45000"/>
  <dimension name="orphan_pixels" length="1" isUnlimited="true"/>

  <variable name="var1" shape="rows columns" type="short">
  <variable name="var1_orphan" shape="orphan_pixels" type="short">
  <variable name="var2" shape="rows columns" type="short">
  <variable name="var2_orphan" shape="orphan_pixels" type="short">

For instance, at most we are writing to 18 files sequentially (currently only 
13). Note that I did try to use the NetCDF-Java library with multiple threads, 
but it causes a segfault (https://github.com/Unidata/thredds/issues/577). At its 
fastest, we’ve been seeing data reach the writers every couple of 
milliseconds. We then convert the data arrays (stored in lists) into 
NetCDF-Java Arrays and go on to write them:

    public <T> void writeData(String internalName, List<T> data, int[] shape,
            int[] origin, Class<T> type) {
        // move data into netcdf data shape
        Array rawData = Array.factory(type, shape);

        // copy element by element; each value goes through setObject
        for (int i = 0; i < data.size(); i++) {
            rawData.setObject(i, data.get(i));
        }
        // hand off to the Array-based overload (not shown in this message)
        this.writeData(internalName, rawData, origin);
    }

    public void writeVariable(String name, int[] origin, Array values)
            throws IOException, InvalidRangeException {
        LOG.trace("Writing Variable {} to netcdf file", name);

        // look the variable up by name and fail fast if it is missing
        Variable var = netcdfFileWriter.findVariable(name);
        Objects.requireNonNull(var,
                String.format("Variable with name: %s cannot be found", name));
        this.netcdfFileWriter.write(var, origin, values);
    }

What we then end up seeing when sampling with VisualVM (I am working on another 
profiler so I can get average call times) is that 
`ucar.nc2.jni.netcdf.Nc4Iosp.writeData()` takes a lot of time to run.
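
One way to get a rough average call time without a second profiler is to wrap 
the write call in a small timer. The sketch below is only an illustration: 
`CallTimer`, `Action`, `time()` and `averageMillis()` are hypothetical names, 
not part of NetCDF-Java, and it measures wall-clock time with 
`System.nanoTime()`, so it includes any time the thread spends waiting.

```java
/** Accumulates wall-clock time over repeated calls (hypothetical helper). */
final class CallTimer {

    /** An action that may throw a checked exception of type E. */
    interface Action<E extends Exception> {
        void run() throws E;
    }

    private long calls = 0;
    private long totalNanos = 0;

    /** Runs the action once and adds its duration to the running total. */
    <E extends Exception> void time(Action<E> action) throws E {
        long start = System.nanoTime();
        try {
            action.run();
        } finally {
            // count the call even if it threw, so failures show up in the average
            totalNanos += System.nanoTime() - start;
            calls++;
        }
    }

    long calls() {
        return calls;
    }

    /** Average duration per call in milliseconds, or 0 if nothing was timed. */
    double averageMillis() {
        return calls == 0 ? 0.0 : totalNanos / 1e6 / calls;
    }
}
```

Wrapping each call to `writeVariable(...)` in `timer.time(...)` and logging 
`averageMillis()` at the end of a run would give a per-call figure to set 
against the VisualVM sample.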

Hope this helps clarify my situation.

From: Bob Simons - NOAA Federal [mailto:bob.simons@xxxxxxxx]
Sent: 28 June 2016 16:32
To: Robin Moss
Cc: netcdf-java@xxxxxxxxxxxxxxxx
Subject: Re: [netcdf-java] Performance Issues and Buffering

You don't say how you are writing the data, other than "a row at a time".

Is the row dimension an unlimited dimension? (That is what I would recommend 
trying.)

Or have you pre-allocated space in the variables and are now writing data into 
that space?

Or are you reading the entire file, adding one row of data, then writing the 
entire file? (That is bound to be slow when the number of rows gets larger.)




On Tue, Jun 28, 2016 at 1:26 AM, Robin Moss <robin.moss@xxxxxxxxxxxxxx> wrote:
Hello,

I’m hoping I can get some pointers to improve the way I’m using the NetCDF 
library.

At the moment the processing I’m doing writes to several different NetCDF 
files, multiple variables a row at a time. These writes are not currently 
multi-threaded.

When the processed data is small (hundreds of rows) I don’t see any issues. 
However, when I start running a bigger chain (tens of thousands of rows) I see 
the performance of NetCDF-Java plummet. A quick look at what’s happening with 
VisualVM shows that most of my application’s time (~60%) is spent in 
`Nc4Iosp.writeData()`.

This leads me to believe I’m using the library wrong ☺. My initial thought, 
having worked with the C library directly before, was to adjust the write 
buffer, but I don’t see any support for that in the Java library, and 
considering it would likely affect the C library I’m not sure it would help 
with the write data call.

I had briefly looked into buffering my rows so that I write every 10-100 rows, 
to see what effect that would have on performance and memory usage. However, I 
hit an issue with the variables that have an unlimited dimension of 
columns (most variables I have are row x column): I was unable to 
figure out how to create an Array that supported unlimited dimensions.
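
The buffering idea above can be sketched without any NetCDF types. This 
sketch assumes that an Array always has a concrete shape and that only the 
origin of each write moves, so a buffered block needs a shape of 
{bufferedRows, columns} and an origin of {firstRow, 0}. `RowBuffer` and 
`BlockSink` are hypothetical names. `BlockSink` stands in for whatever wraps 
the real write call, and the element type is fixed to short for brevity.

```java
import java.util.Arrays;

/** Collects fixed-width rows and hands them to a sink in blocks (hypothetical helper). */
final class RowBuffer {

    /** Stand-in for the real writer, e.g. a wrapper around the NetcdfFileWriter write call. */
    interface BlockSink {
        void write(int[] origin, int[] shape, short[] block);
    }

    private final int columns;
    private final int rowsPerBlock;
    private final BlockSink sink;
    private final short[] block;   // row-major storage for up to rowsPerBlock rows
    private int bufferedRows = 0;
    private int nextRow = 0;       // row index at which the next block starts

    RowBuffer(int columns, int rowsPerBlock, BlockSink sink) {
        this.columns = columns;
        this.rowsPerBlock = rowsPerBlock;
        this.sink = sink;
        this.block = new short[columns * rowsPerBlock];
    }

    /** Copies one row into the buffer and flushes once the block is full. */
    void addRow(short[] row) {
        if (row.length != columns) {
            throw new IllegalArgumentException("expected " + columns + " values, got " + row.length);
        }
        System.arraycopy(row, 0, block, bufferedRows * columns, columns);
        bufferedRows++;
        if (bufferedRows == rowsPerBlock) {
            flush();
        }
    }

    /** Writes whatever is buffered; call once more at the end for a partial block. */
    void flush() {
        if (bufferedRows == 0) {
            return;
        }
        short[] out = Arrays.copyOf(block, bufferedRows * columns);
        sink.write(new int[] {nextRow, 0}, new int[] {bufferedRows, columns}, out);
        nextRow += bufferedRows;
        bufferedRows = 0;
    }
}
```

With `rowsPerBlock` set to 100, the 45000-row case would make 450 write calls 
per variable instead of 45000, at the cost of holding 100 x 512 shorts 
(about 100 KB) per buffered variable. Inside the sink, the primitive array 
would still need wrapping in a NetCDF-Java Array of the given shape before 
being passed to the writer.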

We currently use the NetcdfFileWriter to write data to the underlying NetCDF 4 
files. I know the API suggests using FileWriter2, but I couldn’t see a way 
to use it that also allowed us to ‘stream’ data into the underlying files.

Any suggestions would be greatly appreciated.

Thanks,
Robin





--
Sincerely,

Bob Simons
IT Specialist
Environmental Research Division
NOAA Southwest Fisheries Science Center
99 Pacific St., Suite 255A      (New!)
Monterey, CA 93940               (New!)
Phone: (831)333-9878            (New!)
Fax:   (831)648-8440
Email: bob.simons@xxxxxxxx

The contents of this message are mine personally and
do not necessarily reflect any position of the
Government or the National Oceanic and Atmospheric Administration.
<>< <>< <>< <>< <>< <>< <>< <>< <><