Hi Robin,
> the processing I’m doing is writing to several different
> NetCDF files, multiple variables a row at a time.
So if I understand you, your write pattern looks something like:
Write row 0 for varA
Write row 0 for varB
Write row 0 for varC
Write row 1 for varA
Write row 1 for varB
Write row 1 for varC
etc...
Is that correct? If so, you are writing the contents of a file
*non-sequentially*, because a variable's data is laid out contiguously in
NetCDF (unless it's chunked). Non-sequential (aka "random") I/O is generally
slower than sequential I/O, and the gap is largest when you're writing to
spinning disks.
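To put rough numbers on that, here is a small sketch using the dimensions from your message (45000 rows x 512 columns of short). It is a simplified model, not the actual file format: I'm assuming the variables are stored back to back and ignoring the header, and the class and method names are made up for illustration.

```java
// Simplified layout model (an assumption): fixed-size variables stored
// contiguously one after another, header size ignored.
final class ContiguousLayout {
    static final long ROWS = 45_000;
    static final long COLUMNS = 512;
    static final long BYTES_PER_VALUE = 2; // type="short"

    /** Byte offset of the first value of the given row of the given variable. */
    static long rowOffset(int varIndex, long row) {
        long bytesPerRow = COLUMNS * BYTES_PER_VALUE;
        long bytesPerVariable = ROWS * bytesPerRow;
        return varIndex * bytesPerVariable + row * bytesPerRow;
    }

    public static void main(String[] args) {
        // Interleaved pattern: row 0 of varA, varB, varC, then row 1, ...
        for (long row = 0; row < 2; row++) {
            for (int v = 0; v < 3; v++) {
                System.out.printf("row %d, var %d -> offset %d%n",
                        row, v, rowOffset(v, row));
            }
        }
    }
}
```

In this model each row is 1024 bytes, but consecutive writes land about 46 MB apart (45000 x 1024 bytes per variable), so every write is a long seek rather than an append.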
Do sequential I/O if you can. If that's not possible, the C library offers
a way to map a dataset into memory [1]. That'll make those random writes
much, much faster. Unfortunately, we don't currently provide a way to
access that feature from NetCDF-Java.
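One way to get closer to sequential I/O from Java is the buffering you mentioned: collect a block of rows per variable and write the block in one call. Below is a minimal sketch of that idea for a short variable. `RowBlockBuffer` and `BlockSink` are hypothetical names, not part of NetCDF-Java, and I haven't measured how much this helps in your case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RowBlockBuffer {
    /** Hypothetical callback: receives a block of whole rows in row-major order. */
    interface BlockSink {
        void writeBlock(int originRow, int rowCount, short[] values);
    }

    private final int columns;
    private final int capacityRows;
    private final short[] block;
    private final BlockSink sink;
    private int originRow = 0;    // file row at which the current block starts
    private int bufferedRows = 0; // rows currently held in the buffer

    RowBlockBuffer(int columns, int capacityRows, BlockSink sink) {
        this.columns = columns;
        this.capacityRows = capacityRows;
        this.block = new short[columns * capacityRows];
        this.sink = sink;
    }

    /** Copies one row into the buffer and flushes once the buffer is full. */
    void addRow(short[] row) {
        if (row.length != columns) {
            throw new IllegalArgumentException("expected " + columns + " values");
        }
        System.arraycopy(row, 0, block, bufferedRows * columns, columns);
        bufferedRows++;
        if (bufferedRows == capacityRows) {
            flush();
        }
    }

    /** Hands any buffered rows to the sink; call once more at end of input. */
    void flush() {
        if (bufferedRows == 0) {
            return;
        }
        sink.writeBlock(originRow, bufferedRows,
                Arrays.copyOf(block, bufferedRows * columns));
        originRow += bufferedRows;
        bufferedRows = 0;
    }

    public static void main(String[] args) {
        List<String> calls = new ArrayList<>();
        RowBlockBuffer buffer = new RowBlockBuffer(4, 3,
                (origin, count, values) -> calls.add(origin + ":" + count));
        for (int r = 0; r < 7; r++) {
            buffer.addRow(new short[4]);
        }
        buffer.flush();
        System.out.println(calls);
    }
}
```

In your code the sink would presumably wrap `values` in an Array of shape {rowCount, columns} and pass it to `netcdfFileWriter.write()` with origin {originRow, 0}. The writes still alternate between variables, but each one covers a longer contiguous run, so there are far fewer seeks.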
Cheers,
Christian
[1]
http://www.unidata.ucar.edu/software/netcdf/docs/group__datasets.html#ga427f5a0b24f1d426a99bcc37b8a39cac
(look for "NC_DISKLESS")
On Wed, Jun 29, 2016 at 1:45 AM, Robin Moss <robin.moss@xxxxxxxxxxxxxx>
wrote:
> Sorry, let me add some additional information.
>
>
>
> I have been given a product specification, with several different files
> but the overall gist of the files is (The data type varies from byte to
> long):
>
> <dimension name="columns" length="512"/>
>
> <dimension name="rows" length="45000"/>
>
> <dimension name="orphan_pixels" length="1" isUnlimited="true"/>
>
>
>
> <variable name="var1" shape="rows columns" type="short">
>
> <variable name="var1_orphan" shape="orphan_pixels" type="short">
>
> <variable name="var2" shape="rows columns" type="short">
>
> <variable name="var2_orphan" shape="orphan_pixels" type="short">
>
>
>
> For instance, at most we are writing to 18 files sequentially (currently
> only 13). Note I did try to use the NetCDF-Java library with multiple
> threads, but it causes a seg fault (
> https://github.com/Unidata/thredds/issues/577). At its fastest we’ve been
> seeing data getting to the writers every couple of milliseconds; we then
> convert the data arrays (stored in lists) into NetCDF-Java Arrays and then
> go on to write them:
>
>
>
> public <T> void writeData(String internalName, List<T> data, int[] shape,
>         int[] origin, Class<T> type) {
>     // move data into netcdf data shape
>     Array rawData = Array.factory(type, shape);
>
>     for (int i = 0; i < data.size(); i++) {
>         rawData.setObject(i, data.get(i));
>     }
>     this.writeData(internalName, rawData, origin);
> }
>
>
>
> public void writeVariable(String name, int[] origin, Array values)
>         throws IOException, InvalidRangeException {
>     LOG.trace("Writing Variable {} to netcdf file", name);
>
>     Variable var = netcdfFileWriter.findVariable(name);
>     Objects.requireNonNull(var,
>             String.format("Variable with name: %s cannot be found", name));
>     this.netcdfFileWriter.write(var, origin, values);
> }
>
>
>
> What we then see when sampling with VisualVM (I'm working on another
> profiler so I can get average call times) is that
> `ucar.nc2.jni.netcdf.Nc4Iosp.writeData()` takes a lot of time to run.
>
>
>
> Hope this helps clarify my situation
>
>
>
> *From:* Bob Simons - NOAA Federal [mailto:bob.simons@xxxxxxxx]
> *Sent:* 28 June 2016 16:32
> *To:* Robin Moss
> *Cc:* netcdf-java@xxxxxxxxxxxxxxxx
> *Subject:* Re: [netcdf-java] Performance Issues and Buffering
>
>
>
> You don't say *how* you are writing the data, other than "a row at a
> time".
>
>
>
> Is the row dimension an unlimited dimension? (That is what I would
> recommend trying.)
>
>
>
> Or have you pre-allocated space in the variables and are now writing data
> into that space?
>
>
>
> Or are you reading the entire file, adding one row of data, then writing
> the entire file? (That is bound to be slow when the number of rows gets
> larger.)
>
>
>
>
>
>
>
>
>
> On Tue, Jun 28, 2016 at 1:26 AM, Robin Moss <robin.moss@xxxxxxxxxxxxxx>
> wrote:
>
> Hello,
>
>
>
> I’m hoping I can get some pointers to improve the way I’m using the NetCDF
> library.
>
>
>
> At the moment the processing I’m doing is writing to several different
> NetCDF files, multiple variables a row at a time. These are not currently
> multi-threaded.
>
>
>
> When the processed data is small (hundreds of rows) I don’t see any issues.
> However, when I start running a bigger chain (tens of thousands of rows) I
> see the performance of NetCDF-Java plummet. A quick look at what’s happening
> with VisualVM shows that most of my application's time (~60%) is spent in
> `Nc4Iosp.writeData()`.
>
>
>
> Which leads me to believe I’m using the library wrong :). My initial
> thought, having worked with the C library directly before, was to adjust the
> write buffer, but I don’t see any support for that in the Java library, and
> considering it would likely affect the C library I’m not sure it would help
> with the write data call.
>
>
>
> I had briefly looked into buffering my rows, so that I write every 10-100
> rows, to see what effect that would have on performance and memory usage.
> However, I hit an issue with the variables that have an unlimited
> dimension of columns (most variables I have are row x column): I was
> unable to figure out how to create an Array that supported unlimited
> dimensions.
>
>
>
> We currently use the NetcdfFileWriter to write data to the underlying
> NetCDF-4 files. I know the API suggests using FileWriter2, but I
> couldn’t see a way to use it that also allowed us to ‘stream’ data into
> the underlying files.
>
>
>
> Any suggestions would be greatly appreciated.
>
>
>
> Thanks,
>
> Robin
>
>
>
>
>
>
>
> _______________________________________________
> NOTE: All exchanges posted to Unidata maintained email lists are
> recorded in the Unidata inquiry tracking system and made publicly
> available through the web. Users who post to any of the lists we
> maintain are reminded to remove any personal information that they
> do not want to be made public.
>
>
> netcdf-java mailing list
> netcdf-java@xxxxxxxxxxxxxxxx
> For list information or to unsubscribe, visit:
> http://www.unidata.ucar.edu/mailing_lists/
>
>
>
>
>
> --
>
> Sincerely,
>
> Bob Simons
> IT Specialist
> Environmental Research Division
> NOAA Southwest Fisheries Science Center
> 99 Pacific St., Suite 255A (New!)
> Monterey, CA 93940 (New!)
> Phone: (831)333-9878 (New!)
>
> Fax: (831)648-8440
> Email: bob.simons@xxxxxxxx
>
> The contents of this message are mine personally and
> do not necessarily reflect any position of the
> Government or the National Oceanic and Atmospheric Administration.
> <>< <>< <>< <>< <>< <>< <>< <>< <><
>