Hi Jeff:
The latest development version of netcdf-java (4.5.0) does chunking and
compression by default, using a more reasonable algorithm. So if you try your
code below with that version, you should see much better results.
Details are here:
http://www.unidata.ucar.edu/software/thredds/v4.5/netcdf-java/reference/netcdf4Clibrary.html#writing
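
For reference, the size figures in the quoted message below can be checked with a few lines of arithmetic. This is only a sketch: the class and method names (OverheadCheck, sizeRatio, overheadFraction) are made up for illustration and are not part of netcdf-java; the byte counts are the ones reported in the quoted message.

```java
import java.util.Locale;

// Hypothetical helper, not part of netcdf-java: checks the size arithmetic
// from the quoted message (80000 raw bytes vs. a 537872-byte file).
public class OverheadCheck {

    // How many times larger the file is than the raw payload.
    public static double sizeRatio(long fileBytes, long rawBytes) {
        return (double) fileBytes / rawBytes;
    }

    // Fraction of the file that is not raw payload.
    public static double overheadFraction(long fileBytes, long rawBytes) {
        return (double) (fileBytes - rawBytes) / fileBytes;
    }

    public static void main(String[] args) {
        long rawBytes = 8L * 10000;   // 8-byte int64 values * 10000 records
        long fileBytes = 537872L;     // file size reported in the quoted message

        // Locale.ROOT keeps the decimal separator a '.' on every machine.
        System.out.printf(Locale.ROOT, "ratio = %.1fx, overhead = %.0f%%%n",
                sizeRatio(fileBytes, rawBytes),
                100.0 * overheadFraction(fileBytes, rawBytes));
        // prints "ratio = 6.7x, overhead = 85%"
    }
}
```

So the "6.7x" figure is file size divided by raw size, and the "85%" figure is the share of the file that is not payload.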
John
On 5/1/2014 3:31 PM, Jeff Johnson - NOAA Affiliate wrote:
> Hi again-
>
> I got some feedback that the size issue may either be related to the use
> of String variables or the layout of the schema. So, I tried simplifying
> even further to take both of those factors out of the equation. The test
> program below has only one variable, the time dimension as a LONG.
>
> Here is the truncated ncdump of the output file:
>
> netcdf output2 {
> dimensions:
> time = UNLIMITED ; // (10000 currently)
> variables:
> int64 time(time) ;
> time:units = "milliseconds since 1970-01-01T00:00:00Z" ;
> data:
>
> time = 1398978611132, 1398978611133, 1398978611134, 1398978611135,
> 1398978611136, 1398978611137, 1398978611138, 1398978611139,
> 1398978611140, 1398978611141, 1398978611142, 1398978611143,
> 1398978611144, 1398978611145, 1398978611146, 1398978611147,
> 1398978611148, 1398978611149, 1398978611150, 1398978611151,
> 1398978611152, 1398978611153, 1398978611154, 1398978611155,
> ...
> <thousands of lines removed>
> ...
> 1398978621104, 1398978621105, 1398978621106, 1398978621107,
> 1398978621108, 1398978621109, 1398978621110, 1398978621111,
> 1398978621112, 1398978621113, 1398978621114, 1398978621115,
> 1398978621116, 1398978621117, 1398978621118, 1398978621119,
> 1398978621120, 1398978621121, 1398978621122, 1398978621123,
> 1398978621124, 1398978621125, 1398978621126, 1398978621127,
> 1398978621128, 1398978621129, 1398978621130, 1398978621131 ;
> }
>
> The raw data is 8 bytes * 10000 records, or 80000 bytes. However, the
> NetCDF-4 file created is 537872 bytes. This is 6.7x the raw size, or 85%
> overhead. :( Hoping that the NetCDF format overhead just stands out
> with small datasets, I did an additional run of 1M records. The output
> file was 53.4MB, also 6.7x the raw size.
>
> I'm at a loss as to what the issue might be, unless this is just a fact
> of life for NetCDF files? Any suggestions or insights appreciated!
>
> jeff
>
> ===
> import org.joda.time.DateTime;
> import ucar.ma2.ArrayLong;
> import ucar.ma2.DataType;
> import ucar.ma2.InvalidRangeException;
> import ucar.nc2.*;
>
> import java.io.IOException;
> import java.nio.file.FileSystems;
> import java.nio.file.Files;
> import java.nio.file.Path;
> import java.util.ArrayList;
> import java.util.List;
>
> public class TestGenFile2 {
>     public static void main(String[] args) {
>         NetcdfFileWriter dataFile = null;
>
>         try {
>             try {
>                 // define the file
>                 String filePathName = "output2.nc";
>
>                 // delete the file if it already exists
>                 Path path = FileSystems.getDefault().getPath(filePathName);
>                 Files.deleteIfExists(path);
>
>                 // enter definition mode for this NetCDF-4 file
>                 dataFile = NetcdfFileWriter.createNew(
>                         NetcdfFileWriter.Version.netcdf4, filePathName);
>
>                 // create the root group
>                 Group rootGroup = dataFile.addGroup(null, null);
>
>                 // define dimensions, in this case only one: time
>                 Dimension timeDim = dataFile.addUnlimitedDimension("time");
>                 List<Dimension> dimList = new ArrayList<>();
>                 dimList.add(timeDim);
>
>                 // define variables
>                 Variable time = dataFile.addVariable(rootGroup, "time",
>                         DataType.LONG, dimList);
>                 dataFile.addVariableAttribute(time, new Attribute("units",
>                         "milliseconds since 1970-01-01T00:00:00Z"));
>
>                 // create the file
>                 dataFile.create();
>
>                 // create 1-D arrays to hold data values (time is the dimension)
>                 ArrayLong timeArray = new ArrayLong.D1(1);
>
>                 int[] origin = new int[]{0};
>                 long startTime = 1398978611132L;
>
>                 // write the records to the file
>                 for (int i = 0; i < 10000; i++) {
>                     // load data into array variables
>                     timeArray.set(timeArray.getIndex(), startTime++);
>
>                     origin[0] = i;
>
>                     // write a record
>                     dataFile.write(time, origin, timeArray);
>                 }
>             } finally {
>                 if (null != dataFile) {
>                     // close the file
>                     dataFile.close();
>                 }
>             }
>         } catch (IOException | InvalidRangeException e) {
>             e.printStackTrace();
>         }
>     }
> }
>
> --
> Jeff Johnson
> DSCOVR Ground System Development
> Space Weather Prediction Center
> jeff.m.johnson@xxxxxxxx