Hi again-
I got some feedback that the size issue may be related either to the use of
String variables or to the layout of the schema. So I tried simplifying even
further to take both of those factors out of the equation. The test program
below has only one variable: the time dimension, stored as a LONG.
Here is the truncated ncdump of the output file:
netcdf output2 {
dimensions:
	time = UNLIMITED ; // (10000 currently)
variables:
	int64 time(time) ;
		time:units = "milliseconds since 1970-01-01T00:00:00Z" ;
data:
time = 1398978611132, 1398978611133, 1398978611134, 1398978611135,
1398978611136, 1398978611137, 1398978611138, 1398978611139,
1398978611140, 1398978611141, 1398978611142, 1398978611143,
1398978611144, 1398978611145, 1398978611146, 1398978611147,
1398978611148, 1398978611149, 1398978611150, 1398978611151,
1398978611152, 1398978611153, 1398978611154, 1398978611155,
...
<thousands of lines removed>
...
1398978621104, 1398978621105, 1398978621106, 1398978621107,
1398978621108, 1398978621109, 1398978621110, 1398978621111,
1398978621112, 1398978621113, 1398978621114, 1398978621115,
1398978621116, 1398978621117, 1398978621118, 1398978621119,
1398978621120, 1398978621121, 1398978621122, 1398978621123,
1398978621124, 1398978621125, 1398978621126, 1398978621127,
1398978621128, 1398978621129, 1398978621130, 1398978621131 ;
}
The raw data is 8 bytes * 10000 records, or 80000 bytes. However, the
NetCDF-4 file created is 537872 bytes. This is 6.7x larger, or 85%
overhead. :( Hoping that the NetCDF format overhead just stands out
with small datasets, I did an additional run with 1M records. The output
file was 53.4MB, also 6.7x larger.
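Just to show my work on those numbers, here's the arithmetic (file sizes are the ones observed above; the class name is arbitrary):

```java
public class SizeCheck {
    public static void main(String[] args) {
        long rawBytes = 8L * 10_000;   // 8-byte longs * 10000 records
        long fileBytes = 537_872L;     // observed NetCDF-4 file size

        // ratio of file size to raw data size
        double ratio = (double) fileBytes / rawBytes;
        // fraction of the file that is not raw data
        double overhead = 100.0 * (fileBytes - rawBytes) / fileBytes;

        System.out.printf("raw=%d bytes, file=%d bytes, ratio=%.1fx, overhead=%.0f%%%n",
                rawBytes, fileBytes, ratio, overhead);
    }
}
```

The 1M-record run works out to the same ratio: 8MB of raw longs vs. a 53.4MB file.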
I'm at a loss as to what the issue might be, unless this is just a fact of
life for NetCDF files? Any suggestions or insights appreciated!
jeff
===
import ucar.ma2.ArrayLong;
import ucar.ma2.DataType;
import ucar.ma2.InvalidRangeException;
import ucar.nc2.*;
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
public class TestGenFile2 {
    public static void main(String[] args) {
        NetcdfFileWriter dataFile = null;
        try {
            try {
                // define the file
                String filePathName = "output2.nc";

                // delete the file if it already exists
                Path path = FileSystems.getDefault().getPath(filePathName);
                Files.deleteIfExists(path);

                // enter definition mode for this NetCDF-4 file
                dataFile = NetcdfFileWriter.createNew(
                        NetcdfFileWriter.Version.netcdf4, filePathName);

                // create the root group
                Group rootGroup = dataFile.addGroup(null, null);

                // define dimensions, in this case only one: time
                Dimension timeDim = dataFile.addUnlimitedDimension("time");
                List<Dimension> dimList = new ArrayList<>();
                dimList.add(timeDim);

                // define variables
                Variable time = dataFile.addVariable(rootGroup, "time",
                        DataType.LONG, dimList);
                dataFile.addVariableAttribute(time, new Attribute("units",
                        "milliseconds since 1970-01-01T00:00:00Z"));

                // create the file
                dataFile.create();

                // create a 1-D array to hold one data value per record
                ArrayLong timeArray = new ArrayLong.D1(1);
                int[] origin = new int[]{0};
                long startTime = 1398978611132L;

                // write the records to the file, one at a time
                for (int i = 0; i < 10000; i++) {
                    // load the next timestamp into the array variable
                    timeArray.set(timeArray.getIndex(), startTime++);
                    origin[0] = i;
                    // write a record
                    dataFile.write(time, origin, timeArray);
                }
            } finally {
                if (null != dataFile) {
                    // close the file
                    dataFile.close();
                }
            }
        } catch (IOException | InvalidRangeException e) {
            e.printStackTrace();
        }
    }
}
--
Jeff Johnson
DSCOVR Ground System Development
Space Weather Prediction Center
jeff.m.johnson@xxxxxxxx