Re: [netcdf-java] newbie question on NetCDF file overhead (revisited)

Hi Jeff:

The latest development version of netcdf-java (4.5.0) does chunking and 
compression by default, using a more reasonable algorithm. So if you try your 
code below with that version, you should see much better results.

Details are here:

http://www.unidata.ucar.edu/software/thredds/v4.5/netcdf-java/reference/netcdf4Clibrary.html#writing
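To compare file sizes before and after switching versions, the size arithmetic from the message below can be reproduced with a small sketch. This is only an illustration, not part of netcdf-java: the class name OverheadCheck and its two helper methods are made up for this example, and the default numbers are the ones quoted below (10000 records of 8 bytes each, 537872 bytes on disk). Pass a file path as the first argument to measure a real file instead.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper (not part of netcdf-java): compares the on-disk size
// of a file with the size of the raw data it is supposed to hold.
public class OverheadCheck {

    // How many times larger the file is than the raw data.
    public static double sizeRatio(long fileBytes, long rawBytes) {
        return (double) fileBytes / rawBytes;
    }

    // Fraction of the file that is not raw data.
    public static double overheadFraction(long fileBytes, long rawBytes) {
        return (double) (fileBytes - rawBytes) / fileBytes;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: 10000 records of one 8-byte long each, as in the test program.
        long records = 10000L;
        long rawBytes = 8L * records;

        // Use the real file size if a path is given, else the size reported in the thread.
        long fileBytes = args.length > 0 ? Files.size(Paths.get(args[0])) : 537872L;

        System.out.printf("file is %.2fx the raw data%n", sizeRatio(fileBytes, rawBytes));
        System.out.printf("overhead is %.1f%% of the file%n",
                100.0 * overheadFraction(fileBytes, rawBytes));
        System.out.printf("bytes on disk per record: %.1f%n", (double) fileBytes / records);
    }
}
```

Running it against the same output file written by each library version gives a like-for-like comparison of the per-record cost.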

John

On 5/1/2014 3:31 PM, Jeff Johnson - NOAA Affiliate wrote:
> Hi again-
> 
> I got some feedback that the size issue may either be related to the use 
> of String variables or the layout of the schema. So, I tried simplifying 
> even further to take both of those factors out of the equation. The test 
> program below has only one variable, the time dimension as a LONG.
> 
> Here is the truncated ncdump of the output file:
> 
> netcdf output2 {
> dimensions:
> time = UNLIMITED ; // (10000 currently)
> variables:
> int64 time(time) ;
> time:units = "milliseconds since 1970-01-01T00:00:00Z" ;
> data:
> 
>   time = 1398978611132, 1398978611133, 1398978611134, 1398978611135,
>      1398978611136, 1398978611137, 1398978611138, 1398978611139,
>      1398978611140, 1398978611141, 1398978611142, 1398978611143,
>      1398978611144, 1398978611145, 1398978611146, 1398978611147,
>      1398978611148, 1398978611149, 1398978611150, 1398978611151,
>      1398978611152, 1398978611153, 1398978611154, 1398978611155,
> ...
>      <thousands of lines removed>
> ...
>      1398978621104, 1398978621105, 1398978621106, 1398978621107,
>      1398978621108, 1398978621109, 1398978621110, 1398978621111,
>      1398978621112, 1398978621113, 1398978621114, 1398978621115,
>      1398978621116, 1398978621117, 1398978621118, 1398978621119,
>      1398978621120, 1398978621121, 1398978621122, 1398978621123,
>      1398978621124, 1398978621125, 1398978621126, 1398978621127,
>      1398978621128, 1398978621129, 1398978621130, 1398978621131 ;
> }
> 
> The raw data is 8 bytes * 10000 records, or 80000 bytes. However, the 
> NetCDF-4 file created is 537872 bytes. This is 6.7x larger, meaning about 
> 85% of the file is overhead. :(  Hoping that the NetCDF format overhead 
> just stands out with small datasets, I did an additional run of 1M 
> records. The output file was 53.4MB, also 6.7x larger.
> 
> I'm at a loss as to what the issue might be, unless this is just a fact 
> of life for NetCDF files? Any suggestions or insights appreciated!
> 
> jeff
> 
> ===
> import org.joda.time.DateTime;
> import ucar.ma2.ArrayLong;
> import ucar.ma2.DataType;
> import ucar.ma2.InvalidRangeException;
> import ucar.nc2.*;
> 
> import java.io.IOException;
> import java.nio.file.FileSystems;
> import java.nio.file.Files;
> import java.nio.file.Path;
> import java.util.ArrayList;
> import java.util.List;
> 
> public class TestGenFile2 {
>    public static void main(String[] args) {
>      NetcdfFileWriter dataFile = null;
> 
>      try {
>        try {
> 
>          // define the file
>          String filePathName = "output2.nc";
> 
>          // delete the file if it already exists
>          Path path = FileSystems.getDefault().getPath(filePathName);
>          Files.deleteIfExists(path);
> 
>          // enter definition mode for this NetCDF-4 file
>          dataFile = NetcdfFileWriter.createNew(
>              NetcdfFileWriter.Version.netcdf4, filePathName);
> 
>          // create the root group
>          Group rootGroup = dataFile.addGroup(null, null);
> 
>          // define dimensions, in this case only one: time
>          Dimension timeDim = dataFile.addUnlimitedDimension("time");
>          List<Dimension> dimList = new ArrayList<>();
>          dimList.add(timeDim);
> 
>          // define variables
>          Variable time = dataFile.addVariable(rootGroup, "time",
>              DataType.LONG, dimList);
>          dataFile.addVariableAttribute(time, new Attribute("units",
>              "milliseconds since 1970-01-01T00:00:00Z"));
> 
>          // create the file
>          dataFile.create();
> 
>          // create 1-D arrays to hold data values (time is the dimension)
>          ArrayLong timeArray = new ArrayLong.D1(1);
> 
>          int[] origin = new int[]{0};
>          long startTime = 1398978611132L;
> 
>          // write the records to the file
>          for (int i = 0; i < 10000; i++) {
>            // load data into array variables
>            timeArray.set(timeArray.getIndex(), startTime++);
> 
>            origin[0] = i;
> 
>            // write a record
>            dataFile.write(time, origin, timeArray);
>          }
>        } finally {
>          if (null != dataFile) {
>            // close the file
>            dataFile.close();
>          }
>        }
>      } catch (IOException | InvalidRangeException e) {
>        e.printStackTrace();
>      }
>    }
> }
> 
> -- 
> Jeff Johnson
> DSCOVR Ground System Development
> Space Weather Prediction Center
> jeff.m.johnson@xxxxxxxx
> 