Hi again-
I got some feedback that the size issue may be related either to the use of
String variables or to the layout of the schema. So I tried simplifying even
further to take both of those factors out of the equation. The test program
below has only one variable: the time dimension, stored as a LONG.
Here is the truncated ncdump of the output file:
netcdf output2 {
dimensions:
	time = UNLIMITED ; // (10000 currently)
variables:
	int64 time(time) ;
		time:units = "milliseconds since 1970-01-01T00:00:00Z" ;
data:
time = 1398978611132, 1398978611133, 1398978611134, 1398978611135,
1398978611136, 1398978611137, 1398978611138, 1398978611139,
1398978611140, 1398978611141, 1398978611142, 1398978611143,
1398978611144, 1398978611145, 1398978611146, 1398978611147,
1398978611148, 1398978611149, 1398978611150, 1398978611151,
1398978611152, 1398978611153, 1398978611154, 1398978611155,
...
<thousands of lines removed>
...
1398978621104, 1398978621105, 1398978621106, 1398978621107,
1398978621108, 1398978621109, 1398978621110, 1398978621111,
1398978621112, 1398978621113, 1398978621114, 1398978621115,
1398978621116, 1398978621117, 1398978621118, 1398978621119,
1398978621120, 1398978621121, 1398978621122, 1398978621123,
1398978621124, 1398978621125, 1398978621126, 1398978621127,
1398978621128, 1398978621129, 1398978621130, 1398978621131 ;
}
The raw data is 8 bytes * 10000 records, or 80000 bytes. However, the
NetCDF-4 file created is 537872 bytes. This is 6.7x larger, or 85%
overhead. :( Hoping that the NetCDF format overhead just stands out
with small datasets, I did an additional run with 1M records. The output
file was 53.4MB, also 6.7x larger.
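Just to show my work on those numbers, here's the arithmetic (file sizes are the ones observed above; the class name is arbitrary):

```java
public class SizeCheck {
    public static void main(String[] args) {
        long rawBytes = 8L * 10_000;   // 8-byte longs * 10000 records
        long fileBytes = 537_872L;     // observed NetCDF-4 file size

        // ratio of file size to raw data size
        double ratio = (double) fileBytes / rawBytes;
        // fraction of the file that is not raw data
        double overhead = 100.0 * (fileBytes - rawBytes) / fileBytes;

        System.out.printf("raw=%d bytes, file=%d bytes, ratio=%.1fx, overhead=%.0f%%%n",
                rawBytes, fileBytes, ratio, overhead);
    }
}
```

The 1M-record run works out to the same ratio: 8MB of raw longs vs. a 53.4MB file.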
I'm at a loss as to what the issue might be, unless this is just a fact of
life for NetCDF files? Any suggestions or insights appreciated!
jeff
===
import ucar.ma2.ArrayLong;
import ucar.ma2.DataType;
import ucar.ma2.InvalidRangeException;
import ucar.nc2.*;
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
public class TestGenFile2 {
    public static void main(String[] args) {
        NetcdfFileWriter dataFile = null;
        try {
            try {
                // define the file
                String filePathName = "output2.nc";

                // delete the file if it already exists
                Path path = FileSystems.getDefault().getPath(filePathName);
                Files.deleteIfExists(path);

                // enter definition mode for this NetCDF-4 file
                dataFile = NetcdfFileWriter.createNew(
                        NetcdfFileWriter.Version.netcdf4, filePathName);

                // create the root group
                Group rootGroup = dataFile.addGroup(null, null);

                // define dimensions, in this case only one: time
                Dimension timeDim = dataFile.addUnlimitedDimension("time");
                List<Dimension> dimList = new ArrayList<>();
                dimList.add(timeDim);

                // define variables
                Variable time = dataFile.addVariable(rootGroup, "time",
                        DataType.LONG, dimList);
                dataFile.addVariableAttribute(time, new Attribute("units",
                        "milliseconds since 1970-01-01T00:00:00Z"));

                // create the file
                dataFile.create();

                // create a 1-D array to hold one data value per record
                ArrayLong timeArray = new ArrayLong.D1(1);
                int[] origin = new int[]{0};
                long startTime = 1398978611132L;

                // write the records to the file, one at a time
                for (int i = 0; i < 10000; i++) {
                    // load the next timestamp into the array variable
                    timeArray.set(timeArray.getIndex(), startTime++);
                    origin[0] = i;
                    // write a record
                    dataFile.write(time, origin, timeArray);
                }
            } finally {
                if (null != dataFile) {
                    // close the file
                    dataFile.close();
                }
            }
        } catch (IOException | InvalidRangeException e) {
            e.printStackTrace();
        }
    }
}
--
Jeff Johnson
DSCOVR Ground System Development
Space Weather Prediction Center
jeff.m.johnson@xxxxxxxx