Hi Jeff,
Looks like you are writing netCDF-4 files. Given that, I believe the
problem is the default chunking that is used when an unlimited dimension
is involved. From an old email conversation on the netcdfgroup email
list [1] which references a netCDF-C chunking document [2], it sounds
like each record along the unlimited dimension is a single chunk. Since
each of your records is so small, the per-chunk overhead adds up and is
likely the culprit in the larger-than-expected files you are seeing.
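As a rough back-of-the-envelope check, the sketch below uses only the
figures from your message (10000 records, a 24-character timestamp plus
three 2-byte values per record, and a 2420656-byte file). The class and
method names are made up for illustration. The "overhead per record" is
simply (file size - raw size) / record count. It is not a measurement of
what the HDF5 layer actually stores per chunk.

```java
class RecordOverhead {
    // Raw payload of one record: one byte per timestamp character
    // plus two bytes for each short value.
    static long rawBytesPerRecord(int timestampChars, int shortValues) {
        return timestampChars + 2L * shortValues;
    }

    // Raw payload of the whole data set.
    static long rawBytes(long records, long bytesPerRecord) {
        return records * bytesPerRecord;
    }

    // Average number of non-payload bytes the file spends on each record.
    static double overheadPerRecord(long fileBytes, long rawBytes, long records) {
        return (double) (fileBytes - rawBytes) / records;
    }

    public static void main(String[] args) {
        long records = 10000;        // from the test program
        long fileBytes = 2420656;    // reported size of the generated file

        long perRecord = rawBytesPerRecord(24, 3);
        long raw = rawBytes(records, perRecord);

        System.out.printf("raw bytes per record: %d%n", perRecord);
        System.out.printf("raw bytes total:      %d%n", raw);
        System.out.printf("file / raw ratio:     %.2f%n", (double) fileBytes / raw);
        System.out.printf("overhead per record:  %.1f bytes%n",
                overheadPerRecord(fileBytes, raw, records));
    }
}
```

With those inputs the file spends a bit over 200 bytes of bookkeeping
on each 30-byte record, which is consistent with the roughly 8x ratio
you reported.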
In your code below, try the version of NetcdfFileWriter.createNew()
that takes an Nc4Chunking parameter. Looks like the easiest approach is
to use the "standard" strategy
    Nc4ChunkingStrategyImpl.factory(Nc4Chunking.Strategy.standard, 0, false)
and add "_ChunkSize" attributes to your variables with a single integer
value, since you only have one dimension. I'm not a chunking expert, but
maybe start with a value of 2000. I've included a few links to some blog
posts on chunking and compression ([3], [4], and [5]) which discuss
choosing chunk sizes.
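To see what a chunk length of 2000 would change, here is a small sketch
that only counts chunks along the time dimension. It does not call the
netCDF-Java API, the class and method names are hypothetical, and it
assumes 10000 records and 2-byte short values as in your test program.

```java
class ChunkCount {
    // Number of chunks needed to cover `records` entries along one
    // dimension when each chunk holds `chunkLength` entries
    // (ceiling division, so a partial last chunk still counts).
    static long chunkCount(long records, long chunkLength) {
        if (records < 0 || chunkLength <= 0) {
            throw new IllegalArgumentException("records must be >= 0 and chunkLength > 0");
        }
        return (records + chunkLength - 1) / chunkLength;
    }

    // Uncompressed size of one full chunk of fixed-size elements.
    static long chunkBytes(long chunkLength, int elementBytes) {
        return chunkLength * elementBytes;
    }

    public static void main(String[] args) {
        long records = 10000;  // from the test program
        int shortBytes = 2;    // size of one bx/by/bz value

        // Compare one record per chunk with the suggested starting value.
        for (long chunkLength : new long[] {1, 2000}) {
            System.out.printf(
                    "chunk length %d: %d chunks per variable, %d bytes per chunk%n",
                    chunkLength,
                    chunkCount(records, chunkLength),
                    chunkBytes(chunkLength, shortBytes));
        }
    }
}
```

With one record per chunk, each short variable needs 10000 chunks of
2 bytes. With a chunk length of 2000 it needs 5 chunks of 4000 bytes,
so any fixed per-chunk cost is paid far less often.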
Hope that helps.
Ethan
[1] https://www.unidata.ucar.edu/mailing_lists/archives/netcdfgroup/2012/msg00005.html
[2] http://www.unidata.ucar.edu/software/netcdf/docs/netcdf-c/nc_005fdef_005fvar_005fchunking.html#nc_005fdef_005fvar_005fchunking
[3] http://www.unidata.ucar.edu/blogs/developer/en/entry/chunking_data_why_it_matters
[4] http://www.unidata.ucar.edu/blogs/developer/en/entry/chunking_data_choosing_shapes
[5] http://www.unidata.ucar.edu/blogs/developer/en/entry/netcdf_compression
On 4/30/2014 5:09 PM, Jeff Johnson - NOAA Affiliate wrote:
> Sorry, correction - raw data = 300000 bytes, so NetCDF is 8x larger.
>
> jeff
>
>
> On Wed, Apr 30, 2014 at 4:37 PM, Jeff Johnson - NOAA Affiliate
> <jeff.m.johnson@xxxxxxxx> wrote:
>
> Hi all-
>
> I'm working on generating my first NetCDF files and have a question.
> The files I'm creating seem to be far larger than I would have
> thought necessary to hold the given data. I'm wondering if there is
> something I can do to trim this down a bit.
>
> Our data is simple time-series data (one unlimited dimension). Below
> is a simple Java test program that generates a file with 10000
> records, each of which contains a 24-character timestamp string and
> three 2-byte values. This gives a raw data requirement of 30000
> bytes. The generated NetCDF file is 2420656 bytes, or 80x larger. Is
> this what is expected? In my development with real data I'm seeing
> 7MB of data creating an 86MB NetCDF file, etc. It seems to settle
> out at about 12x as the data sets grow, which is still pretty
> onerous. Any insights or suggestions appreciated.
>
> package gov.noaa.swpc.solarwind;
>
> import org.joda.time.DateTime;
> import ucar.ma2.ArrayShort;
> import ucar.ma2.ArrayString;
> import ucar.ma2.DataType;
> import ucar.ma2.InvalidRangeException;
> import ucar.nc2.*;
>
> import java.io.IOException;
> import java.nio.file.FileSystems;
> import java.nio.file.Files;
> import java.nio.file.Path;
> import java.util.ArrayList;
> import java.util.List;
>
> public class TestGenFile {
>     public static void main(String[] args) {
>         DateTime startDate = new DateTime();
>         DateTime endDate = startDate.plusDays(1);
>
>         NetcdfFileWriter dataFile = null;
>
>         try {
>             try {
>                 // define the file
>                 String filePathName = "output.nc";
>
>                 // delete the file if it already exists
>                 Path path = FileSystems.getDefault().getPath(filePathName);
>                 Files.deleteIfExists(path);
>
>                 // enter definition mode for this NetCDF-4 file
>                 dataFile = NetcdfFileWriter.createNew(
>                         NetcdfFileWriter.Version.netcdf4, filePathName);
>
>                 // create the root group
>                 Group rootGroup = dataFile.addGroup(null, null);
>
>                 // define the global attributes
>                 dataFile.addGroupAttribute(rootGroup,
>                         new Attribute("startDate", startDate.toString()));
>                 dataFile.addGroupAttribute(rootGroup,
>                         new Attribute("endDate", endDate.toString()));
>
>                 // define dimensions, in this case only one: time
>                 Dimension timeDim = dataFile.addUnlimitedDimension("time");
>                 List<Dimension> dimList = new ArrayList<>();
>                 dimList.add(timeDim);
>
>                 // define variables
>                 Variable time = dataFile.addVariable(rootGroup, "time",
>                         DataType.STRING, dimList);
>                 dataFile.addVariableAttribute(time,
>                         new Attribute("standard_name", "time"));
>
>                 Variable bx = dataFile.addVariable(rootGroup, "bx",
>                         DataType.SHORT, dimList);
>                 dataFile.addVariableAttribute(bx,
>                         new Attribute("long_name", "IMF Bx"));
>                 dataFile.addVariableAttribute(bx,
>                         new Attribute("units", "raw counts"));
>
>                 Variable by = dataFile.addVariable(rootGroup, "by",
>                         DataType.SHORT, dimList);
>                 dataFile.addVariableAttribute(by,
>                         new Attribute("long_name", "IMF By"));
>                 dataFile.addVariableAttribute(by,
>                         new Attribute("units", "raw counts"));
>
>                 Variable bz = dataFile.addVariable(rootGroup, "bz",
>                         DataType.SHORT, dimList);
>                 dataFile.addVariableAttribute(bz,
>                         new Attribute("long_name", "IMF Bz"));
>                 dataFile.addVariableAttribute(bz,
>                         new Attribute("units", "raw counts"));
>
>                 // create the file
>                 dataFile.create();
>
>                 // create 1-D arrays to hold data values (time is the dimension)
>                 ArrayString timeArray = new ArrayString.D1(1);
>                 ArrayShort.D1 bxArray = new ArrayShort.D1(1);
>                 ArrayShort.D1 byArray = new ArrayShort.D1(1);
>                 ArrayShort.D1 bzArray = new ArrayShort.D1(1);
>
>                 int[] origin = new int[]{0};
>
>                 // write the records to the file
>                 for (int i = 0; i < 10000; i++) {
>                     // load data into array variables
>                     timeArray.setObject(timeArray.getIndex(),
>                             new DateTime().toString());
>                     bxArray.set(0, (short) i);
>                     byArray.set(0, (short) (i * 2));
>                     bzArray.set(0, (short) (i * 3));
>
>                     origin[0] = i;
>
>                     // write a record
>                     dataFile.write(time, origin, timeArray);
>                     dataFile.write(bx, origin, bxArray);
>                     dataFile.write(by, origin, byArray);
>                     dataFile.write(bz, origin, bzArray);
>                 }
>             } finally {
>                 if (null != dataFile) {
>                     // close the file
>                     dataFile.close();
>                 }
>             }
>         } catch (IOException | InvalidRangeException e) {
>             e.printStackTrace();
>         }
>     }
> }
>
> thanks,
> jeff
>
> --
> Jeff Johnson
> DSCOVR Ground System Development
> Space Weather Prediction Center
> jeff.m.johnson@xxxxxxxx
>
>
>
>
> --
> Jeff Johnson
> DSCOVR Ground System Development
> Space Weather Prediction Center
> jeff.m.johnson@xxxxxxxx
> 303-497-6260
>
>
> _______________________________________________
> netcdf-java mailing list
> netcdf-java@xxxxxxxxxxxxxxxx
> For list information or to unsubscribe, visit:
> http://www.unidata.ucar.edu/mailing_lists/
>
--
Ethan Davis UCAR Unidata Program
edavis@xxxxxxxxxxxxxxxx http://www.unidata.ucar.edu