
12 August 2014

Last time I described the experiment of writing NCEP model GRIB files to netCDF-4. Here are the raw results of that experiment, using deflate level 3 and no shuffle for netCDF-4 compression:

There are a total of 51 NCEP model runs in this plot; each is one complete forecast run. Let's split the files out by GRIB-1 and GRIB-2:

As you can see, GRIB-2 has significantly better compression than GRIB-1, probably due to the JPEG-2000 wavelet compression. In case you are wondering about the file where netCDF-4 is smaller than GRIB-2 (the point between the 0.5 and 0.75 ratios), that is RTMA_GUAM_2p5km_20140803_0600.grib2. It has only 17 records in it, and the netCDF-4 file is 0.561 times the size of the GRIB-2 file (440K vs 781K). This file does not use JPEG-2000 compression, and it is a small grid (193 by 193) with most of its points over water, so it has more uniform data values.
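To illustrate why more uniform values help a deflate-based compressor, here is a minimal sketch using Python's standard `zlib` module at level 3. The two fields are synthetic and hypothetical (they are not the RTMA data), the 16-bit packing is an assumption, and plain `zlib` only stands in for the deflate filter that netCDF-4/HDF5 applies per chunk.

```python
import random
import struct
import zlib


def deflate_ratio(values, level=3):
    """Return raw_size / compressed_size for values packed as 16-bit integers."""
    raw = struct.pack(f"<{len(values)}h", *values)
    return len(raw) / len(zlib.compress(raw, level))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 193 * 193  # same point count as the grid mentioned above

# Hypothetical "mostly water" field: one constant value at about 90% of points.
uniform = [280 if rng.random() < 0.9 else rng.randint(250, 310) for _ in range(n)]

# Hypothetical "varied" field: every point drawn independently.
varied = [rng.randint(250, 310) for _ in range(n)]

uniform_ratio = deflate_ratio(uniform)
varied_ratio = deflate_ratio(varied)
print(f"uniform: {uniform_ratio:.1f}x, varied: {varied_ratio:.1f}x")
```

The mostly constant field should compress noticeably better than the varied one, which is the same effect suggested for the Guam grid.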

There are 15 GRIB-1 files, and 36 GRIB-2 files, and the number of records in each file varies widely. If we use the number of records to find the weighted average, we get these results:

Total over all files:

  Weighted average ratio = 2.18    Total number of GRIB records = 400,403

Total over GRIB-1 files:

  Weighted average ratio = 1.32    Total number of GRIB records = 24,933

Total over GRIB-2 files:

  Weighted average ratio = 2.24    Total number of GRIB records = 375,470

So, using out-of-the-box netCDF-4/HDF5 deflate compression, one could expect netCDF-4 files that are on average 1.32 times the size of GRIB-1 files and 2.24 times the size of GRIB-2 files, on these kinds of precision-limited data.
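As a quick consistency check on the record-weighted averages, the sketch below combines the GRIB-1 and GRIB-2 figures quoted above into an overall value. The per-file ratios are not listed in this post, so only the per-edition totals are used; the function and variable names are illustrative.

```python
def weighted_average(ratios, weights):
    """Average of ratios, each weighted by its record count."""
    total = sum(weights)
    return sum(r * w for r, w in zip(ratios, weights)) / total


# Per-edition figures quoted above: weighted average ratio and record count.
grib1_ratio, grib1_records = 1.32, 24_933
grib2_ratio, grib2_records = 2.24, 375_470

total_records = grib1_records + grib2_records
overall = weighted_average(
    [grib1_ratio, grib2_ratio],
    [grib1_records, grib2_records],
)
print(total_records, round(overall, 2))  # prints: 400403 2.18
```

Because GRIB-2 accounts for the large majority of the records, the overall ratio sits much closer to the GRIB-2 figure than to the GRIB-1 one.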

Next time: results broken out by the number of bits stored for the variable.
