Re: [netcdfgroup] compression without effect

Frank,

Compression (and optionally chunking) parameters have to be set after a
variable is created but *before* any data is written to the variable.

I'm guessing you might be setting the compression parameters after
writing data to the variable.  If that's what's happening, no
compression occurs, but an exception should have been raised to alert
you to the error.  It looks like a bug in the C++ library that no
exception was raised in that case.
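
For example, a minimal sketch of the intended call order with the cxx4
interface might look like the following (file, dimension, and variable
names are just illustrative, not taken from your code):

    #include <vector>
    #include <netcdf>
    using namespace netCDF;

    int main() {
        // Compression requires the netCDF-4 format.
        NcFile nc("example.nc", NcFile::replace, NcFile::nc4);
        NcDim x = nc.addDim("pixel_x_0", 11256);
        NcDim y = nc.addDim("pixel_y_0", 256);
        std::vector<NcDim> dims = {x, y};
        NcVar v = nc.addVar("data", ncDouble, dims);

        // Set shuffle and deflate *before* the first putVar() call.
        v.setCompression(true, true, 2);   // shuffle, deflate, level 2

        std::vector<double> vals(11256 * 256, 1.0);
        v.putVar(vals.data());             // data is written in compressed chunks
        return 0;
    }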

If you just want to experiment with various compression levels,
shuffling, and chunk shapes, the nccopy utility with -d, -s, and -c
options may be useful:

  http://www.unidata.ucar.edu/netcdf/docs/nccopy-man-1.html

The nccopy utility applies the same compression parameters to every
variable in the file.  Varying compression parameters from one variable
to another requires programming.
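
As a rough illustration (variable names here are made up), the cxx4
interface lets you give each variable its own settings, as long as each
setCompression() call comes before the writes to that variable:

    #include <vector>
    #include <netcdf>
    using namespace netCDF;

    int main() {
        NcFile nc("mixed.nc", NcFile::replace, NcFile::nc4);
        NcDim x = nc.addDim("x", 1024);
        NcDim y = nc.addDim("y", 512);

        std::vector<NcDim> dims2d = {x, y};
        std::vector<NcDim> dims1d = {x};
        NcVar big   = nc.addVar("big_field", ncDouble, dims2d);
        NcVar coord = nc.addVar("x_axis", ncDouble, dims1d);

        big.setCompression(true, true, 9);    // shuffle + deflate level 9
        coord.setCompression(false, true, 2); // deflate level 2, no shuffle
        // ... then write each variable with putVar() as usual.
        return 0;
    }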

--Russ

> Dear Ted,
> 
> maybe I shortened things a bit too much:
> 
> netcdf test1_ {
> dimensions:
>         pixel_x_0 = 11256 ;
>         pixel_y_0 = 256 ;
>         complex_0 = 2 ;
> variables:
>         double data_complex_set_0(pixel_x_0, pixel_y_0, complex_0) ;
>                 data_complex_set_0:title = "New Field" ;
>                 data_complex_set_0:_Storage = "chunked" ;
>                 data_complex_set_0:_ChunkSizes = 3752, 86, 1 ;
>                 data_complex_set_0:_DeflateLevel = 9 ;
> 
> ...
> 
> Further, I tried different deflate levels; all gave the same result.
> Shuffle also makes no difference.
> 
> Best,
> Frank
> 
> 
> 
> 
> 
> 2013/12/18 Ted Mansell <ted.mansell@xxxxxxxx>
> 
> > What are the actual dimensions of the variable? The rightmost
> > (fastest-varying) chunk dimension is 1, which might be a problem. If that
> > is an unlimited dimension, then reordering the array would be a good idea,
> > if possible. You might try turning the Shuffle filter on, but I don't know
> > how much effect that would have. Are you setting the chunk dimensions
> > yourself or relying on the default chunking scheme?
> >
> > On a side note, my experience is that deflatelevel=2 gives a good
> > compromise between speed and compression. Higher values tend to yield only
> > modestly better compression for the increased computation. Your mileage may
> > vary!
> >
> > Cheers,
> >
> > -- Ted
> >
> > On Dec 18, 2013, at 11:37 AM, Frank Fuchs wrote:
> >
> > > I tried ncdump -s, getting the following info:
> > >
> > > variables:
> > >   ...
> > >   data_complex_set_0:_Storage = "chunked" ;
> > >   data_complex_set_0:_ChunkSizes = 3752, 86, 1 ;
> > >   data_complex_set_0:_DeflateLevel = 9 ;
> > >
> > > // global attributes:
> > >    ....
> > >    :_Format = "netCDF-4" ;
> > >
> > > Are those chunksizes meaningful?
> > >
> > > On a different thought: does netCDF use zlib directly or via the HDF library?
> > > Something could go wrong there as well, no?
> > >
> > > Thank you! Best,
> > > Frank
> > >
> > >
> > >
> > >
> > > 2013/12/18 Russ Rew <russ@xxxxxxxxxxxxxxxx>
> > > Hi Frank,
> > >
> > > > Now I wanted to test compression using the cxx4 interface, enabling it by
> > > > ncvar_data.setCompression(true,true,1) for the heaviest of my variables.
> > > >
> > > > However, even for a file filled with constants the files remain as big as before.
> > > > Further tests using nccopy -d9 old.nca new.nca did not result in a modification of the file size.
> > >
> > > If you use an unlimited dimension, that may prevent compression,
> > > because it means that each variable is divided into chunks for
> > > compression, with one record per chunk.  There is significant HDF5
> > > space overhead for storing lots of tiny chunks, even if they can be
> > > compressed.
> > >
> > > Two solutions include:
> > >
> > >     1.  If you don't need the unlimited dimension any more, perhaps
> > >         because no more data will be appended to the files, then convert
> > >         the unlimited dimension into a fixed-size dimension, resulting in
> > >         all the values of each variable being stored contiguously, which
> > >         should be more compressible.
> > >
> > >     2.  If you still need the unlimited dimension, then rechunk the data
> > >         before compressing it, so the compression can work on larger
> > >         chunks.
> > >
> > > The nccopy utility can be used for both of these approaches.
> > >
> > > For approach 1:
> > >
> > >     $ nccopy -u orig.nc orig-u.nc        # makes unlimited dimension fixed size
> > >     $ nccopy -d9 orig-u.nc orig-u-d9.nc  # compresses result
> > >
> > > For approach 2, assuming you have a record dimension "t" with each chunk
> > > a slice of only one t value:
> > >
> > >     $ nccopy -c t/10 orig.nc orig-c.nc   # chunks t dimension using 10 instead of 1
> > >     $ nccopy -d9 orig-c.nc orig-c-d9.nc # compresses result
> > >
> > > --Russ
> > >
> > >
> > > > Hi,
> > > >
> > > > I managed to compile netcdf-4.3.0 using mingw-w64 gcc 4.8.1.
> > > > All I had to disable was DAP (which I have no use for anyway).
> > > >
> > > > I tested that I can read and write netCDF files using the newly built .dll.
> > > > Now I wanted to test compression using the cxx4 interface, enabling it by
> > > > ncvar_data.setCompression(true,true,1) for the heaviest of my variables.
> > > >
> > > > However, even for a file filled with constants the files remain as big as before.
> > > > Further tests using nccopy -d9 old.nca new.nca did not result in a modification of the file size.
> > > >
> > > > Any advice?
> > > >
> > > > Best,
> > > > Frank
> > > >