You guys are replying faster than I can keep up! (Which is awfully nice of you!)
I was able to change the chunk size and get a file size that makes much more
sense. With a chunk size of 1024, I get a 166 kB file.
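In case it's useful, this is roughly the change I made (a sketch from memory,
using netCDF4-python's chunksizes keyword to createVariable):

from netCDF4 import Dataset

f = Dataset('scratch/text3.nc','w')
f.createDimension('timestamp_dim',None)

# ask for an explicit chunk of 1024 along the unlimited dimension,
# instead of taking the library default
timestamp = f.createVariable('timestamp','d','timestamp_dim',
                             chunksizes=(1024,))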
What are the units of chunk size by the way?
-Val
> On Apr 5, 2016, at 3:53 PM, Chris Barker <chris.barker@xxxxxxxx> wrote:
>
> oh, and I've enclosed my code -- yours didn't actually run -- missing imports?
>
> On Tue, Apr 5, 2016 at 12:52 PM, Chris Barker <chris.barker@xxxxxxxx> wrote:
>
>
> On Tue, Apr 5, 2016 at 12:13 PM, Ted Mansell <ted.mansell@xxxxxxxx> wrote:
> You might check the _ChunkSizes attribute with 'ncdump -hs'. Newer netCDF
> versions set larger default chunks than they used to. I had this issue with
> 1-d variables that used an unlimited dimension: even if the dimension only
> held a few records, the default chunk made the file much bigger.
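>
> From Python, netCDF4's Variable.chunking() method reports the same
> information (a sketch; it returns the string 'contiguous' or a list of
> per-dimension chunk sizes):
>
> from netCDF4 import Dataset
>
> nc = Dataset('scratch/text3.nc')
> for name, var in nc.variables.items():
>     print(name, var.chunking())
> nc.close()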
>
> I had the same issue -- a 1-d variable had a chunksize of 1, which was
> really, really bad!
>
> But that doesn't seem to be the issue here -- I ran the same code and got
> the same results, and here is the dump:
>
> netcdf text3 {
> types:
>   ubyte(*) variable_data_t ;
> dimensions:
>     timestamp_dim = UNLIMITED ; // (1 currently)
>     data_dim = UNLIMITED ; // (1 currently)
>     item_len = 100 ;
> variables:
>     double timestamp(timestamp_dim) ;
>         timestamp:_Storage = "chunked" ;
>         timestamp:_ChunkSizes = 524288 ;
>     variable_data_t data(data_dim) ;
>         data:_Storage = "chunked" ;
>         data:_ChunkSizes = 4194304 ;
>         data:_NoFill = "true" ;
>
> // global attributes:
>     :_Format = "netCDF-4" ;
> }
>
> If I read that right, those are nice big chunks.
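>
> (Back-of-the-envelope: if those chunks are allocated in full, 524288
> doubles is 4 MiB for timestamp, and 4194304 vlen entries at 16 bytes
> apiece -- assuming HDF5 stores each as a length plus a pointer on a
> 64-bit build -- is 64 MiB for data. That adds up to roughly the 73 MB
> Val reported.)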
>
> Note that if I don't use a VLType variable, I still get a 4 MB file --
> though that could be netCDF-4 overhead:
>
> netcdf text3 {
> types:
>   ubyte(*) variable_data_t ;
> dimensions:
>     timestamp_dim = UNLIMITED ; // (1 currently)
>     data_dim = UNLIMITED ; // (1 currently)
>     item_len = 100 ;
> variables:
>     double timestamp(timestamp_dim) ;
>         timestamp:_Storage = "chunked" ;
>         timestamp:_ChunkSizes = 524288 ;
>     ubyte data(data_dim, item_len) ;
>         data:_Storage = "chunked" ;
>         data:_ChunkSizes = 1, 100 ;
>
> // global attributes:
>     :_Format = "netCDF-4" ;
> }
>
> Something is up with the VLen...
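>
> In the meantime, here's a workaround sketch (untested -- my guess at it):
> store each blob in a fixed-width ubyte variable, as in the dump above,
> plus an explicit length variable so short blobs can be recovered:
>
> from netCDF4 import Dataset
> import numpy
>
> f = Dataset('scratch/text4.nc','w')    # hypothetical file name
> f.createDimension('data_dim', None)
> f.createDimension('item_len', 100)     # fixed maximum blob size
>
> # modest explicit chunking, rather than the library default
> data = f.createVariable('data','u1',('data_dim','item_len'),
>                         chunksizes=(1024,100))
> length = f.createVariable('length','i4',('data_dim',))
>
> blob = numpy.arange(42, dtype='u1')    # a blob shorter than item_len
> data[0,:len(blob)] = blob              # pad bytes stay at the fill value
> length[0] = len(blob)                  # record how much of the row is real
>
> f.close()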
>
> -CHB
>
>
> (Assuming the variable is not compressed.)
>
> -- Ted
>
> __________________________________________________________
> | Edward Mansell <ted.mansell@xxxxxxxx>
> | National Severe Storms Laboratory
> |--------------------------------------------------------------
> | "The contents of this message are mine personally and
> | do not reflect any position of the U.S. Government or NOAA."
> |--------------------------------------------------------------
>
> On Apr 5, 2016, at 1:44 PM, Val Schmidt <vschmidt@xxxxxxxxxxxx> wrote:
>
> > Hello netcdf folks,
> >
> > I’m testing some Python code for writing sets of timestamps and
> > variable-length binary blobs to a netCDF file, and the resulting file
> > size is perplexing to me.
> >
> > The following segment of Python code creates a file with just two
> > variables, “timestamp” and “data”; it populates the first entry of the
> > timestamp variable with a float, and the corresponding first entry of
> > the data variable with an array of 100 unsigned 8-bit integers. The
> > total amount of data is 108 bytes.
> >
> > But the resulting file is over 73 MB in size. Does anyone know why this
> > might be so large and what I might be doing to cause it?
> >
> > Thanks,
> >
> > Val
> >
> >
> > from netCDF4 import Dataset
> > import numpy
> > import time
> >
> > f = Dataset('scratch/text3.nc','w')
> >
> > # both dimensions are unlimited
> > dim = f.createDimension('timestamp_dim',None)
> > data_dim = f.createDimension('data_dim',None)
> >
> > # variable-length type of unsigned 8-bit integers
> > data_t = f.createVLType('u1','variable_data_t')
> >
> > timestamp = f.createVariable('timestamp','d','timestamp_dim')
> > data = f.createVariable('data',data_t,'data_dim')
> >
> > # one record: an 8-byte timestamp plus 100 unsigned bytes
> > timestamp[0] = time.time()
> > data[0] = numpy.ones(100, dtype='u1')
> >
> > f.close()
> >
> > ------------------------------------------------------
> > Val Schmidt
> > CCOM/JHC
> > University of New Hampshire
> > Chase Ocean Engineering Lab
> > 24 Colovos Road
> > Durham, NH 03824
> > e: vschmidt [AT] ccom.unh.edu
> > m: 614.286.3726
> >
> >
>
>
>
>
> --
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R (206) 526-6959 voice
> 7600 Sand Point Way NE (206) 526-6329 fax
> Seattle, WA 98115 (206) 526-6317 main reception
>
> Chris.Barker@xxxxxxxx <huge_nc_file.py>
------------------------------------------------------
Val Schmidt
CCOM/JHC
University of New Hampshire
Chase Ocean Engineering Lab
24 Colovos Road
Durham, NH 03824
e: vschmidt [AT] ccom.unh.edu
m: 614.286.3726