Re: [netcdfgroup] NetCDF NC_CHAR file double the size of ASCII file

On Tue, May 20, 2014 at 9:46 AM, Timothy Stitt <Timothy.Stitt.9@xxxxxx> wrote:

>  I then checked how to use the
> NETCDF-4 format instead and made the change to my write routine.


Now that you've got netCDF-4, you can compress, which should really help.
But still...
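
For a rough feel of why deflate should help on fixed-width text, here is a minimal Python sketch using only the standard library's zlib, which is the algorithm netCDF-4 deflate applies to each chunk. The sample lines, the NUL padding, and the helper name pad_record are assumptions for illustration; they are not taken from the real plate file or from the netCDF API.

```python
import zlib

LINE_SYMBOLS = 87   # line width from the ncdump header
RECORD_LINES = 4    # lines per record from the ncdump header


def pad_record(lines, width=LINE_SYMBOLS):
    """Pad each line with NUL bytes to a fixed width and concatenate.

    Assumption: short lines are NUL-padded; the real file may pad differently.
    """
    return b"".join(line.encode("ascii").ljust(width, b"\0") for line in lines)


# Hypothetical record content, made up for this sketch.
sample_lines = [
    "PLATE 000123 EXAMPLE HEADER LINE",
    "field_a=1.2345 field_b=6.7890 field_c=0.0001",
    "field_d=42 field_e=17 field_f=3",
    "END OF RECORD 000123",
]

record = pad_record(sample_lines)        # one 4 x 87 record
compressed = zlib.compress(record, 1)    # deflate level 1, the lightest setting

print("raw bytes:       ", len(record))
print("compressed bytes:", len(compressed))
```

The padding bytes compress to almost nothing, which is where most of the saving would come from. In the netCDF C API the corresponding per-variable setting is made with nc_def_var_deflate() while the file is still in define mode.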



> I've now
> got my NC file in NETCDF-4 format but I'm still seeing the 2X file storage
> increase compared to my original ASCII file. Can you see any other
> problems with my file structure based on the ncdump command below?
>
> netcdf plate {
> dimensions:
>         Record_Lines = 4 ;
>         Line_Symbols = 87 ;
>         Record_Number = UNLIMITED ; // (11474 currently)
> variables:
>         char Record(Record_Number, Record_Lines, Line_Symbols) ;
>                 Record:_Storage = "chunked" ;
>                 Record:_ChunkSizes = 1, 4, 87 ;
>

OK -- that does look like the old defaults. If I've got this right, your
chunks are 1*4*87 = 348 bytes -- that's pretty small. In some limited
experiments, I found you want chunks on the order of kilobytes at least, and
megabytes are probably better. You might try 1024, 4, 87 and see how it works.
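
The chunk arithmetic can be checked with a small Python sketch. The dimension sizes come from the ncdump header quoted above; the function names chunk_bytes and chunk_count are made up for this example, and the sketch only does the arithmetic, it does not touch the file.

```python
# Dimension sizes taken from the ncdump header quoted above.
RECORD_LINES = 4
LINE_SYMBOLS = 87
RECORDS = 11474        # current length of the unlimited dimension
BYTES_PER_CHAR = 1     # NC_CHAR is one byte per element


def chunk_bytes(records_per_chunk):
    """Uncompressed size in bytes of one chunk holding this many records."""
    return records_per_chunk * RECORD_LINES * LINE_SYMBOLS * BYTES_PER_CHAR


def chunk_count(records_per_chunk, records=RECORDS):
    """Number of chunks along the record dimension (ceiling division)."""
    return -(-records // records_per_chunk)


for n in (1, 1024):
    print(f"{n:>5} record(s) per chunk: "
          f"{chunk_bytes(n):>7} bytes per chunk, {chunk_count(n):>5} chunks")

print("raw character data:", chunk_bytes(RECORDS), "bytes")
```

With 1 record per chunk the file holds 11474 chunks of 348 bytes each; with 1024 records per chunk it holds 12 chunks of 356352 bytes (about 348 KiB) each. In the netCDF C API the chunk shape is set per variable with nc_def_var_chunking().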

-Chris






> // global attributes:
>                 :_Format = "netCDF-4" ;
> }
>
> The file sizes are as follows:
>
> 2.2M May 13 16:03 plate.10000 (original ASCII file with 4*10000 lines -
> 10000 records, 4 lines per record)
> 4.5M May 20 12:38 plate.nc
>
> Thanks in advance for your help,
>
> Tim.
> ______________________________________________
> Tim Stitt PhD
> User Support Manager (CRC)
> Research Assistant Professor (Computer Science & Engineering)
> Room 108, Center for Research Computing, University of Notre Dame, IN 46556
> Email: tstitt@xxxxxx
>
>
>
>
>
> On 5/20/14, 11:43 AM, "Rob Latham" <robl@xxxxxxxxxxx> wrote:
>
> >
> >
> >On 05/19/2014 09:52 AM, Timothy Stitt wrote:
> >> Hi all,
> >>
> >> I've been trying to convert a large (40GB) ASCII text file (composed of
> >> multiple records of 4 line ASCII strings about 90 characters long) into
> >> NetCDF format. My plan was to rewrite the original serial code to use
> >> parallel NetCDF to have many MPI processes concurrently read records and
> >> process them in parallel.
> >>
> >> I was able to write some code to convert the ASCII records into
> >> [unlimited][4][90] NetCDF NC_CHAR arrays, which I was able to read
> >> concurrently via parallel NetCDF routines. My question is related to the
> >> size of the converted NetCDF file.
> >>
> >> I notice that the converted NetCDF file is always double the size of the
> >> ASCII file whereas I was hoping for it to be much reduced. I was
> >> therefore wondering if this is expected or is more due to my bad
> >> representation in NetCDF of the ASCII records? I am using
> >> nc_put_vara_text() to write my records. Maybe I need to introduce
> >> compression that I'm not doing already?
> >
> >Are you using the classic file format or the NetCDF-4 file format?
> >
> >Can you provide an ncdump -h of the new file?
> >
> >==rob
> >
> >>
> >> Thanks in advance for any advice you can provide.
> >>
> >> Regards,
> >>
> >> Tim.
> >>
> >>
> >> _______________________________________________
> >> netcdfgroup mailing list
> >> netcdfgroup@xxxxxxxxxxxxxxxx
> >> For list information or to unsubscribe,  visit:
> >>http://www.unidata.ucar.edu/mailing_lists/
> >>
> >
> >--
> >Rob Latham
> >Mathematics and Computer Science Division
> >Argonne National Lab, IL USA
> >
>
>



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@xxxxxxxx