
Re: NetCDF Java Read API



Hi Greg:

1) What version of netcdf-java are you using?

2) How much memory do you give the JVM (the -Xmx option)?  The default
can be as low as 32 MB.

3) Are you reading the entire data into memory?  If you only need part
of it, you can read one section at a time; see the sketch below.
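
Here's a minimal sketch of a section read against the netcdf-java API
(file and variable names are taken from your CDL below; untested, so
treat it as illustrative).  It pulls one 2-D forecast grid instead of
the whole 4-D variable:

  import ucar.ma2.Array;
  import ucar.nc2.NetcdfFile;
  import ucar.nc2.Variable;

  public class ReadOneSlice {
    public static void main(String[] args) throws Exception {
      NetcdfFile ncfile = NetcdfFile.open(
          "edu.mit.ll.wx.ciws.VILForecast.20080912T182500Z.nc");
      try {
        Variable vil = ncfile.findVariable("VIL");
        int[] origin = {0, 0, 0, 0};        // first forecast, first level
        int[] shape  = {1, 1, 3520, 5120};  // one (y0, x0) grid of shorts
        Array slice = vil.read(origin, shape);  // reads only this section
        System.out.println("read " + slice.getSize() + " values");
      } finally {
        ncfile.close();
      }
    }
  }

One slice is about 36 MB of shorts, so a modest heap (say, java -Xmx512m)
should be plenty if you read a section at a time.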

Greg Rappa wrote:
> This morning, Ai-Hoa sent a query message to you all but, since then,
> we've discovered more interesting behavior of the Java readers that
> I'd like to share.  A question regarding Java file reading and memory
> usage has come up.  I've tried to present the situation as clearly as
> possible here.
> 
> 
> The NetCDF files we write contain one variable, named 'VIL',
> of four dimensions: 24 times corresponding to that many forecasts
> of the VIL product, 1 altitude layer, and 2-D grids sized to the
> extent of the CONUS at 1 km resolution (3520 rows by 5120 columns).
> 
> The CDL appears as follows (edited for size):
> 
>   netcdf edu.mit.ll.wx.ciws.VILForecast.20080912T182500Z {
>     dimensions:
>       time = 24 ;
>       z0 = 1 ;
>       y0 = 3520 ;
>       x0 = 5120 ;
>     variables:
>       double time(time) ;
>               time:standard_name = "time" ;
>               time:long_name = "Product validity time" ;
>               time:units = "seconds since 1970-01-01T00:00:00Z" ;
>               time:calendar = "gregorian" ;
>               time:string = "2008-09-12T18:30:00Z/2008-09-12T20:25:00Z" ;
>       double z0(z0) ;
>               z0:standard_name = "altitude" ;
>               z0:long_name = "Product altitude" ;
>               z0:units = "meters" ;
>               z0:axis = "Z" ;
>               z0:positive = "up" ;
>       double y0(y0) ;
>               y0:standard_name = "projection_y_coordinate" ;
>               y0:long_name = "Distance from projection reference point latitude" ;
>               y0:units = "meters" ;
>       double x0(x0) ;
>               x0:standard_name = "projection_x_coordinate" ;
>               x0:long_name = "Distance from projection reference point longitude" ;
>               x0:units = "meters" ;
>       short VIL(time, z0, y0, x0) ;
>               VIL:standard_name = "atmosphere_cloud_liquid_water_content" ;
>               VIL:long_name = "Vertically integrated liquid water (VIL)" ;
>               VIL:class_name = "FCST" ;
>               VIL:product_name = "FCST" ;
>               VIL:units = "kg m-2" ;
>               VIL:grid_mapping = "grid_mapping0" ;
>               VIL:scale_factor = 0.00244148075807978 ;
>               VIL:add_offset = 0. ;
>               VIL:_FillValue = -1s ;
>               VIL:valid_range = 0s, 32767s ;
>   }
> 
> The variable is written to disk using the NetCDF4 C++ library,
> with compression enabled (Level 6), by a single call to NcVar::put(),
> as depicted in the following abbreviated code snippet:
> 
>   unsigned int tBins = tDim->size();  // 24 forecasts
>   unsigned int zBins = zDim->size();  // 1 altitude layer
>   unsigned int yBins = yDim->size();  // 3520 rows
>   unsigned int xBins = xDim->size();  // 5120 columns
>   unsigned int allBins = tBins*zBins*yBins*xBins;  // 432,537,600 bins
> 
>   short* shortBuffer = new short[ allBins ];
> 
>   // ncShort, not ncByte, to match the CDL's "short VIL"
>   NcVar* ncVar = ncFile->add_var( varName.c_str(), ncShort,
>                                   tDim, zDim, yDim, xDim );
> 
>   ncVar->put( shortBuffer, tBins, zBins, yBins, xBins );
> 
> 
> The chunking size for this file is set equal to the X/Y grid size:
> 5120 * 3520 = 18,022,400 values, i.e. one 2-D grid per chunk.
> 
> Files written this way can be read by the NetCDF C++ library.
> However, a number of users from different agencies have been
> reporting that their Java VMs run out of memory while reading
> the file.
> 
> Ai-Hoa, Bill and I have demonstrated that the file can be read,
> but only on a 64-bit Linux platform using a Java VM configured
> with a 5 GB maximum heap.  Running with a 4 GB maximum heap
> causes the Java VM to crash.  The question we have is:
> 
>     Given that the compressed variable on disk consumes about
>     45 MB, and expands to about 865 MB when uncompressed into
>     a 4-D array of shorts (432,537,600 shorts * 2 bytes) ...
>     what else is the Java NetCDF and/or HDF5 layer doing to
>     consume the remaining ~4 GB of memory in the Java VM?
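
For what it's worth, that uncompressed figure can be checked straight
from the variable's metadata; a minimal netcdf-java sketch (same caveats
as the one above):

  import ucar.nc2.NetcdfFile;
  import ucar.nc2.Variable;

  public class VilFootprint {
    public static void main(String[] args) throws Exception {
      NetcdfFile ncfile = NetcdfFile.open(
          "edu.mit.ll.wx.ciws.VILForecast.20080912T182500Z.nc");
      try {
        Variable vil = ncfile.findVariable("VIL");
        long elems = vil.getSize();                // 24*1*3520*5120 = 432,537,600
        long bytes = elems * vil.getElementSize(); // * 2 bytes/short = 865,075,200
        System.out.printf("VIL: %,d elements, %,d bytes uncompressed%n",
                          elems, bytes);
      } finally {
        ncfile.close();
      }
    }
  }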
> 
> I've exported a sample file to our public ftp site.  You are all
> welcome to download the file and see what you can make of the
> Java VM memory constraints.  The file is available at:
> 
> ftp://ftp.ll.mit.edu/outgoing/gregr/edu.mit.ll.wx.ciws.VILForecast.20080912T182500Z.nc
> 
> Of course, if there are any suggestions for alternate methods for
> writing the variable to disk, I'd appreciate that too.  For instance,
> should I set my chunking size to the maximum data size, that is,
> 24 * 1 * 5120 * 3520 * 2 (bytes) = 865,075,200 ?  That seemed
> a little extreme to me, so I stuck with 5120*3520.
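
For comparison, the two chunk layouts you mention work out as follows
(plain arithmetic, written out as a runnable sketch):

  public class ChunkMath {
    public static void main(String[] args) {
      // one (y0, x0) slice of shorts: 36,044,800 bytes (~36 MB)
      long perField = 5120L * 3520L * 2L;
      // the entire variable as a single chunk: 865,075,200 bytes (~865 MB)
      long whole = 24L * 1L * perField;
      System.out.printf("per-field chunk: %,d bytes; whole-variable: %,d bytes%n",
                        perField, whole);
    }
  }

A ~36 MB chunk holding one 2-D field matches readers that pull one
forecast at a time, so sticking with 5120*3520 seems reasonable; a single
865 MB chunk would force a reader to inflate the whole variable just to
get at any part of it.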
> 
> Thanks,
> Greg.
> 
> 
> Sanh, Ai-Hoa wrote:
>>
>> Hello,
>>
>> I hope you don’t mind our asking a question about the NetCDF Java Read
>> methods.
>>
>> Greg has written some NetCDF files containing 24 hours' worth of
>> forecasts.  I am getting “Out of Memory” errors when I try to read
>> these files, even when I try to read only a small portion of the
>> data.
>>
>> I tested my code with smaller files, and the reads worked fine.
>>
>> So I was wondering if you have any suggestions for what may be wrong.
>> Perhaps I am not calling the methods correctly, or perhaps there is a
>> limit on the size of files or data.
>>
>> I’m more than happy to send you the code I am using. And we’ll find a
>> way to get the test files to you. Just let me know to whom I should
>> send them, so that I’m not inundating all of your mailboxes.
>>
>> Thanks much.
>>
>> Ai-Hoa
>>
>