
[Support #CUV-251255]: Nccopy extremely slow / hangs



Hi Mark,

> After a holiday and a break from this work, I was finally able to have a look 
> at it again. Unfortunately, the fix doesn't seem to work for me :-( It is 
> still the same problem as previously - the copy starts out fine, but gets 
> progressively slower and slower, and ultimately "hangs". Here is the command 
> that I am using:
> 
> ./nccopy -u -k3 -d1 -m 2G -h 18G -e 10001 -c time/1698,longitude/6,latitude/7 
> combined_snapshots.nc temporal_read_optimised.nc
> 
> I am wondering whether I am setting the -h and -e options correctly? How 
> should these be set? I'm not sure I understand the difference between them.

It looks to me like you are setting -h correctly.  Since you are using smaller
sizes for the longitude and latitude dimensions than I used in my tests (6 and 7
versus 24 and 25), you will have about 14.3 times as many chunks as I used (I
was aiming at each chunk being about 4 MB), so you could set the number of
elements in the chunk cache higher (61446 instead of 4301).  I had used 10001
elements in the chunk cache to be generously larger than 4301, but I think the
exact value is not too critical as long as the number of elements in the chunk
cache is at least the number of chunks that must be held in the cache at once.
Since you are compressing the data and reordering it in a way that requires
*all* the chunks in memory at once, you need to use at least "-e 61446", and to
be generous should probably use something like "-e 61500".  The HDF5
documentation recommends that the number of elements in the chunk cache be
prime, but I don't see the necessity for that and haven't noticed any
difference whether it's prime or composite.  With your current setting of
"-e 10001", chunks that are only partly written will have to be ejected from
the cache to make room for new chunks, and this will lead to lots of
unnecessary recompressing of chunks that are ejected before being written to
disk, as well as uncompressing of partially written chunks when they are read
back into the chunk cache.
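The chunk counts above follow from the dimension lengths in the ncdump header
further down.  The short Python sketch below recomputes them; it is not part of
nccopy, and the helper names chunk_count and chunk_bytes are made up for
illustration.  It assumes every chunk spans all 1698 time steps and counts
partial chunks at the grid edges as whole chunks.  With latitude/7 and
longitude/6 the grid divides evenly, giving exactly 61446 chunks of about
285 KB each uncompressed.  For the 25x24 test shape, counting the partial edge
chunks gives 4355, slightly more than the 4301 obtained by simply dividing the
grid area by the chunk area.

```python
import math

# Dimension lengths from the ncdump header of combined_snapshots.nc
N_TIME, N_LAT, N_LON = 1698, 1617, 1596
BYTES_PER_FLOAT = 4  # chl_oc5 is a 32-bit float


def chunk_count(lat_chunk: int, lon_chunk: int) -> int:
    """Chunks needed when each chunk spans all time steps (time/1698).

    Hypothetical helper: partial chunks at the grid edges count as whole chunks.
    """
    return math.ceil(N_LAT / lat_chunk) * math.ceil(N_LON / lon_chunk)


def chunk_bytes(lat_chunk: int, lon_chunk: int) -> int:
    """Hypothetical helper: uncompressed size in bytes of one full chunk."""
    return N_TIME * lat_chunk * lon_chunk * BYTES_PER_FLOAT


# -c time/1698,longitude/6,latitude/7 from the command line
print(chunk_count(7, 6), chunk_bytes(7, 6))      # 61446 285264
# latitude/25, longitude/24 as in the tests
print(chunk_count(25, 24), chunk_bytes(25, 24))  # 4355 4075200
# Ratio of chunk areas, the "about 14.3 times" figure
print(round((25 * 24) / (7 * 6), 1))             # 14.3
```

The chunk count is the smallest value that lets -e hold every chunk at once;
a few dozen extra elements, as in "-e 61500", give a small margin.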

You also need to make sure that your computer has enough memory to hold the
chunk cache in memory.  You've specified a 2GB input buffer and 18GB of chunk
cache memory, so you should have at least 20GB of memory for nccopy to run,
keeping the data in the chunk cache uncompressed while reordering it.  You might
get by with a smaller input buffer, say 11MB (one time step of 1617*1596*4
bytes, about 10.3MB), and a somewhat smaller chunk cache, "-h 17.53G" (all 1698
time steps uncompressed, about 17.53GB), if you're close to the maximum.
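The memory figures can be checked the same way.  This sketch assumes that
nccopy's size suffixes are decimal multipliers (so "2G" means 2 * 10**9 bytes),
which is my reading of the nccopy documentation, and it treats the -m copy
buffer and the -h chunk cache as the only large allocations, which is a
simplification.

```python
# Dimension lengths and element size from the ncdump header (chl_oc5 is float)
N_TIME, N_LAT, N_LON = 1698, 1617, 1596
BYTES_PER_FLOAT = 4

# Assumption: nccopy's K/M/G/T suffixes are decimal, so "2G" means 2 * 10**9 bytes
GB = 10**9
MB = 10**6

one_time_step = N_LAT * N_LON * BYTES_PER_FLOAT   # one (latitude, longitude) slice
whole_variable = N_TIME * one_time_step            # all of chl_oc5, uncompressed

# Simplification: treat the -m copy buffer and the -h chunk cache as the
# only large allocations nccopy makes.
requested = 2 * GB + 18 * GB   # -m 2G plus -h 18G

print(f"one time step: {one_time_step / MB:.1f} MB")    # one time step: 10.3 MB
print(f"whole variable: {whole_variable / GB:.2f} GB")  # whole variable: 17.53 GB
print(f"-m 2G + -h 18G: {requested / GB:.0f} GB")       # -m 2G + -h 18G: 20 GB
```

So "-h 17.53G" is just enough to hold the whole variable uncompressed, and an
11MB input buffer covers one full time step.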

> The combined_snapshots.nc file is 630MB - a dump of the header is given below:

My tests have been with simulated data of the same size as you're using, but my
simulated data may compress better than yours.  If you could make your actual
combined_snapshots.nc file available somewhere for me to test nccopy on the
actual data, I could make sure I can reproduce something like the 15-minute
times I'm seeing for the copy and rechunking.  It may be that your use of
1698x7x6 chunks requires more time than the larger 1698x25x24 chunks I was
writing, so I could try that as well.

> Any ideas?

I really can't explain what looks like the O(n**2) behavior you seem to be
seeing in writing the output, unless it's something in the HDF5 layer involving
a performance bug in the B-trees that index the chunks.  You can't really judge
the progress in writing the output file by the size of the output, as none of
the chunks are complete until the end of the copy.  So the output file should
stay fairly small until all of the chunks are flushed to disk (while being
compressed) at the end of the rechunking.
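To illustrate why an undersized cache hurts so much with this access pattern,
here is a toy model.  It assumes the input is copied in slabs of one or more
whole time steps, so every slab writes a piece of every output chunk and no
chunk is complete until the last slab.  It also models the chunk cache as a
plain least-recently-used cache holding n_slots chunks.  That is a deliberate
simplification: HDF5's real chunk cache hashes chunks into slots and also has a
byte limit, so the numbers are only illustrative, and count_misses is a
hypothetical helper.  Under LRU, cycling through more chunks than there are
slots makes every access a miss.

```python
from collections import OrderedDict


def count_misses(n_chunks: int, n_slots: int, n_slabs: int) -> int:
    """Count cache misses when every input slab writes part of every chunk.

    Simplified model (an assumption, not HDF5's actual policy): an LRU cache
    holding at most n_slots chunks.  The first visit to each chunk is an
    unavoidable miss; every later miss stands for a partially written chunk
    that was evicted (compressed and written out) and must be read back and
    decompressed.
    """
    cache = OrderedDict()  # keys are chunk indices, in least- to most-recent order
    misses = 0
    for _ in range(n_slabs):
        for chunk in range(n_chunks):  # one slab touches every output chunk
            if chunk in cache:
                cache.move_to_end(chunk)  # mark as most recently used
            else:
                misses += 1
                if len(cache) >= n_slots:
                    cache.popitem(last=False)  # evict the least recently used chunk
                cache[chunk] = None
    return misses


print(count_misses(100, 100, 50))  # 100: each chunk is loaded exactly once
print(count_misses(100, 99, 50))   # 5000: one slot short, every access misses
```

In this model, 61446 chunks with "-e 10001" behave like the second case:
every chunk is evicted and reloaded on every slab.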

Also the -h and -e options to nccopy have only been minimally tested, and there 
could still be bugs ...

--Russ

> [mpayne@oleander compiler]$ ncdump combined_snapshots.nc -h -c
> netcdf combined_snapshots {
> dimensions:
> latitude = 1617 ;
> longitude = 1596 ;
> time = UNLIMITED ; // (1698 currently)
> variables:
> float chl_oc5(time, latitude, longitude) ;
> chl_oc5:_FillValue = 0.f ;
> chl_oc5:long_name = "Chlorophyll-a concentration in sea water using the OC5 algorithm" ;
> chl_oc5:standard_name = "mass_concentration_of_chlorophyll_a_in_sea_water" ;
> chl_oc5:grid_mapping = "mercator" ;
> chl_oc5:units = "milligram m-3" ;
> chl_oc5:missing_value = 0.f ;
> chl_oc5:units_nonstandard = "mg m^-3" ;
> float latitude(latitude) ;
> latitude:_FillValue = -999.f ;
> latitude:standard_name = "latitude" ;
> latitude:long_name = "latitude" ;
> latitude:valid_min = -90. ;
> latitude:units = "degrees_north" ;
> latitude:valid_max = 90. ;
> latitude:axis = "Y" ;
> float longitude(longitude) ;
> longitude:_FillValue = -999.f ;
> longitude:standard_name = "longitude" ;
> longitude:long_name = "longitude" ;
> longitude:valid_min = -180. ;
> longitude:units = "degrees_east" ;
> longitude:valid_max = 180. ;
> longitude:axis = "X" ;
> int mercator ;
> mercator:false_easting = 0L ;
> mercator:standard_parallel = 0L ;
> mercator:grid_mapping_name = "mercator" ;
> mercator:false_northing = 0L ;
> mercator:longitude_of_projection_origin = 0L ;
> double time(time) ;
> time:_FillValue = -1. ;
> time:time_origin = "1970-01-01 00:00:00" ;
> time:valid_min = 0. ;
> time:long_name = "time" ;
> time:standard_name = "time" ;
> time:units = "seconds since 1970-01-01 00:00:00" ;
> time:calendar = "gregorian" ;
> time:axis = "T" ;
> 
> // global attributes:
> :site_name = "UK Shelf Seas" ;
> :citation = "If you use this data towards any publication, please acknowledge this using: \'The authors thank the NERC Earth Observation Data Acquisition and Analysis Service (NEODAAS) for supplying data for this study\' and then email NEODAAS (address@hidden) with the details. The service relies on users\' publications as one measure of success." ;
> :creation_date = "Thu Jun 02 10:51:37 2011" ;
> :easternmost_longitude = 13. ;
> :creator_url = "http://rsg.pml.ac.uk" ;
> :references = "See NEODAAS webpages at http://www.neodaas.ac.uk/ or RSG pages at http://rsg.pml.ac.uk/" ;
> :Metadata_Conventions = "Unidata Dataset Discovery v1.0" ;
> :keywords = "satellite,observation,ocean" ;
> :summary = "This data is Level-3 satellite observation data (Level 3 meaning raw observations processedto geophysical quantities, and placed onto a regular grid)." ;
> :id = "M2010001.1235.uk.postproc_products.MYO.01jan101235.v1.20111530951.data.nc" ;
> :naming_authority = "uk.ac.pml" ;
> :geospatial_lat_max = 62.999108 ;
> :title = "Level-3 satellite data from Moderate Resolution Imaging Spectroradiometer sensor" ;
> :source = "Moderate Resolution Imaging Spectroradiometer" ;
> :northernmost_latitude = 62.999108 ;
> :creator_name = "Plymouth Marine Laboratory Remote Sensing Group" ;
> :processing_level = "Level-3 (NASA EOS Conventions)" ;
> :creator_email = "address@hidden" ;
> :netcdf_library_version = "4.0.1 of Sep  3 2010 11:27:29 $" ;
> :date_issued = "Thu Jun 02 10:51:37 2011" ;
> :geospatial_lat_min = 47. ;
> :date_created = "Thu Jun 02 10:51:37 2011" ;
> :institution = "Plymouth Marine Laboratory Remote Sensing Group" ;
> :geospatial_lon_max = 13. ;
> :geospatial_lon_min = -15. ;
> :contact1 = "email: address@hidden" ;
> :license = "If you use this data towards any publication, please acknowledge this using: \'The authors thank the NERC Earth Observation Data Acquisition and Analysis Service (NEODAAS) for supplying data for this study\' and then email NEODAAS (address@hidden) with the details. The service relies on users\' publications as one measure of success." ;
> :Conventions = "CF-1.4" ;
> :project = "NEODAAS (NERC Earth Observation Data Acquisition and Analysis Service)" ;
> :cdm_data_type = "Grid" ;
> :RSG_sensor = "MODIS" ;
> :westernmost_longitude = -15. ;
> :RSG_areacode = "uk" ;
> :southernmost_latitude = 47. ;
> :netcdf_file_type = "NETCDF4_CLASSIC" ;
> :history = "Created during RSG Standard Mapping (Mapper) [SGE Job Number: 2577153]" ;
> :NCO = "4.0.7" ;
> }
> [mpayne@oleander compiler]$
> 
> 

Russ Rew                                         UCAR Unidata Program
address@hidden                      http://www.unidata.ucar.edu



Ticket Details
===================
Ticket ID: CUV-251255
Department: Support netCDF
Priority: Normal
Status: Closed