The netCDF Operators (NCO) version 4.4.0 are ready.
http://nco.sf.net (Homepage)
http://dust.ess.uci.edu/nco (Homepage "mirror")
This release focuses on stability and speed.
It also addresses previous omissions so that full path names are
accepted by all appropriate options, such as the ncwa weights and
masks (-w and -m), and ncks auxiliary coordinates (-X).
Bugfixes (see below) include ncra and ncrcat use-cases involving
strides with superfluous files and/or multi-record output (MRO).
Other significant new features include major improvements to
conversion of HDF4, HDF5, and netCDF4 files to netCDF3 and netCDF4
files. Most of this work is an offshoot of writing an NCO and
CFchecker-based solution to the problem of checking CF-compliance
of datasets in HDF4, HDF5, and netCDF4 formats. The solution
is a script called ncdismember that now works well with most
NASA-stewarded datasets I've thrown at it.
We postponed the name-change of ncra and ncwa to ncrs and ncws,
respectively, until the next version for stability.
Work on NCO 4.4.1 is underway, focused on stability and speed.
There will be more netCDF4 mop-up (-X and --cnk) and, possibly,
improved HDF4 support, and cache manipulation for chunking.
Enjoy,
Charlie
"New stuff" in 4.4.0 summary (full details always in ChangeLog):
NEW FEATURES:
A. ncrename allows full group pathnames for new_name arguments.
Previously, ncrename allowed full group pathnames only for old_name
and the syntax was
ncrename -v /path/to/old_name,new_name in.nc out.nc
and new_name was presumed to be on the same path as old_name.
The new feature means this now works:
ncrename -v /path/to/old_name,/path/to/new_name in.nc out.nc
This embodies no new functionality because the paths must be
identical! In other words, /path/to must lead to the same
location. If it did not, you would be both _moving_ and _renaming_
the variable, not just renaming it.
_Moving_ groups and variables is more arduous than renaming.
For more on renaming and moving, see
http://nco.sf.net/nco.html#rename
http://nco.sf.net/nco.html#move
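As a rough illustration of the same-path rule (this is not part of NCO),
a wrapper script could compare the group portion of both names before
calling ncrename. The helper name same_group and the example paths are
hypothetical:

```shell
# Hypothetical helper, not an NCO command: succeeds only when both
# full names sit in the same group, i.e., the request is a pure rename.
same_group() {
  [ "$(dirname "$1")" = "$(dirname "$2")" ]
}

old_nm=/g1/g2/old_name   # assumed example paths
new_nm=/g1/g2/new_name
if same_group "$old_nm" "$new_nm"; then
  echo "rename: ncrename -v $old_nm,$new_nm in.nc out.nc"
else
  echo "move: group paths differ, see the documentation on moving"
fi
```

The check only compares path strings; it does not inspect the file.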
A. Speed-up due to removing new static arrays from codebase.
Though not immediately visible to the user, memory access patterns
play a large role in determining NCO speed. Many static arrays were
introduced in developing NCO group-enabled features over the past
year. Most of these have been converted to dynamic arrays now.
This significantly speeds up NCO for many use cases,
including many netCDF3 cases (because NCO uses one code base for
all filetypes). If you've noticed and lamented NCO becoming more
sluggish over the past year, you may be pleasantly surprised
by NCO's new zippiness.
B. ncrename behavior. The underlying netCDF4/HDF5 library on which NCO
depends has important features and bugfixes in netCDF-4.3.1-rc5,
now available. Users who build NCO on that version or later gain
access to group renaming, and to fixes in renaming coordinates.
These features and fixes are described here:
http://nco.sf.net/nco.html#ncrename_crd
http://nco.sf.net/nco.html#ncrename
C. ncks now accepts the "all" argument to the --fix_rec_dmn option.
ncks --fix_rec_dmn=all
converts all output record dimensions to fixed dimensions.
Previously, --fix_rec_dmn only accepted the name of the single
record dimension to be fixed.
Now it is simple to fix all record dimensions simultaneously.
This is useful (and nearly mandatory) when flattening netCDF4
files that have multiple record dimensions per group into netCDF3
files (which may have at most one record dimension).
ncks --fix_rec_dmn=all in.nc out.nc
ncks -G : -3 --fix_rec_dmn=all in.nc out.nc
http://nco.sf.net/nco.html#fix_rec_dmn
http://nco.sf.net/nco.html#autocnv
D. HDF4 behavior: Thanks to recent improvements in netCDF,
NCO more gracefully handles HDF4 files. When compiled with netCDF
version 4.3.1-rc7 (20131222) or later, NCO no longer needs the
--hdf4 switch. NCO uses netCDF to determine automatically whether
the underlying file is HDF4, then takes appropriate precautions to
avoid calls not supported by the netCDF4 subset of HDF4.
ncks fl.hdf
ncks fl.hdf fl.nc
http://nco.sf.net/nco.html#hdf4
E. NCO autoconverts HDF4 and HDF5 atomic-types (e.g., NC_UBYTE,
NC_STRING) to netCDF3 atomic types (e.g., NC_SHORT, NC_CHAR) when
necessary, i.e., when the output file is netCDF3.
ncks -3 fl.hdf fl.nc
http://nco.sf.net/nco.html#autocnv
F. ncdismember flattens all groups in a file, not only leaf groups.
Previously ncdismember disaggregated only leaf groups.
Hierarchical files may contain data and/or metadata at all levels.
The new behavior disaggregates all groups with data/metadata.
ncdismember is especially useful for checking CF-compliance using
the separately installed 'cfchecker' utility.
Usage:
ncdismember ~/nco/data/dsm.nc ${DATA}/nco/tmp cf 1.5
ncdismember automatically appends the CF Conventions attribute to
all disaggregated files that do not already contain it.
This considerably reduces CF Warning and Error counts.
http://nco.sf.net/nco.html#ncdismember
G. ncdismember just plain works in most real world cases.
Taken together, NCO's new features (autoconversion to netCDF3
atomic types, fixing multiple record dimensions, autosensing
HDF4 input) and bugfixes (allowing whitespace in group and
filenames, scoping rules for CF conventions) make ncdismember
more reliable and friendly for both dismembering files and for
CF-compliance checks. Now most HDF4 and HDF5 datasets can be
checked for CF-compliance with a one-line command.
Example compliance checks of common NASA datasets are at
http://dust.ess.uci.edu/diwg/*.txt
http://nco.sf.net/nco.html#ncdismember
http://nco.sf.net/nco.html#autocnv
H. ncks now prints hidden (aka special) attributes when given the
--hdn or --hidden option. This is equivalent to ncdump -s.
Hidden attributes include: _Format, _DeflateLevel, _Shuffle,
_Storage, _ChunkSizes, _Endianness, _Fletcher32, and _NOFILL.
Previously ncks ignored all these attributes in CDL/XML modes.
Now it prints these attributes as appropriate.
http://nco.sf.net/nco.html#hdn
I. ncwa weight and mask (-w and -m) arguments may now be full path
names to variables nested within a group hierarchy.
ncwa -a lev -w /g8/lev_wgt in.nc out.nc
ncwa -a lev -m /g8/lev_msk in.nc out.nc
http://nco.sf.net/nco.html#ncwa
J. The --cnk_byt option was introduced to allow users to manually
specify the total desired chunksize (in Bytes). In the absence
of this parameter, NCO sets the chunksize to the filesystem
blocksize of the output file (if obtainable via stat()), or else
to 4096 B, the Linux default blocksize.
ncks -4 --cnk_byt=8192 in.nc out.nc
Note that --cnk_dmn arguments are still in elements, not bytes.
Should we use bytes instead of elements for all chunk arguments?
Send us your preference to help inform the decisions for 4.4.1.
http://nco.sf.net/nco.html#cnk
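Because --cnk_byt is in bytes while --cnk_dmn is in elements, converting
between the two is simple arithmetic. A small sketch, assuming a 4-byte
NC_FLOAT variable and the 8192 B value from the example above (the shell
variable names are illustrative only):

```shell
cnk_byt=8192   # total chunk size in bytes, as passed to --cnk_byt
typ_sz=4       # bytes per element: 4 for NC_FLOAT, 8 for NC_DOUBLE
cnk_elm=$((cnk_byt / typ_sz))
echo "$cnk_elm elements per chunk"   # prints "2048 elements per chunk"
```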
K. New Chunking policies (xst) and maps (xst, lfp)
These stand for "Existing" and "LeFter Product", respectively.
The new options allow NCO to retain existing chunking sizes, and/or
to use the lfp map (suggested by Chris Barker) in many situations.
ncks -4 --cnk_plc=xst --cnk_map=lfp in.nc out.nc
http://nco.sf.net/nco.html#cnk
BUG FIXES:
A. De-compressing netCDF4 files/variables by specifying deflation
level=0 works. This fixes a bug where previously NCO could set the
deflation level of any variable to any level except zero.
B. Dimensions in hyperslabbing arguments are once again checked for
validity prior to processing and invalid (i.e., non-existent)
dimensions once-again cause operators to abort.
C. Fix one-line diagnostic bug that caused many OpenMP-enabled
operators to die when dbg_lvl > 2.
D. Fix ncra/ncrcat bug where an extra record was used when superfluous
input files were provided and the stride placed the first index of the
superfluous files beyond the user-specified last index. An "important
corner-case".
Problem reported by John.
E. Fix ncra/ncrcat bug where no more files were read after all desired
records of the first record dimension were obtained (i.e., in cases
where multiple record dimensions exist in multiple files).
F. Versions 4.3.6--4.3.9 of ncra could treat missing values
incorrectly during double-precision arithmetic. A symptom was that
missing values could be replaced by strange numbers like, well,
infinity or zero. This mainly affects ncra in MRO (multi-record
output) mode, and the symptoms should be noticeable.
The workaround is to run the affected versions of ncra using the
--flt switch, so that single-precision floating point numbers are
not promoted. The solution is to upgrade to NCO 4.4.0.
Problem reported by Andrew Friedman.
http://nco.sf.net#bug_ncra_mss_val
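For scripts that must run against several NCO installations, one possible
way to apply the workaround only where needed is sketched here. The helper
ncra_needs_flt and the file names are hypothetical, and the version string
is assumed to be supplied by the caller:

```shell
# Hypothetical helper: true when an NCO version string is 4.3.6-4.3.9,
# the range affected by the missing-value bug described above.
ncra_needs_flt() {
  case "$1" in
    4.3.[6-9]) return 0 ;;
    *) return 1 ;;
  esac
}

vrs="4.3.7"   # assumed; substitute the version your ncra reports
if ncra_needs_flt "$vrs"; then
  echo "ncra --flt in1.nc in2.nc out.nc"   # workaround for affected versions
else
  echo "ncra in1.nc in2.nc out.nc"
fi
```

The sketch prints the command instead of running it, so it can be tried
without any data files.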
G. Versions through 4.3.9 would not always copy/print groups that
contain _only_ metadata (i.e., contain no variables). Fixed.
H. Sometimes the "coordinates" and "bounds" CF attributes caused
incorrect matches to out-of-scope variables in hierarchical files.
Fixed.
I. NCO correctly handles output filenames that contain whitespace.
Previously, NCO would complain when moving the temporary to the
final output file (the workaround was to use --no_tmp_fl).
J. ncks XML/NcML no longer creates a _FillValue attribute for unsigned
types. It did so in NCO 4.3.7--4.3.9 because Unidata toolsUI does
so, but apparently this is a bug, not a feature, so NCO no longer
emulates it. Likewise, ncks emits a _ChunkSizes attribute when
appropriate, not (like toolsUI) a _ChunkSize attribute.
K. Chunking options were not working as intended for some time. Fixed.
KNOWN ISSUES NOT YET FIXED:
This section of ANNOUNCE reports and reminds users of the
existence and severity of known, not yet fixed, problems.
These problems occur with NCO 4.4.0 built/tested with netCDF
4.3.1-rc7 snapshot 20131222 on top of HDF5 hdf5-1.8.9 with these
methods:
cd ~/nco;./configure --enable-netcdf4 # Configure mechanism -or-
cd ~/nco/bld;make dir;make allinone # Old Makefile mechanism
A. NOT YET FIXED (would require DAP protocol change?)
Unable to retrieve contents of variables including period '.' in name
Periods are legal characters in netCDF variable names.
Metadata are returned successfully, data are not.
DAP non-transparency: Works locally, fails through DAP server.
Demonstration:
ncks -O -C -D 3 -v var_nm.dot -p http://thredds-test.ucar.edu/thredds/dodsC/testdods in.nc # Fails to find variable
20130724: Verified problem still exists.
Stopped testing because inclusion of var_nm.dot broke all test scripts.
NB: Hard to fix since DAP interprets '.' as structure delimiter in
HTTP query string.
Bug report filed: https://www.unidata.ucar.edu/jira/browse/NCF-47
B. NOT YET FIXED (would require DAP protocol change)
Correctly read scalar characters over DAP.
DAP non-transparency: Works locally, fails through DAP server.
Problem, IMHO, is with DAP definition/protocol
Demonstration:
ncks -O -D 1 -H -C -m --md5_dgs -v md5_a -p http://thredds-test.ucar.edu/thredds/dodsC/testdods in.nc
20120801: Verified problem still exists
Bug report not filed
Cause: DAP translates scalar characters into 64-element (this
dimension is user-configurable, but still...), NUL-terminated
strings so MD5 agreement fails
C. NOT YET FIXED (NCO problem)
Correctly read arrays of NC_STRING with embedded delimiters in
ncatted arguments
Demonstration:
ncatted -D 5 -O -a new_string_att,att_var,c,sng,"list","of","str,ings" ~/nco/data/in_4.nc ~/foo.nc
ncks -m -C -v att_var ~/foo.nc
20130724: Verified problem still exists
TODO nco1102
Cause: NCO parsing of ncatted arguments is not sophisticated
enough to handle arrays of NC_STRINGS with embedded delimiters.
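To see why embedded delimiters are awkward, consider what any naive
comma-based split does to the attribute value in the demonstration.
The snippet is an illustration only and is not NCO's parser:

```shell
# Illustration only; this is not how NCO parses ncatted arguments.
att_val='"list","of","str,ings"'   # three intended strings
old_ifs=$IFS
IFS=,
set -- $att_val   # unquoted on purpose so the shell splits on commas
IFS=$old_ifs
n_tok=$#
echo "intended 3 strings, naive split gives $n_tok"
```

The comma inside "str,ings" is treated as a separator, so the split
yields four tokens instead of three.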
D. NOT YET FIXED (netCDF library problem)
Probe hidden attributes (chunking, compression) of HDF4 files
Demonstration:
ncdump -h -s ~/nco/data/hdf.hdf # (dies)
ncks -m ~/nco/data/hdf.hdf # (works by avoiding fatal calls)
20131230: Verified problem still exists
Cause: some libnetCDF library functions fail on HDF4 file inquiries.
Bug report filed: netCDF #HZY-708311 ncdump/netCDF4 segfaults probing
HDF4 file
Tracking tickets NCF-272, NCF-273
E. [FIXED in netCDF 4.3.1-rc5 ... please upgrade]
netCDF4 library fails when renaming dimension and variable using
that dimension, in either order. Works fine with netCDF3.
Also library causes var rename to imply dimension rename, and vice versa.
Hence coordinate renaming does not work with netCDF4 files.
Problem with netCDF4 library implementation.
Demonstration:
ncks -O -4 -v lat_T42 ~/nco/data/in.nc ~/foo.nc
ncrename -O -D 2 -d lat_T42,lat -v lat_T42,lat ~/foo.nc ~/foo2.nc # Breaks with "NetCDF: HDF error"
ncks -m ~/foo.nc
20130724: FIXED in netCDF 4.3.1-rc5 in 201212. Will be in netCDF 4.3.1.
Bug report filed: netCDF #YQN-334036: problem renaming dimension and
coordinate in netCDF4 file
Workaround: Use ncrename twice; first rename the variable, then
rename the dimension.
More Info: http://nco.sf.net/nco.html#ncrename_crd
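The two-step workaround could be scripted along these lines. This is a
dry-run sketch that prints the commands instead of executing them; the
helper rename_crd_cmds is hypothetical, and the file name is a placeholder:

```shell
# Dry-run sketch: print, rather than run, the two-step workaround.
# rename_crd_cmds is a hypothetical helper, not part of NCO.
rename_crd_cmds() {   # usage: rename_crd_cmds old_name new_name file
  printf 'ncrename -v %s,%s %s\n' "$1" "$2" "$3"   # step 1: variable
  printf 'ncrename -d %s,%s %s\n' "$1" "$2" "$3"   # step 2: dimension
}

rename_crd_cmds lat_T42 lat foo.nc
```

With a single file argument ncrename modifies that file in place, so the
second command operates on the result of the first.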
"Sticky" reminders:
A. Pre-built, up-to-date Debian Sid & Ubuntu packages:
http://nco.sf.net#debian
B. Pre-built Fedora and CentOS RPMs:
http://nco.sf.net#rpm
C. Pre-built Windows (native) and Cygwin binaries:
http://nco.sf.net#windows
D. Pre-built AIX binaries:
http://nco.sf.net#aix
E. Did you try SWAMP (Script Workflow Analysis for MultiProcessing)?
SWAMP efficiently schedules/executes NCO scripts on remote servers:
http://swamp.googlecode.com
SWAMP can work with command-line operator analysis scripts besides NCO.
If you must transfer lots of data from a server to your client
before you analyze it, then SWAMP will likely speed things up.
F. NCO support for netCDF4 features is tracked at
http://nco.sf.net/nco.html#nco4
NCO supports netCDF4 atomic data types, compression, chunking, and
groups.
G. Reminder that NCO works on most HDF4 and HDF5 datasets, e.g.,
NASA AURA HIRDLS HDF-EOS5
NASA ICESat GLAS HDF5
NASA MERRA HDF4
NASA MODIS HDF4
NASA SBUV HDF5...
--
Charlie Zender, Earth System Sci. & Computer Sci.
University of California, Irvine 949-891-2429 )'(