Hi Thomas,
You wrote:
> OK, some update on that one: I applied the workaround of compiling
> dumplib.o with -O0. This makes `make check` (OK, in my case, `gmake
> check` ... ) succeed, but the resulting ncdump is still broken.
>
> Again, two points:
> 1. I suggest adding another test case, with the cdl file I am about to
> paste.
Thanks for this new test. As it's apparently stricter than the ncdump
tests we have, we'll add it.
> 2. I again would like to know if someone reported this to Sun. This
> miscompilation is really a serious issue and should be addressed. I
> will report it myself if there is no one giving notice...
The user who reported and helped investigate this problem in early
February also committed to reporting the bug to Sun. You can read about
my unsuccessful attempts to isolate the bug to a smaller program than
ncdump or to find a workaround that would not trigger the bug here:
http://www.unidata.ucar.edu/support/help/MailArchives/netcdf/msg05358.html
The details of the bug, as reported by that user, are:
here's the solution with the Sun compiler:
ncdump/dumplib.c must be compiled using -O0 explicitly, otherwise
-O2 is used by default. By hand, just remove dumplib.o, add -O0
to CFLAGS in the Makefile (second occurrence), and run gmake. The
depending programs are recompiled and the tests succeed.
This seems to be an optimizer bug, I've checked the code produced,
and it does not set the xmm0 register in the complicated
version and breaks calling ABI for libc, whereas your simple code
below shows that it is set as expected. The value printed is just
a random value in xmm0 used for something before. I've just halted the code
before entering snprintf and set xmm0 explicitly to the value, continued,
and, voilà, the value printed is correct!
The instructions generated are TOTALLY different, a symptom I've seen
very often, just adding a line somewhere completely changes the
generated code, which makes it really hard to track down such errors.
I'll report this to Sun, maybe they've a better clue why this happens.
I have not been able to confirm that it was successfully logged as a Sun
compiler bug. If we can't find it after a little more searching, we'll
report it again.
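For anyone who wants to try isolating this again, the following is only a
minimal sketch of the kind of standalone check one might start from. It
formats a few of the double values from your CDL file with snprintf and
compares the text against what we expect. The function names format_double
and check_doubles are made up for illustration and are not taken from
dumplib.c, and the "%.15g" format is an assumption about how the values are
printed. Since my own attempts to reproduce the bug in a smaller program
failed, a check this simple may well pass even with the affected compiler
and flags.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format one double into buf.
 * The "%.15g" format is an assumption, not copied from dumplib.c. */
static int format_double(double value, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%.15g", value);
}

/* Hypothetical helper: format each value and compare it with the
 * expected text. Returns the number of mismatches found. */
static int check_doubles(const double *values,
                         const char *const *expected, size_t n)
{
    int failures = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        char buf[64];

        format_double(values[i], buf, sizeof buf);
        if (strcmp(buf, expected[i]) != 0) {
            fprintf(stderr, "mismatch at %lu: got %s, expected %s\n",
                    (unsigned long)i, buf, expected[i]);
            failures++;
        }
    }
    return failures;
}
```

To use it, call check_doubles from a small main() with values such as
0.01, 0.1, 1.0, 2.0 and 10.0 and the expected strings "0.01", "0.1", "1",
"2" and "10", then build it once with the default optimization and once
with -O0 (both with -m64) and compare the results.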
> Down to the mode of failure. I generate a test NetCDF file from this
> CDL:
>
> netcdf bubble {
> dimensions:
> element = 1000 ;
> variable = 1 ;
> base = 1 ;
> time = UNLIMITED ; // (0 currently)
> variables:
> double time(time) ;
> double coefficient(time, element, variable, base) ;
>
> // global attributes:
> :info = "Model state for the AWI DG model, ThOr breed." ;
> :par_stringsize = 30 ;
> :par_base_grades = 0, 0, 0 ;
> :par_grid_elements = 10, 10, 10 ;
> :par_hill_params = 0.01, 0.1, 0.1, 0.1 ;
> :par_linad_speed = 1., 1., 1. ;
> :par_oro_types = "null null
> null" ;
> :par_shallow_gravity = 1. ;
> :par_sys_name = "linear advection" ;
> :par_timeint_rksteps = 1 ;
> :par_timeint_step = 0.1 ;
> :par_trans_gradients = 2., 2., 2. ;
> :par_trans_types = "linear linear
> linear" ;
> :par_world_dims = 3 ;
> :par_world_lengths = 10., 10., 10. ;
> data:
> }
>
>
> shell$ ncgen -o bubble.nc bubble.cdl
>
> Now I have a look at it with ncdump compiled with CFLAGS=-m64 overall,
> but dumplib.o being built with CFLAGS='-O0 -m64' instead:
>
> shell$ ncdump bubble.nc
> netcdf bubble {
> dimensions:
> element = 1000 ;
> variable = 1 ;
> base = 1 ;
> time = UNLIMITED ; // (0 currently)
> variables:
> double time(time) ;
> double coefficient(time, element, variable, base) ;
>
> // global attributes:
> :info = "Model state for the AWI DG model, ThOr breed." ;
> :par_stringsize = 30 ;
> :par_base_grades = 0, 0, 0 ;
> :par_grid_elements = 10, 10, 10 ;
> :par_hill_params = 2.22044604925031e-16, 0.999999992549419,
> 0.999999992549419, 0.999999992549419 ;
> :par_linad_speed = 0.999999992549419, 0.999999992549419,
> 0.999999992549419 ;
> :par_oro_types = "null null
> null" ;
> :par_shallow_gravity = 0.999999992549419 ;
> :par_sys_name = "linear advection" ;
> :par_timeint_rksteps = 1 ;
> :par_timeint_step = 0.999999992549419 ;
> :par_trans_gradients = 0.999999992549419, 0.999999992549419,
> 0.999999992549419 ;
> :par_trans_types = "linear linear
> linear" ;
> :par_world_dims = 3 ;
> :par_world_lengths = 0.999999992549419, 0.999999992549419,
> 0.999999992549419 ;
> data:
> }
>
> That looks grossly wrong. Rebuilding everything inside the ncdump/
> directory with CFLAGS="-O0 -m64" results in a working ncdump binary;
> its output is identical to the input CDL file. This is also disturbing
> because it raises the question of whether my application will be
> affected by the same bug that harasses ncdump when building with Sun
> Studio. Did really no one investigate the mode of breakage and why it
> apparently(?!) does not affect other parts of NetCDF?
Yes, we investigated to the point of determining that it was a compiler
bug when compiling with -m64 for a 64-bit environment, and we tried
unsuccessfully to find a workaround other than compiling with -O0.
> So... shall one start crying at Sun to fix their compiler on
> Solaris/x86-64 with NetCDF or is there some hidden wisdom already that
> I am not aware of?
It would probably help if you could also report this bug.
--Russ