On Sun, Nov 13, 2016 at 5:24 AM, Julian Kunkel <juliankunkel@xxxxxxxxxxxxxx>
wrote:
> Dear Liam,
> to investigate the problem and identify the cause, I propose the following:
> - on your old system, copy the file to /dev/shm, then time the reading
> of the file (as you did).
> - on your new system do the same.
>
> If there is an issue inside the host or the library, then the times
> should differ.
> I expect this won't be the case; the problem is more likely inside the
> Lustre client.
>
Julian,
Thank you for the suggestion to use /dev/shm. Eliminating the difference in
Lustre client versions between the two systems as a variable is a good idea,
and that appears to be exactly what this test did.
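For reference, the procedure on each system was essentially the one you
described; only the source path below is illustrative:

cp /lustre/scratch/test.nc /dev/shm/   # stage the file in tmpfs so reads come from memory, not Lustre
cd /dev/shm
time ncks test.nc out.nc               # same ncks copy operation timed in the earlier tests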
On the new cluster, in /dev/shm, the timing came out pretty much the same
as on the shared Lustre filesystem.
n0:shm$ time ncks test.nc out.nc
real 0m29.509s
user 0m28.605s
sys 0m0.466s
On the Cray, in /dev/shm, the timing also came out pretty much the same as
on the shared Lustre filesystem.
fish1:shm$ time ncks test.nc out.nc
real 0m4.023s
user 0m3.152s
sys 0m0.832s
So, based on these results in /dev/shm, I take from your comments that a
difference in the host configuration, NCO, NetCDF, or HDF5 between these
two systems is causing the performance difference. I think I will try
installing the same, older versions of those packages on the new cluster
that we have on the Cray and see if that changes anything. At the very
least it will make for a better apples-to-apples comparison.
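For that comparison, the versions on each system can be captured with
something like the following (assuming the usual version flags are
available in each build):

ncks --version          # NCO version
nc-config --version     # netCDF C library version
h5dump --version        # HDF5 version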
--
Regards,
-liam
-There are uncountably more irrational fears than rational ones. -P. Dolan
Liam Forbes loforbes@xxxxxxxxxx ph: 907-450-8618 fax: 907-450-8601
UAF Research Computing Systems Senior HPC Engineer LPIC1, CISSP