All - seeing the issue with history files, we thought we should alert you
to a known "bug" with the history command and history file in RHEL that
caused a major slowdown and eventual lockup of our system. It is a known
problem in RHEL 2.1, 3, 4, 5, and 6. We only ran into it on RHEL 6.2, but
the remedy was to add "set history=0" to our .cshrc.
We also added an alias, h50 "set history=50", so that when you open a
terminal window and type h50 that shell starts remembering your commands.
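In our .cshrc that looks like:

    set history=0                 # don't remember commands by default
    alias h50 "set history=50"    # type h50 in a shell to turn history back on
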
Clunky, yes, but at least our workstations don't lock up. I had to log on
to the RHEL customer portal and search for "42739" to get this printout,
which I cut and pasted into this email. I hope it translates well in your
email viewer.
See below - although I'm not sure it will address your original issue.

Pete
-------------------------------------------
A system slows down due to a .history file
Article ID: 42739 - Created on: Oct 5, 2010 10:54 PM - Last Modified: Jan
23, 2011 9:09 PM
Issue
The data in the ".history" file becomes malformed and its size grows
larger and larger.
Each command should be recorded in the file as a timestamp line followed
by a command entry line, each ending with an EOL (end of line).
Example of normal .history file:
$ cat -n .history
1 #+1289787344
2 set
In the malformed file, timestamps and command entries no longer alternate:
a timestamp is recorded where a command entry should be (see line 4), and
some entries are merged together unexpectedly (see lines 6 and 7 below).
Example of a malformed .history file:
$ cat -n .history
1 #+1289787344
2 test
3 #+1289787367
4 1289787366
5 #+1289787367
6 128978#+12897#+1289#+12897test
7 #+1289#+12897testls##+1289787401l12#+1289787402
8 st
A system slows down because csh uses a lot of memory to read such a large
.history file.
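To get a sense of how large the file has grown on an affected machine,
standard commands are enough (this check is ours, not part of the article):

$ ls -lh ~/.history
$ wc -l ~/.history
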
Environment
Red Hat Enterprise Linux 2.1, 3, 4, 5, and 6
Resolution
This issue will be addressed in "Bug 648592 - .history file gets corrupted
if several scripts run at once".
As a temporary workaround, disable csh's saved history by setting one of
the following lines in ~/.cshrc or on the command line:
unset savehist
OR
set savehist=
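For example, after adding "unset savehist" to ~/.cshrc, a new shell should
no longer report the variable (the check below is our suggestion, not part
of the article):

$ csh -c 'set | grep savehist'
$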
Root Cause
The main issue to be fixed is that tcsh does not access the ~/.history
file exclusively (there is no locking). The "merge" option makes
unexpected behaviour more likely, as noted in the warning in the man page
for the -S form of the history built-in command:
...
history [-hTr] [n]
history -S|-L|-M [filename] (+)
history -c (+)
...
With -S, the second form saves the history list to filename.
If the first word of the savehist shell variable is set to a
number, at most that many lines are saved. If the second word
of savehist is set to `merge', the history list is merged with
the existing history file instead of replacing it (if there is
one) and sorted by time stamp. (+) ____Merging is intended for
an environment like the X Window System with several shells in
simultaneous use. Currently it succeeds only when the shells
quit nicely one after another.____
Additionally, in Red Hat Enterprise Linux 5.4 and later, savehist is set
by default to both a value greater than 0 and the "merge" option:
$ set | grep savehist
savehist (1024 merge)
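If you would rather keep some saved history than disable it entirely, one
option (our reading of the man page warning above, not something the
article recommends) is to drop the "merge" keyword and keep the count
small, e.g. in ~/.cshrc:

    set history=50      # remember the last 50 commands in the running shell
    set savehist=50     # save at most 50 lines on exit, with no "merge"

Without "merge" each exiting shell simply overwrites ~/.history, so the
file can no longer grow without bound, though simultaneous exits can still
clobber each other's saved lines.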
From: Greg Stossmeister <gstoss@xxxxxxxx>
To: gembud@xxxxxxxxxxxxxxxx
Date: 04/30/2012 03:48 PM
Subject: Re: [gembud] Generating NEXRAD radar imagery with gpmap_gf
Sent by: gembud-bounces@xxxxxxxxxxxxxxxx
Daryl,
I noticed .history was bizarre last week - it had a ton of stuff in it
that didn't look like the normal history stuff I was used to seeing. We
deleted it this morning and I set my .cshrc to only save the last 40
commands.
Greg
On Apr 30, 2012, at 1:40 PM, daryl herzmann wrote:
>
> Offline... How large is your ~/.history file?
>
> daryl
>
> On Mon, 30 Apr 2012, Greg Stossmeister wrote:
>
>> Daryl,
>> No, each process creates a temporary working subdirectory to run in,
>> based on the process id.
>>
>> Greg
>>
>> On Apr 30, 2012, at 1:23 PM, daryl herzmann wrote:
>>
>>> Greg,
>>>
>>> Are all these processes running out of the same CWD (directory)? Try
>>> creating temp directories for each process and run the code from those
>>> directories.
>>>
>>> daryl
>>>
>>> On Mon, 30 Apr 2012, Greg Stossmeister wrote:
>>>
>>>> Daryl,
>>>> I have several shell scripts that I'm running out of cron every 5
>>>> minutes. Each shell script runs 10 gpmap_gf processes in sequence. I've
>>>> tried running 1 - 6 scripts at a time. This typically works fine during
>>>> the day, with one of these scripts completing in about 2 minutes. As
>>>> evening comes on they take longer and longer to run, and they seem to
>>>> take more and more memory. From the "top" command the scripts often use
>>>> 500-800 MB of memory, but in the evening this seems to mushroom to > 3GB
>>>> per script. The load on the machine at night from these scripts alone
>>>> jumps to >30, and by morning the machine usually dies with out-of-memory
>>>> errors, even though I'm automatically killing the scripts when they run
>>>> longer than 2 minutes.
>>>>
>>>> Looking at /var/log/debug.log I'm seeing segfault errors:
>>>>
>>>> Apr 26 17:07:25 sferic kernel: gpmap_gf[2164]: segfault at 0 ip
000000392692ff7f sp 00007fff8d981128 error 4 in
libc-2.12.so[3926800000+186000]
>>>> Apr 26 17:07:25 sferic abrt[2179]: saved core dump of pid 2164
(/export/ldm/home/gempak/GEMPAK6.4.0/os/linux64/bin/gpmap_gf) to
/var/spool/abrt/ccpp-2012-04-26-17:07:25-2164.new/coredump (827392 bytes)
>>>> Apr 26 17:07:25 sferic abrtd: Directory
'ccpp-2012-04-26-17:07:25-2164' creation detected
>>>> Apr 26 17:07:25 sferic kernel: gpmap_gf[2239]: segfault at 0 ip
000000392692ff7f sp 00007ffffd173658 error 4 in
libc-2.12.so[3926800000+186000]
>>>> Apr 26 17:07:25 sferic kernel: gpmap_gf[2242]: segfault at 0 ip
000000392692ff7f sp 00007fff8f4df6f8 error 4 in
libc-2.12.so[3926800000+186000]
>>>> Apr 26 17:07:25 sferic kernel: gpmap_gf[2247]: segfault at 0 ip
000000392692ff7f sp 00007fff73574d18 error 4 in
libc-2.12.so[3926800000+186000]
>>>> Apr 26 17:07:25 sferic kernel: gpmap_gf[2261]: segfault at 0 ip
000000392692ff7f sp 00007fff8bda1358 error 4 in
libc-2.12.so[3926800000+186000]
>>>> Apr 26 17:07:25 sferic kernel: gpmap_gf[2245]: segfault at 0 ip
000000392692ff7f sp 00007fff71495a28 error 4 in
libc-2.12.so[3926800000+186000]
>>>> Apr 26 17:07:25 sferic kernel: Pid 2245(gpmap_gf) over
core_pipe_limit
>>>> Apr 26 17:07:25 sferic kernel: Skipping core dump
>>>> Apr 26 17:07:25 sferic abrt[2260]: not dumping repeating crash in
'/export/ldm/home/gempak/GEMPAK6.4.0/os/linux64/bin/gpmap_gf'
>>>> Apr 26 17:07:25 sferic abrt[2279]: not dumping repeating crash in
'/export/ldm/home/gempak/GEMPAK6.4.0/os/linux64/bin/gpmap_gf'
>>>> Apr 26 17:07:25 sferic kernel: gpmap_gf[2289]: segfault at 0 ip
000000392692ff7f sp 00007fffca7118a8 error 4 in
libc-2.12.so[3926800000+186000]
>>>> Apr 26 17:07:25 sferic kernel: gpmap_gf[2286]: segfault at 0 ip
000000392692ff7f sp 00007fffef00ac98 error 4 in
libc-2.12.so[3926800000+186000]
>>>> Apr 26 17:07:25 sferic kernel: gpmap_gf[2303]: segfault at 0 ip
000000392692ff7f sp 00007fff92019618 error 4 in
libc-2.12.so[3926800000+186000]
>>>> Apr 26 17:07:25 sferic kernel: Pid 2303(gpmap_gf) over
core_pipe_limit
>>>>
>>>> Greg
>>>>
>>>> On Apr 30, 2012, at 12:10 PM, daryl herzmann wrote:
>>>>
>>>>> On Mon, 30 Apr 2012, Greg Stossmeister wrote:
>>>>>
>>>>>> Does anyone generate a lot of individual NEXRAD level III products
>>>>>> with gpmap_gf? I'm trying to generate real-time plots of NOQ Reflectivity
>>>>>> and NOU Velocity from 30 radars in the Midwest, and it's crashing my server
>>>>>> after a few hours, even when I only run 3 plots at a time. I'm running
>>>>>> GEMPAK6.4.0 on a RHEL 6 machine with 64 GB of memory. I'm wondering what
>>>>>> I'm doing wrong and if someone has a better way of doing this.
>>>>>
>>>>> Crashing your server how? Exhausting memory? Kernel panic? Are
>>>>> the processes not going away after you run them? How are you running them?
>>>>>
>>>>> daryl
>>>>>
>>>>> --
>>>>> /**
>>>>> * Daryl Herzmann
>>>>> * Assistant Scientist -- Iowa Environmental Mesonet
>>>>> * http://mesonet.agron.iastate.edu
>>>>> */
>>>>
_______________________________________________
gembud mailing list
gembud@xxxxxxxxxxxxxxxx
For list information or to unsubscribe, visit:
http://www.unidata.ucar.edu/mailing_lists/