Hi Everyone,
The easiest way to get a lot of information about an LDM is to:
<as 'ldm'>
ldmadmin config
The next thing is to see about disk space:
df -h
The next thing is to see about RAM:
cat /proc/meminfo
I think it would be most useful to see the output from all of these
commands on Jack's system.
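In case it helps, here is a minimal sketch that gathers the three reports above into one file that can be pasted into a reply. The function name and the output file name are just placeholders I made up; the three commands themselves are the ones listed above.

```shell
#!/bin/sh
# Collect the output of the three commands above into a single file.
# Run as the 'ldm' user so that ldmadmin is on the PATH.
collect_ldm_info() {
    out=$1
    {
        echo "=== ldmadmin config ==="
        if command -v ldmadmin >/dev/null 2>&1; then
            ldmadmin config
        else
            echo "ldmadmin not found on PATH (are you the 'ldm' user?)"
        fi
        echo "=== df -h ==="
        df -h
        echo "=== /proc/meminfo ==="
        cat /proc/meminfo
    } > "$out" 2>&1
}

# Hypothetical output location; adjust as needed.
collect_ldm_info "${TMPDIR:-/tmp}/ldm-info.txt"
```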
Cheers,
Tom
On 4/24/20 3:45 PM, Gerry Creager - NOAA Affiliate via ldm-users wrote:
I'm also interested in the size of the product queue (look in
~ldm/etc/registry.xml for the queue size) vs the amount of ram
available. It sounds like you could be hammering system memory.
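A rough way to make that comparison from the command line is sketched below. It assumes the queue size is stored under /queue/size in the registry, that the value is a whole number with an optional K, M, or G suffix, and that the suffixes are decimal multipliers. That last point is an assumption on my part; for a rough check against RAM the difference from 1024-based units does not matter. The function names are made up for the example.

```shell
#!/bin/sh
# Convert a size such as "500M" or "2G" to bytes.
# ASSUMPTION: suffixes are decimal multipliers and the number is an integer.
to_bytes() {
    num=${1%[KkMmGg]}
    case $1 in
        *[Kk]) echo $((num * 1000)) ;;
        *[Mm]) echo $((num * 1000000)) ;;
        *[Gg]) echo $((num * 1000000000)) ;;
        *)     echo "$num" ;;
    esac
}

# Total physical RAM in bytes (MemTotal in /proc/meminfo is given in kB).
mem_total_bytes() {
    kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    echo $((kb * 1024))
}

# Only attempt the comparison where the LDM tools are installed.
if command -v regutil >/dev/null 2>&1; then
    queue=$(to_bytes "$(regutil /queue/size)")
    ram=$(mem_total_bytes)
    echo "queue: $queue bytes, RAM: $ram bytes"
    if [ "$queue" -ge "$ram" ]; then
        echo "warning: the queue is at least as large as physical RAM"
    fi
fi
```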
gerry
On Fri, Apr 24, 2020 at 8:44 PM Mike Zuranski <zuranski.wx@xxxxxxxxx> wrote:
Hi Jack,
First thing I want to point out is (barring any symlink or similar
shenanigans) your product queue is not under /home/ldm/var/data/.
As shown by LDM's error message, the product queue is the
/home/ldm/var/queues/ldm.pq file. That single file will house the
entire queue, so you wouldn't see excessive files from that.
That being said, the times I've had issues like yours with not being
able to log in or issue commands, it was usually because of either a
full root partition ("/"), full /tmp partition (unlikely that's
relevant here, but just FYI), full memory, or full inodes on a
partition. I see Tom already asked about "df -h" output, and you
already checked inodes and that appears fine. But those have been
some of my experiences as well.
So what IS in /home/ldm/var/data ? My guess is that's where LDM is
saving data to, and that configuration would be found in your pqact
file(s). One thing you could try is running the following command
to see what LDM will attempt to save in that directory (assuming
your pqact file(s) are named "pqact..." and in that dir, otherwise
adjust accordingly): "grep var/data ~/etc/pqact* | grep -i file"
(without quotes)
Side-note to the above: By default, relative paths with the FILE
action will start in the "/home/ldm" directory. This is set in
~/etc/registry.xml under /pqact/datadir-path, and you can check it
with "regutil /pqact/datadir-path" (without quotes). If that points
straight to your /home/ldm/var/data/ dir then THAT becomes the
default starting point for relative paths (and it might make the
above grep command come back empty).
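If the datadir path does point at var/data, a broader search that does not depend on the path text may be more useful. The sketch below lists every uncommented FILE action in a set of pqact files; the function name is made up, and it assumes the action keyword is written in upper case and separated by whitespace, as in the stock pqact.conf examples.

```shell
#!/bin/sh
# Print every non-comment line that contains a FILE action,
# grouped under the name of the pqact file it came from.
list_file_actions() {
    for f in "$@"; do
        [ -f "$f" ] || continue
        matches=$(grep -v '^[[:space:]]*#' "$f" |
                  grep -E '(^|[[:space:]])FILE[[:space:]]')
        if [ -n "$matches" ]; then
            printf '%s:\n%s\n' "$f" "$matches"
        fi
    done
}

# Show where relative paths start, then list the FILE actions.
if command -v regutil >/dev/null 2>&1; then
    regutil /pqact/datadir-path
fi
list_file_actions "$HOME"/etc/pqact*
```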
If there are actions to save data there they should (hopefully but
not guaranteed to) be listed by that grep command, and that could
point you where to look next. If it comes back empty then maybe
something's getting PIPEd to a script which is in turn saving data
there, but that might be harder to track down. Either way, it's
hard to know without looking in that directory or your pqact(s) what
might be happening, but hopefully this will yield a clue or two.
It's possible you're getting more than you think you're asking for,
and it's leading to that directory filling up... and if that's on
the root partition it could explain the log in / lock up issues.
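To see where the files are actually piling up, something like the sketch below can help. It counts regular files under each immediate subdirectory of the data directory and then shows which filesystem that directory lives on, along with its inode usage. The function name is made up, and the path is the one from Jack's email.

```shell
#!/bin/sh
# For each immediate subdirectory of a directory, print
# "<file count> <subdirectory>", largest count first.
count_files_by_subdir() {
    for d in "$1"/*/; do
        [ -d "$d" ] || continue
        n=$(find "$d" -type f | wc -l)
        printf '%d %s\n' "$n" "$d"
    done | sort -rn
}

data=/home/ldm/var/data
if [ -d "$data" ]; then
    count_files_by_subdir "$data" | head -n 10
    df -h "$data"    # which partition is it on, and how full is it?
    df -i "$data"    # inode usage on that partition
fi
```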
You also mentioned ldmadmin scour doesn't seem to be doing much.
Check ~/etc/scour.conf to see where it's doing actual scouring.
Maybe it's not looking in that data directory, or maybe it is
letting files stay too long.
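A quick way to see only the entries scour will act on is to strip comments and blank lines, as sketched below. As I understand the format, each remaining line names a directory, a retention period in days, and an optional filename pattern; check the scour.conf page linked further down to confirm. The function name is made up.

```shell
#!/bin/sh
# Print the active (non-comment, non-blank) lines of a scour.conf file.
active_scour_entries() {
    grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1"
}

conf="$HOME/etc/scour.conf"
if [ -f "$conf" ]; then
    active_scour_entries "$conf"
fi
```

If no line mentions var/data, or the day count is large, that would explain why the cron job seems to do little.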
I'd also be curious about the size of your product queue vs. the
size of the partition it's on. If it's able to get made and LDM
starts at all it's probably fine, but it is worth paying attention
to. The size of the queue is defined in ~/etc/registry.xml; then
just compare "ls -lh /home/ldm/var/queues/ldm.pq" and "df -h" to see
how much of the partition the queue is taking up. I try to ensure the
partition it's on stays at 75% or less, though I don't think that's
a true hard/fast rule, just guidance.
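That check can be scripted roughly as follows. The function name is made up, the queue path is the one from LDM's error message, and the 75% threshold is only my rule of thumb from above.

```shell
#!/bin/sh
# Print the use percentage (without the % sign) of the filesystem
# that holds the given path, using the portable "df -P" layout.
partition_use_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

queue=/home/ldm/var/queues/ldm.pq
if [ -f "$queue" ]; then
    ls -lh "$queue"
    pct=$(partition_use_pct "$queue")
    echo "partition holding the queue is ${pct}% full"
    if [ "$pct" -gt 75 ]; then
        echo "above the 75% guideline"
    fi
fi
```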
Some reference pages that may be useful to you if you haven't seen
these already:
https://www.unidata.ucar.edu/software/ldm/ldm-current/basics/ldmd.conf.html
https://www.unidata.ucar.edu/software/ldm/ldm-current/basics/pqact.conf.html
https://www.unidata.ucar.edu/software/ldm/ldm-current/basics/scour.conf.html
https://www.unidata.ucar.edu/software/ldm/ldm-current/basics/LDM-registry.html
Per your last email:
> just to confirm... find and rm on the data dir won't mess up /
> confuse the ldm queue stuff?
It shouldn't. Again, from what I've seen in your original email
that's not where the queue is. And even if it were, scour shouldn't
touch it as long as it keeps updating (though rm -rf would). I'd
double-check ~/etc/registry.xml to verify the queue is housed
elsewhere, but it sounds like you should be fine on this.
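For the cleanup itself, a sketch along these lines lists the candidates first and only deletes when asked to. The function name is made up. It uses -mmin rather than -mtime because "-mtime +1" only matches files at least two full days old, and it relies on the -mmin and -delete options of GNU find (which CentOS 7 has).

```shell
#!/bin/sh
# List (default) or delete regular files under a directory that were
# last modified more than the given number of days ago.
#   $1 = directory, $2 = age in days, $3 = "delete" to actually remove
purge_older_than() {
    dir=$1
    days=$2
    mode=${3:-list}
    minutes=$((days * 1440))
    if [ "$mode" = delete ]; then
        find "$dir" -type f -mmin +"$minutes" -delete
    else
        find "$dir" -type f -mmin +"$minutes" -print
    fi
}

# Dry run first, then delete once the list looks right:
#   purge_older_than /home/ldm/var/data 1
#   purge_older_than /home/ldm/var/data 1 delete
```

Only regular files are matched, so the directory tree itself is left in place.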
Hope some of this helps you out,
-Mike
======================
Mike Zuranski
Meteorology Support Analyst
College of DuPage - Nexlab
http://weather.cod.edu/
======================
On Fri, Apr 24, 2020 at 1:32 PM Jack Snodgrass <jack@xxxxxxxxxxxxxx> wrote:
We're having issues with our server (CentOS 7) that runs LDM
locking up. It has happened 2 times in the last 3 weeks or so.
The server is pingable, so it's not totally dead, but you
can't get a local or remote console to start. I can't figure out
if it is out of memory or file handles or what... it's like a
ghost of itself.
After rebooting... the /home/ldm/var/data/ has around 350,000
files in it. I am not sure if that is 'ok' or a bit extra.
We are running a
ldmadmin scour
command... via cron but I don't know what that is doing exactly
or if it's doing much.
when I try and restart ldm it says:
Checking the product-queue...
The writer-counter of the product-queue isn't zero. Either a process
has the product-queue open for writing or the queue might be
corrupt.
Terminate the process and recheck or use
pqcat -l- -s -q /home/ldm/var/queues/ldm.pq && pqcheck -F -q /home/ldm/var/queues/ldm.pq
to validate the queue and set the writer-counter to zero.
LDM not started
In the past.... during testing and what not.. I've been able to
run:
pqcat -l- -s -q /home/ldm/var/queues/ldm.pq && pqcheck -F -q /home/ldm/var/queues/ldm.pq
and ldm would start after that. This time.. with the 350K files
or so.. that pqcat stuff fails.
I am deleting files older than a day from the
/home/ldm/var/data/ directory... going to see if
pqcat -l- -s -q /home/ldm/var/queues/ldm.pq && pqcheck -F -q /home/ldm/var/queues/ldm.pq
will work or if I have to rm -rf /home/ldm/var/data/ and start a
new q.
If ldmadmin scour does not let us remove enough files from
/home/ldm/var/data/ can I use find and rm to remove files or do
they have to be removed using ldm to keep any queues or indexes
in sync?
- jack
--
jack - Southlake Texas - http://mylinuxguy.net - 817-601-7338
_______________________________________________
NOTE: All exchanges posted to Unidata maintained email lists are
recorded in the Unidata inquiry tracking system and made publicly
available through the web. Users who post to any of the lists we
maintain are reminded to remove any personal information that they
do not want to be made public.
ldm-users mailing list
ldm-users@xxxxxxxxxxxxxxxx
For list information or to unsubscribe, visit:
https://www.unidata.ucar.edu/mailing_lists/
--
Gerry Creager
NSSL/CIMMS
405.325.6371
++++++++++++++++++++++
/The way to get started is to quit talking and begin doing./
/ Walt Disney/
--
+----------------------------------------------------------------------+
* Tom Yoksas UCAR Unidata Program *
* (303) 497-8642 (last resort) P.O. Box 3000 *
* yoksas@xxxxxxxx Boulder, CO 80307 *
* Unidata WWW Service http://www.unidata.ucar.edu/ *
+----------------------------------------------------------------------+