I'm also interested in the size of the product queue (look in
~ldm/etc/registry.xml for the queue size) vs. the amount of RAM available.
It sounds like you could be hammering system memory.
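
To put numbers on that, something like the following would do. It's only a sketch: queue_vs_ram is a made-up helper name (not an LDM tool), it assumes a Linux-style /proc/meminfo, and it measures the queue file on disk rather than reading the configured size out of the registry.

```shell
# Compare the on-disk size of the LDM product queue with total RAM.
# Usage: queue_vs_ram QUEUE_FILE [MEMINFO_FILE]
# (hypothetical helper; MEMINFO_FILE defaults to /proc/meminfo)
queue_vs_ram() {
    queue_file=$1
    meminfo=${2:-/proc/meminfo}

    # Size of the queue file in bytes.
    queue_bytes=$(wc -c < "$queue_file") || return 1

    # MemTotal is reported in kB on Linux.
    ram_kb=$(awk '/^MemTotal:/ {print $2}' "$meminfo")
    [ -n "$ram_kb" ] || return 1

    # Print both sizes in MiB and the queue size as a percentage of RAM.
    awk -v q="$queue_bytes" -v r="$ram_kb" 'BEGIN {
        printf "queue=%d MiB ram=%d MiB ratio=%.0f%%\n",
               q / 1048576, r / 1024, 100 * q / (r * 1024)
    }'
}
```

Run it as "queue_vs_ram /home/ldm/var/queues/ldm.pq" (without quotes). As far as I know LDM memory-maps the queue, so a ratio anywhere near 100% would mean the queue is competing with everything else on the box for RAM.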
gerry
On Fri, Apr 24, 2020 at 8:44 PM Mike Zuranski <zuranski.wx@xxxxxxxxx> wrote:
> Hi Jack,
>
> First thing I want to point out is (barring any symlink or similar
> shenanigans) your product queue is not under /home/ldm/var/data/. As
> shown by LDM's error message, the product queue is the
> /home/ldm/var/queues/ldm.pq file. That single file will house the entire
> queue, so you wouldn't see excessive files from that.
>
> That being said, the times I've had issues like yours with not being able
> to log in or issue commands, it was usually because of either a full root
> partition ("/"), full /tmp partition (unlikely that's relevant here, but
> just FYI), full memory, or full inodes on a partition. I see Tom already
> asked about "df -h" output, and you already checked inodes and that appears
> fine. But those have been some of my experiences as well.
>
> So what IS in /home/ldm/var/data ? My guess is that's where LDM is saving
> data to, and that configuration would be found in your pqact file(s). One
> thing you could try is running the following command to see what LDM will
> attempt to save in that directory (assuming your pqact file(s) are named
> "pqact..." and in that dir, otherwise adjust accordingly): "grep var/data
> ~/etc/pqact* | grep -i file" (without quotes)
>
> Side-note to the above: By default, relative paths with the FILE action
> will start in the "/home/ldm" directory. This is set in ~/etc/registry.xml
> under /pqact/datadir-path, and you can check it with "regutil
> /pqact/datadir-path" (without quotes). If that points straight to your
> /home/ldm/var/data/ dir then THAT becomes the default starting point for
> relative paths (and it might make the above grep command come back empty).
>
> If there are actions to save data there they should (hopefully but not
> guaranteed to) be listed by that grep command, and that could point you
> where to look next. If it comes back empty then maybe something's getting
> PIPEd to a script which is in turn saving data there, but that might be
> harder to track down. Either way, it's hard to know without looking in
> that directory or your pqact(s) what might be happening, but hopefully this
> will yield a clue or two. It's possible you're getting more than you think
> you're asking for, and it's leading to that directory filling up... and if
> that's on the root partition it could explain the log in / lock up issues.
>
> You also mentioned ldmadmin scour doesn't seem to be doing much. Check
> ~/etc/scour.conf to see where it's doing actual scouring. Maybe it's not
> looking in that data directory, or maybe it is letting files stay too long.
>
> I'd also be curious about the size of your product queue vs. the size of
> the partition it's on. If it's able to get made and LDM starts at all it's
> probably fine, but it is worth paying attention to. The size of the queue
> gets defined in ~/etc/registry.xml, then just compare "ls -lh
> /home/ldm/var/queues/ldm.pq" and "df -h" to see how much of the partition
> the queue is taking up. I try to ensure the partition it's on stays at 75%
> full or less, though I don't think that's a true hard/fast rule, just
> guidance.
>
> Some reference pages that may be useful to you if you haven't seen these
> already:
> https://www.unidata.ucar.edu/software/ldm/ldm-current/basics/ldmd.conf.html
> https://www.unidata.ucar.edu/software/ldm/ldm-current/basics/pqact.conf.html
> https://www.unidata.ucar.edu/software/ldm/ldm-current/basics/scour.conf.html
> https://www.unidata.ucar.edu/software/ldm/ldm-current/basics/LDM-registry.html
>
>
> Per your last email:
> > just to confirm... find and rm on the data dir won't mess up / confuse
> > the ldm queue stuff?
>
> It shouldn't. Again, from what I've seen in your original email that's
> not where the queue is. And even if it were, scour shouldn't touch it as
> long as it keeps updating (though rm -rf would). I'd double-check
> ~/etc/registry.xml to verify the queue is housed elsewhere, but it sounds
> like you should be fine on this.
>
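
On the find/rm question, a dry run first is cheap insurance. A minimal sketch, assuming GNU find (as shipped with CentOS 7); list_stale is just an illustrative helper name, not an LDM command:

```shell
# List (do not delete) regular files under DIR that were last modified
# more than DAYS days ago.
# Usage: list_stale DIR DAYS   (hypothetical helper)
list_stale() {
    dir=$1
    days=$2
    # -xdev keeps find on a single filesystem. -mmin counts minutes, which
    # avoids the rounding that makes "-mtime +1" mean "at least two days old".
    find "$dir" -xdev -type f -mmin +"$((days * 1440))" -print
}
```

Something like "list_stale /home/ldm/var/data 1 | wc -l" (without quotes) shows how many files would go. Once the list looks right, swapping -print for -delete in the find line does the actual removal. Since the queue lives under var/queues, nothing here touches it.
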
> Hope some of this helps you out,
>
> -Mike
>
> ======================
> Mike Zuranski
> Meteorology Support Analyst
> College of DuPage - Nexlab
> Weather.cod.edu <http://weather.cod.edu/>
> ======================
>
>
> On Fri, Apr 24, 2020 at 1:32 PM Jack Snodgrass <jack@xxxxxxxxxxxxxx>
> wrote:
>
>> having issues with our server ( centos7 ) that runs ldm... locking up. It
>> has happened 2 times in the last 3 weeks or so.
>> The server is pingable... so it's not totally dead.. but you can't get a
>> local or remote console to start. can't figure out if it is out of memory
>> or file handles or what.... it's like a ghost of itself.
>>
>> After rebooting... the /home/ldm/var/data/ has around 350,000 files in
>> it. I am not sure if that is 'ok' or a bit extra.
>>
>> We are running a
>>
>> ldmadmin scour
>>
>> command... via cron but I don't know what that is doing exactly or if
>> it's doing much.
>>
>> when I try and restart ldm it says:
>>
>> Checking the product-queue...
>> The writer-counter of the product-queue isn't zero. Either a process
>> has the product-queue open for writing or the queue might be corrupt.
>> Terminate the process and recheck or use
>> pqcat -l- -s -q /home/ldm/var/queues/ldm.pq && pqcheck -F -q /home/ldm/var/queues/ldm.pq
>> to validate the queue and set the writer-counter to zero.
>> LDM not started
>>
>>
>> In the past.... during testing and what not.. I've been able to run:
>> pqcat -l- -s -q /home/ldm/var/queues/ldm.pq && pqcheck -F -q /home/ldm/var/queues/ldm.pq
>>
>> and ldm would start after that. This time.. with the 350K files or so..
>> that pqcat stuff fails.
>>
>> I am deleting older ( than a day ) files from the /home/ldm/var/data/
>> directory... going to see if
>>
>> pqcat -l- -s -q /home/ldm/var/queues/ldm.pq && pqcheck -F -q /home/ldm/var/queues/ldm.pq
>>
>>
>> will work or if I have to rm -rf /home/ldm/var/data/ and start a new q.
>>
>>
>> If ldmadmin scour does not let us remove enough files from
>> /home/ldm/var/data/ can I use find and rm to remove files or do they have
>> to be removed using ldm to keep any queues or indexes in sync?
>>
>> - jack
>>
>> --
>> *jack* - Southlake Texas - http://mylinuxguy.net - *817-601-7338*
>> _______________________________________________
>> NOTE: All exchanges posted to Unidata maintained email lists are
>> recorded in the Unidata inquiry tracking system and made publicly
>> available through the web. Users who post to any of the lists we
>> maintain are reminded to remove any personal information that they
>> do not want to be made public.
>>
>>
>> ldm-users mailing list
>> ldm-users@xxxxxxxxxxxxxxxx
>> For list information or to unsubscribe, visit:
>> https://www.unidata.ucar.edu/mailing_lists/
>>
--
Gerry Creager
NSSL/CIMMS
405.325.6371
++++++++++++++++++++++
*The way to get started is to quit talking and begin doing.*
* Walt Disney*