
19990701: The CONDUIT data feed



Celia,
When the data queue is created, the system allocates whole ("full")
pages of memory.  Thus the queue file created will be slightly larger
than the size specified in ldmadmin, because working with full pages
of memory is more efficient for the operating system.
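
As a rough illustration of the page rounding alone (a minimal sketch,
assuming a POSIX shell and that "getconf PAGESIZE" works on your
system; the real ldm.pq is larger still because the queue also holds
its own internal index and bookkeeping regions):

    # Round a requested queue size up to a whole number of pages.
    requested=800000000
    pagesize=`getconf PAGESIZE`
    pages=`expr \( $requested + $pagesize - 1 \) / $pagesize`
    rounded=`expr $pages \* $pagesize`
    echo "requested=$requested  pagesize=$pagesize  rounded=$rounded"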

If the NMC2 data feed is the only thing you are receiving,
you should be able to handle the data feed with an 800MB
queue. 

If the queue is growing larger than 800MB, then either pqexpire has 
died, or you have changed the invocation of pqexpire so that it is
running less often or keeping data longer than 1 hour.
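
One quick sanity check is to make sure pqexpire is still alive and see
exactly how it was invoked (just a sketch; the "ps -ef" output format
and grep pattern may need adjusting on your system):

    # Is pqexpire running, and with which options?
    ps -ef | grep pqexpire | grep -v grep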

My settings on flip, which is serving the NMC2 feed:
in ldmadmin: $pq_size = 800000000;

On disk:
[694]chiz on flip --> ls -l ldm.pq
-rw-r--r--    1 ldm      ustaff    828162048 Jul  1 17:00 ldm.pq

The high water mark in the queue since June 14 is about 640MB (from kill -USR1):
Jul 01 23:14:02 5Q:flip pqexpire[1145357]: > Queue usage (bytes):641124344
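
You can get the same statistic on iita2 by sending pqexpire a USR1
signal and then looking for the "Queue usage" line in the LDM log
(a sketch only; the log path and the ps/awk field positions are
assumptions for your setup):

    # Ask pqexpire to log its queue-usage statistics, then read them back.
    pid=`ps -ef | grep pqexpire | grep -v grep | awk '{print $2}'`
    kill -USR1 $pid
    sleep 2
    grep "Queue usage" $HOME/logs/ldmd.log | tail -1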

In ldmd.conf, pqexpire is being launched with:
exec    "pqexpire -i 1200"
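
Here -i 1200 is the interval between queue scans (which I read as
seconds, i.e. a scan every 20 minutes).  If you wanted to make the
one-hour retention explicit as well, my recollection is that pqexpire
takes an age option in hours; treat the exact flag as an assumption
and confirm it against the pqexpire usage message on your system:

exec    "pqexpire -a 1.0 -i 1200"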

Flip is running on an SGI Octane with IRIX 6.5.4m:
[702]chiz on flip --> uname -aR
IRIX64 flip 6.5 6.5.4m 04151556 IP30


Steve Chiswell
Unidata User Support

>From: address@hidden (Celia Chen)
>Organization: .
>Keywords: 199907012259.QAA28988

>Steve,
>
>I restarted LDM this morning and remade the queue,
>which is set to 1.1GB, and the ldm.pq was 1,125,810,176 bytes.
>
>I just reset the pq_size to 1.2GB and the ldm.pq is
>1,228,161,024 now. 
>
>NMC2 is the only data iita2 requests. Why does it need
>such a large pq_size and why is iita2's ldm.pq always 
>larger than the set pq_size?
>
>Thanks.
>
>Celia
>
>P.S. Yes, I do see "RECLASS" in the ldmd.log files from the last
>few days.
>
>> 
>> 
>> Celia,
>> 
>> The "not enough space" message seems to indicate that the LDM
>> tried to increase the queue size, and failed. In general, you
>> should start the queue as large as you expect to need it.
>> On IRIX, the LDM will try to increase the queue size if needed,
>> but you can run into conflicts with pqexpire.  This can corrupt the
>> queue and kill the LDM.
>> 
>> If pqexpire dies, then the queue would not be scoured and would
>> keep growing out of control.  On my IRIX64 machine I create the
>> queue with ldmadmin specifying 800MB.  My statistics show
>> that the high water mark of this queue is currently about 625MB.
>> The queue has been serving the NMC2 feed for many months without 
>> rebuilding.
>> 
>> If you have shorter-than-normal MRF files, then check the LDM
>> logs for RECLASS messages.  These are a sign of latencies
>> greater than 1 hour, which means data is being lost.  The other
>> common occurrence is for the Cray at NCEP to crash, in which case
>> files arrive late or are scrubbed.
>> 
>> Steve Chiswell
>> Unidata User Support.
>> 
>> 
>> 
>> 
>> >From: address@hidden (Celia Chen)
>> >Organization: .
>> >Keywords: 199907011817.MAA21465
>> 
>> >I started receiving the MRF grid #3 data on iita2.rap.ucar.edu
>> >on 6/24/99 and feeding WITI on 6/25/99.  It looks like
>> >the data was coming in normally for a few days.  We have just
>> >noticed that some data files that came in on 6/28 and 6/29 are much
>> >smaller than normal.  Then iita2 stopped saving data to
>> >disk during 6/29, while WITI was able to continue archiving data
>> >until today. (See below)
>> >
>> >------------------
>> >/iita/data/ldm/mrf
>> >
>> >-rw-rw-r--    1 ldm      ldm      23534364 Jun 27 02:33 99062700132_PGrbF.mrf
>> >-rw-rw-r--    1 ldm      ldm      23374262 Jun 27 02:36 99062700144_PGrbF.mrf
>> >-rw-rw-r--    1 ldm      ldm      23473400 Jun 27 02:39 99062700156_PGrbF.mrf
>> >-rw-rw-r--    1 ldm      ldm      23492248 Jun 27 02:42 99062700168_PGrbF.mrf
>> >-rw-rw-r--    1 ldm      ldm      23232058 Jun 27 01:38 9906270024_PGrbF.mrf
>> >-rw-rw-r--    1 ldm      ldm      23332132 Jun 27 01:43 9906270036_PGrbF.mrf
>> >-rw-rw-r--    1 ldm      ldm      23315028 Jun 27 01:47 9906270048_PGrbF.mrf
>> >-rw-rw-r--    1 ldm      ldm      23323330 Jun 27 01:57 9906270060_PGrbF.mrf
>> >-rw-rw-r--    1 ldm      ldm      23264214 Jun 27 02:02 9906270072_PGrbF.mrf
>> >-rw-rw-r--    1 ldm      ldm      23431280 Jun 27 02:16 9906270084_PGrbF.mrf
>> >-rw-rw-r--    1 ldm      ldm      23394558 Jun 27 02:22 9906270096_PGrbF.mrf
>> >-rw-rw-r--    1 ldm      ldm      19546092 Jun 28 01:50 9906280000_PGrbF.mrf
>> >-rw-rw-r--    1 ldm      ldm      9342436 Jun 28 02:02 99062800108_PGrbF.mrf
>> >-rw-rw-r--    1 ldm      ldm      9352474 Jun 28 02:07 99062800120_PGrbF.mrf
>> >-rw-rw-r--    1 ldm      ldm      23330756 Jun 28 01:54 9906280012_PGrbF.mrf
>> >-rw-rw-r--    1 ldm      ldm      9368596 Jun 28 02:10 99062800132_PGrbF.mrf
>> >-rw-rw-r--    1 ldm      ldm      9283910 Jun 28 02:13 99062800144_PGrbF.mrf
>> >-rw-rw-r--    1 ldm      ldm      9398850 Jun 28 02:14 99062800156_PGrbF.mrf
>> >-rw-rw-r--    1 ldm      ldm      11024120 Jun 28 02:58 99062800168_PGrbF.mrf
>> >-rw-rw-r--    1 ldm      ldm      23236234 Jun 28 01:58 9906280024_PGrbF.mrf
>> >-rw-rw-r--    1 ldm      ldm      21756684 Jun 28 02:02 9906280036_PGrbF.mrf
>> >-rw-rw-r--    1 ldm      ldm      23275368 Jun 28 02:07 9906280048_PGrbF.mrf
>> >-rw-rw-r--    1 ldm      ldm      9348664 Jun 28 01:40 9906280060_PGrbF.mrf
>> >-rw-rw-r--    1 ldm      ldm      9340868 Jun 28 01:45 9906280072_PGrbF.mrf
>> >-rw-rw-r--    1 ldm      ldm      9365856 Jun 28 01:56 9906280084_PGrbF.mrf
>> >-rw-rw-r--    1 ldm      ldm      9324932 Jun 28 01:58 9906280096_PGrbF.mrf
>> >-rw-rw-r--    1 ldm      ldm      8163116 Jun 29 01:26 9906290000_PGrbF.mrf
>> >-rw-rw-r--    1 ldm      ldm      9327876 Jun 29 01:29 9906290012_PGrbF.mrf
>> >-rw-rw-r--    1 ldm      ldm      15553538 Jun 29 01:53 9906290024_PGrbF.mrf
>> >-rw-rw-r--    1 ldm      ldm      17078040 Jun 29 01:55 9906290036_PGrbF.mrf
>> >-rw-rw-r--    1 ldm      ldm      9364992 Jun 29 01:42 9906290048_PGrbF.mrf
>> >-rw-rw-r--    1 ldm      ldm      9364588 Jun 29 01:53 9906290060_PGrbF.mrf
>> >-rw-rw-r--    1 ldm      ldm      5715248 Jun 29 01:55 9906290072_PGrbF.mrf
>> >------------------
>> >
>> >There is this "Not enough space" message on pqact.log:
>> >
>> >------------------
>> >Jun 24 22:35:54 pqact[2235]: Starting Up
>> >Jun 29 07:57:18 pqact[2235]: mmap: 18040000 0 1744732160: Not enough space
>> >Jun 29 07:57:18 pqact[2235]: Remap failed. Abandon all hope.
>> >Jun 29 07:57:18 pqact[2235]: pq_sequence failed: Not enough space (errno = 12)
>> >Jun 29 07:57:18 pqact[2235]: Exiting
>> >------------------
>> >
>> >It looks like there is enough disk space to store the MRF data on
>> >iita2 at this point:
>> >
>> >-----------
>> >iita2|22|% df /iita
>> >Filesystem             Type  blocks     use     avail  %use Mounted on
>> >/dev/dsk/xlv/xlviita     xfs 14163224 11989424  2173800  85  /iita
>> >
>> >-----------
>> >What could be the cause of the problems we see here? Please advise.
>> >
>> >Thanks.
>> >
>> >Celia
>> >
>> 
>> ****************************************************************************
>> Unidata User Support                                    UCAR Unidata Program
>> (303)497-8644                                                  P.O. Box 3000
>> address@hidden                                   Boulder, CO 80307
>> ----------------------------------------------------------------------------
>> Unidata WWW Service                        http://www.unidata.ucar.edu/     
>> ****************************************************************************
>> 
>