
[LDM #URY-771320]: LDM crash on noaaport2.cod.edu



Gilbert,

That's a lot of stuff to go through.

Was the problem you encountered the deletion of the product-queue, or was it
the termination of a noaaportIngester(1) process on noaaport2?
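
A rough sketch of one way to separate the two cases: scan the quoted log for
the two signatures that appear in it, namely ldmd reporting a child
"terminated by signal" and a utility reporting that the queue file does not
exist. The function name summarize() and both regular expressions are
assumptions modelled on those two log lines; they are not part of the LDM
package and may need adjusting for other LDM versions.

```python
import re

# ldmd reporting that a child died from a signal, modelled on the
# "reap() child 1657 terminated by signal 11" entry in the quoted log.
SIGNALLED = re.compile(r"reap\(\) child (\d+)\s+terminated by signal (\d+)")

# A utility failing because the product-queue file is absent, modelled on
# the pqcheck "/dev/shm/ldm.pq: No such file or directory" entry.
MISSING_PQ = re.compile(r"(\S+\.pq): No such file or directory")


def summarize(log_text):
    """Return (signalled_children, missing_queues) found in LDM log text."""
    # Mail wrapping splits entries across lines, so drop the quote markers
    # and join everything into one whitespace-separated string first.
    flat = " ".join(line.lstrip("> ").strip() for line in log_text.splitlines())
    signalled = [(int(pid), int(sig)) for pid, sig in SIGNALLED.findall(flat)]
    missing = sorted(set(MISSING_PQ.findall(flat)))
    return signalled, missing
```

Calling summarize() on the text of the log file returns a list of
(pid, signal) pairs and a list of queue paths reported missing. If both lists
are non-empty, the timestamps of the matching entries show which came first.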

> Hello Steve,
> 
> I am filing this on behalf of the College of DuPage
> weather program.
> 
> They have two NOAAport ingesters, one called
> noaaport1, and another called noaaport2. This
> is the LDM log they had up until today:
> 
> 20160616T172541.915447Z climate.cod.edu(feed)[87852] NOTE
> up6.c:448:up6_run() topo:  climate.cod.edu {{NOTHER|NGRAPH|NGRID|WMO,
> (.*)}}
> 20160616T180257.714530Z climate.cod.edu(feed)[87852] NOTE
> error.c:236:err_log() Failure; COMINGSOON: RPC: Unable to receive; errno =
> Connection reset by peer
> 20160616T180257.718349Z climate.cod.edu(feed)[88804] NOTE
> uldb.c:1533:sm_vetUpstreamLdm() Terminated obsolete upstream LDM
> (addr=10.11.0.65, pid=87852, vers=6, type=feeder, mode=alternate,
> sub=(20160616165105.909806 TS_ENDT {{NOTHER|NGRAPH|NGRID|WMO, ".*"}}))
> 20160616T180257.777565Z climate.cod.edu(feed)[87852] NOTE
> ldmd.c:185:cleanup() Exiting
> 20160616T180257.778198Z ldmd[1652] NOTE ldmd.c:168:reap() child 87852
> exited with status 6
> 20160616T180258.720061Z climate.cod.edu(feed)[88804] NOTE
> up6.c:445:up6_run() Starting Up(6.13.1/6): 20160616172822.714117 TS_ENDT
> {{NOTHER|NGRAPH|NGRID|WMO, ".*"}}, SIG=be6fe5d603a6ad3357c99379e61de688,
> Primary
> 20160616T180258.720095Z climate.cod.edu(feed)[88804] NOTE
> up6.c:448:up6_run() topo:  climate.cod.edu {{NOTHER|NGRAPH|NGRID|WMO,
> (.*)}}
> 20160616T185401.837778Z noaaportIngester[1655] ERROR
> productMaker.c:948:pmStart() Missing GOES fragment in sequence, last
> 1155/141123 this 1157/141123
> 20160616T185401.856478Z noaaportIngester[1657] ERROR
> productMaker.c:569:pmStart() ERROR in calculation of psh len 32802 16
> 20160616T185401.856523Z noaaportIngester[1657] ERROR
> readsbn.c:24:readsbn() SBN checksum invalid 2443 26836
> 20160616T185401.856543Z noaaportIngester[1657] ERROR
> readsbn.c:24:readsbn() SBN checksum invalid 2413 55144
> 20160616T185401.856570Z noaaportIngester[1657] ERROR
> readsbn.c:24:readsbn() SBN checksum invalid 2435 10272
> 20160616T185403.874899Z ldmd[1652] NOTE ldmd.c:122:reap() child 1657
> terminated by signal 11: noaaportIngester -m 224.0.1.4
> 20160616T185403.874927Z ldmd[1652] NOTE ldmd.c:148:reap() Killing
> (SIGTERM) process group
> 20160616T185403.875297Z noaaportIngester[1663] ERROR
> fifo.c:340:fifo_transferFromFd() Interrupted system call
> 20160616T185403.875341Z noaaportIngester[1663] ERROR
> fifo.c:340:fifo_transferFromFd() Couldn't read up to 65507 bytes from file
> descriptor 5
> 20160616T185403.875509Z noaaportIngester[1660] ERROR
> fifo.c:340:fifo_transferFromFd() Interrupted system call
> 20160616T185403.875555Z noaaportIngester[1660] ERROR
> fifo.c:340:fifo_transferFromFd() Couldn't read up to 65507 bytes from file
> descriptor 5
> 20160616T185403.875534Z noaaportIngester[1661] ERROR
> fifo.c:340:fifo_transferFromFd() Interrupted system call
> 20160616T185403.875579Z noaaportIngester[1661] ERROR
> fifo.c:340:fifo_transferFromFd() Couldn't read up to 65507 bytes from file
> descriptor 5
> 20160616T185403.875582Z noaaportIngester[1662] ERROR
> fifo.c:340:fifo_transferFromFd() Interrupted system call
> 20160616T185403.875636Z noaaportIngester[1662] ERROR
> fifo.c:340:fifo_transferFromFd() Couldn't read up to 65507 bytes from file
> descriptor 5
> 20160616T185403.875641Z noaaportIngester[1658] ERROR
> fifo.c:340:fifo_transferFromFd() Interrupted system call
> 20160616T185403.875649Z noaaportIngester[1658] ERROR
> fifo.c:340:fifo_transferFromFd() Couldn't read up to 65507 bytes from file
> descriptor 5
> 20160616T185403.876115Z noaaportIngester[1656] ERROR
> fifo.c:340:fifo_transferFromFd() Interrupted system call
> 20160616T185403.876130Z noaaportIngester[1656] ERROR
> fifo.c:340:fifo_transferFromFd() Couldn't read up to 65507 bytes from file
> descriptor 5
> 20160616T185403.876126Z noaaportIngester[1659] ERROR
> fifo.c:340:fifo_transferFromFd() Interrupted system call
> 20160616T185403.876170Z noaaportIngester[1659] ERROR
> fifo.c:340:fifo_transferFromFd() Couldn't read up to 65507 bytes from file
> descriptor 5
> 20160616T185403.876268Z noaaportIngester[1663] NOTE
> noaaportIngester.c:754:reportStats()
> ----------------------------------------
> Ingestion Statistics:
> Since Previous Report (or Start):
> Duration          P13DT16H16M1.667276S
> Raw Data:
> Octets        0
> Mean Rate:
> Octets    0/s
> Bits      0/s
> Received frames:
> Number        0
> Mean Rate     0/s
> Missed frames:
> Number        0
> %             -nan
> Full FIFO:
> Number        0
> %             -nan
> Products:
> Inserted      0
> Mean Rate     0/s
> Since Start:
> Duration          P13DT16H16M1.667276S
> Raw Data:
> Octets        0
> Mean Rate:
> Octets    0/s
> Bits      0/s
> Received frames:
> Number        0
> Mean Rate     0/s
> Missed frames:
> Number        0
> %             -nan
> Full FIFO:
> Number        0
> %             -nan
> Products:
> Inserted      0
> Mean Rate     0/s
> ----------------------------------------
> 20160616T185403.876303Z noaaportIngester[1654] ERROR
> fifo.c:340:fifo_transferFromFd() Interrupted system call
> 20160616T185403.876318Z noaaportIngester[1654] ERROR
> fifo.c:340:fifo_transferFromFd() Couldn't read up to 65507 bytes from file
> descriptor 5
> 20160616T185403.876413Z noaaportIngester[1662] NOTE
> noaaportIngester.c:754:reportStats()
> ----------------------------------------
> Ingestion Statistics:
> Since Previous Report (or Start):
> Duration          P13DT16H16M1.659078S
> Raw Data:
> Octets        0
> Mean Rate:
> Octets    0/s
> Bits      0/s
> Received frames:
> Number        0
> Mean Rate     0/s
> Missed frames:
> Number        0
> %             -nan
> Full FIFO:
> Number        0
> %             -nan
> Products:
> Inserted      0
> Mean Rate     0/s
> Since Start:
> Duration          P13DT16H16M1.659078S
> Raw Data:
> Octets        0
> Mean Rate:
> Octets    0/s
> Bits      0/s
> Received frames:
> Number        0
> Mean Rate     0/s
> Missed frames:
> Number        0
> %             -nan
> Full FIFO:
> Number        0
> %             -nan
> Products:
> Inserted      0
> Mean Rate     0/s
> ----------------------------------------
> 20160616T185403.876544Z weather.cod.edu(feed)[82274] NOTE
> ldmd.c:185:cleanup() Exiting
> 20160616T185403.876601Z noaaportIngester[1660] NOTE
> noaaportIngester.c:754:reportStats()
> ----------------------------------------
> Ingestion Statistics:
> Since Previous Report (or Start):
> Duration          P13DT16H16M1.665902S
> Raw Data:
> Octets        0
> Mean Rate:
> Octets    0/s
> Bits      0/s
> Received frames:
> Number        0
> Mean Rate     0/s
> Missed frames:
> Number        0
> %             -nan
> Full FIFO:
> Number        0
> %             -nan
> Products:
> Inserted      0
> Mean Rate     0/s
> Since Start:
> Duration          P13DT16H16M1.665902S
> Raw Data:
> Octets        0
> Mean Rate:
> Octets    0/s
> Bits      0/s
> Received frames:
> Number        0
> Mean Rate     0/s
> Missed frames:
> Number        0
> %             -nan
> Full FIFO:
> Number        0
> %             -nan
> Products:
> Inserted      0
> Mean Rate     0/s
> ----------------------------------------
> 20160616T185403.876688Z noaaportIngester[1658] NOTE
> noaaportIngester.c:754:reportStats()
> ----------------------------------------
> Ingestion Statistics:
> Since Previous Report (or Start):
> Duration          P13DT16H16M1.663040S
> Raw Data:
> Octets        213446871416
> Mean Rate:
> Octets    180618/s
> Bits      1.44494e+06/s
> Received frames:
> Number        52992790
> Mean Rate     44.8422/s
> Missed frames:
> Number        4307
> %             0.00812686
> Full FIFO:
> Number        0
> %             0
> Products:
> Inserted      22074
> Mean Rate     0.0186789/s
> Since Start:
> Duration          P13DT16H16M1.663040S
> Raw Data:
> Octets        213446871416
> Mean Rate:
> Octets    180618/s
> Bits      1.44494e+06/s
> Received frames:
> Number        52992790
> Mean Rate     44.8422/s
> Missed frames:
> Number        4307
> %             0.00812686
> Full FIFO:
> Number        0
> %             0
> Products:
> Inserted      22074
> Mean Rate     0.0186789/s
> ----------------------------------------
> 20160616T185403.876777Z noaaportIngester[1656] NOTE
> noaaportIngester.c:754:reportStats()
> ----------------------------------------
> Ingestion Statistics:
> Since Previous Report (or Start):
> Duration          P13DT16H16M1.662962S
> Raw Data:
> Octets        2474898832260
> Mean Rate:
> Octets    2.09425e+06/s
> Bits      1.6754e+07/s
> Received frames:
> Number        620913333
> Mean Rate     525.413/s
> Missed frames:
> Number        138559
> %             0.0223104
> Full FIFO:
> Number        0
> %             0
> Products:
> Inserted      14069438
> Mean Rate     11.9055/s
> Since Start:
> Duration          P13DT16H16M1.662962S
> Raw Data:
> Octets        2474898832260
> Mean Rate:
> Octets    2.09425e+06/s
> Bits      1.6754e+07/s
> Received frames:
> Number        620913333
> Mean Rate     525.413/s
> Missed frames:
> Number        138559
> %             0.0223104
> Full FIFO:
> Number        0
> %             0
> Products:
> Inserted      14069438
> Mean Rate     11.9055/s
> ----------------------------------------
> 20160616T185403.876820Z noaaportIngester[1655] NOTE
> noaaportIngester.c:754:reportStats()
> ----------------------------------------
> Ingestion Statistics:
> Since Previous Report (or Start):
> Duration          P13DT16H16M1.667874S
> Raw Data:
> Octets        56217502582
> Mean Rate:
> Octets    47570.9/s
> Bits      380567/s
> Received frames:
> Number        24507095
> Mean Rate     20.7378/s
> Missed frames:
> Number        2359
> %             0.00962486
> Full FIFO:
> Number        0
> %             0
> Products:
> Inserted      31776
> Mean Rate     0.0268887/s
> Since Start:
> Duration          P13DT16H16M1.667874S
> Raw Data:
> Octets        56217502582
> Mean Rate:
> Octets    47570.9/s
> Bits      380567/s
> Received frames:
> Number        24507095
> Mean Rate     20.7378/s
> Missed frames:
> Number        2359
> %             0.00962486
> Full FIFO:
> Number        0
> %             0
> Products:
> Inserted      31776
> Mean Rate     0.0268887/s
> ----------------------------------------
> 20160616T185403.876849Z noaaportIngester[1661] NOTE
> noaaportIngester.c:754:reportStats()
> ----------------------------------------
> Ingestion Statistics:
> Since Previous Report (or Start):
> Duration          P13DT16H16M1.662971S
> Raw Data:
> Octets        0
> Mean Rate:
> Octets    0/s
> Bits      0/s
> Received frames:
> Number        0
> Mean Rate     0/s
> Missed frames:
> Number        0
> %             -nan
> Full FIFO:
> Number        0
> %             -nan
> Products:
> Inserted      0
> Mean Rate     0/s
> Since Start:
> Duration          P13DT16H16M1.662971S
> Raw Data:
> Octets        0
> Mean Rate:
> Octets    0/s
> Bits      0/s
> Received frames:
> Number        0
> Mean Rate     0/s
> Missed frames:
> Number        0
> %             -nan
> Full FIFO:
> Number        0
> %             -nan
> Products:
> Inserted      0
> Mean Rate     0/s
> ----------------------------------------
> 20160616T185403.876870Z noaaportIngester[1659] NOTE
> noaaportIngester.c:754:reportStats()
> ----------------------------------------
> Ingestion Statistics:
> Since Previous Report (or Start):
> Duration          P13DT16H16M1.661478S
> Raw Data:
> Octets        0
> Mean Rate:
> Octets    0/s
> Bits      0/s
> Received frames:
> Number        0
> Mean Rate     0/s
> Missed frames:
> Number        0
> %             -nan
> Full FIFO:
> Number        0
> %             -nan
> Products:
> Inserted      0
> Mean Rate     0/s
> Since Start:
> Duration          P13DT16H16M1.661478S
> Raw Data:
> Octets        0
> Mean Rate:
> Octets    0/s
> Bits      0/s
> Received frames:
> Number        0
> Mean Rate     0/s
> Missed frames:
> Number        0
> %             -nan
> Full FIFO:
> Number        0
> %             -nan
> Products:
> Inserted      0
> Mean Rate     0/s
> ----------------------------------------
> 20160616T185403.876867Z noaaportIngester[1654] NOTE
> noaaportIngester.c:754:reportStats()
> ----------------------------------------
> Ingestion Statistics:
> Since Previous Report (or Start):
> Duration          P13DT16H16M1.658771S
> Raw Data:
> Octets        893238258738
> Mean Rate:
> Octets    755853/s
> Bits      6.04683e+06/s
> Received frames:
> Number        268717744
> Mean Rate     227.387/s
> Missed frames:
> Number        79186
> %             0.0294594
> Full FIFO:
> Number        0
> %             0
> Products:
> Inserted      50676084
> Mean Rate     42.8818/s
> Since Start:
> Duration          P13DT16H16M1.658771S
> Raw Data:
> Octets        893238258738
> Mean Rate:
> Octets    755853/s
> Bits      6.04683e+06/s
> Received frames:
> Number        268717744
> Mean Rate     227.387/s
> Missed frames:
> Number        79186
> %             0.0294594
> Full FIFO:
> Number        0
> %             0
> Products:
> Inserted      50676084
> Mean Rate     42.8818/s
> ----------------------------------------
> 20160616T185403.887878Z atlas.cod.edu(feed)[80064] NOTE
> error.c:236:err_log() Couldn't flush connection; flushConnection() failure
> to atlas.cod.edu: RPC: Unable to receive; errno = Bad file descriptor
> 20160616T185403.887884Z climate.cod.edu(feed)[88804] NOTE
> error.c:236:err_log() Couldn't flush connection; flushConnection() failure
> to climate.cod.edu: RPC: Unable to receive; errno = Bad file descriptor
> 20160616T185403.903885Z ldmd[1652] NOTE ldmd.c:185:cleanup() Exiting
> 20160616T185403.903961Z ldmd[1652] NOTE ldmd.c:256:cleanup() Terminating
> process group
> 20160616T185403.904040Z weather.cod.edu(feed)[78167] NOTE
> error.c:236:err_log() Couldn't flush connection; flushConnection() failure
> to weather.cod.edu: RPC: Unable to receive; errno = Bad file descriptor
> 20160616T185403.907932Z cdstats.cod.edu(feed)[49285] NOTE
> ldmd.c:185:cleanup() Exiting
> 20160616T185403.915906Z atlas.cod.edu(feed)[1895] NOTE
> ldmd.c:185:cleanup() Exiting
> 20160616T185403.975903Z wxsandbox2.cod.edu(feed)[5377] NOTE
> ldmd.c:185:cleanup() Exiting
> 20160616T185403.979950Z climate.cod.edu(feed)[119425] NOTE
> ldmd.c:185:cleanup() Exiting
> 20160616T185403.983914Z climate.cod.edu(feed)[38263] NOTE
> ldmd.c:185:cleanup() Exiting
> 20160616T185403.987934Z wxsandbox1.cod.edu(feed)[67005] NOTE
> ldmd.c:185:cleanup() Exiting
> 20160616T185403.995878Z rtstats[1664] NOTE rtstats.c:134:cleanup() Exiting
> 20160616T185404.100965Z climate.cod.edu(feed)[88804] NOTE
> ldmd.c:185:cleanup() Exiting
> 20160616T185404.101386Z ldmd[1652] NOTE ldmd.c:168:reap() child 88804
> exited with status 6
> 20160616T185404.667889Z atlas.cod.edu(feed)[80064] NOTE
> ldmd.c:185:cleanup() Exiting
> 20160616T185404.668387Z ldmd[1652] NOTE ldmd.c:168:reap() child 80064
> exited with status 6
> 20160616T185404.922282Z weather.cod.edu(feed)[78167] NOTE
> ldmd.c:185:cleanup() Exiting
> 20160616T185404.922685Z ldmd[1652] NOTE ldmd.c:168:reap() child 78167
> exited with status 6
> 20160616T191002.843715Z pqcheck[91525] NOTE pqcheck.c:150:main() Starting
> Up (91402)
> 20160616T191002.843776Z pqcheck[91525] ERROR pqcheck.c:202:main()
> pq_get_write_count() failure: /dev/shm/ldm.pq: No such file or directory
> 20160616T191002.843785Z pqcheck[91525] NOTE pqcheck.c:71:cleanup() Exiting
> 
> 
> 
> This is what showed in /var/log/syslog:
> 
> Jun 16 18:54:01 noaaport2 kernel: [1183315.725767] traps:
> noaaportIngeste[1689] general protection ip:7f1c610a0c84 sp:7f17b1b75dc0
> error:0 in libpthread-2.21.so[7f1c61097000+18000]
> Jun 16 18:54:07 noaaport2 systemd[1]: Stopping User Manager for UID
> 1000...
> Jun 16 18:54:07 noaaport2 systemd[966]: Reached target Shutdown.
> Jun 16 18:54:07 noaaport2 systemd[966]: Starting Exit the Session...
> Jun 16 18:54:07 noaaport2 systemd[966]: Stopped target Default.
> Jun 16 18:54:07 noaaport2 systemd[966]: Stopped target Basic System.
> Jun 16 18:54:07 noaaport2 systemd[966]: Stopped target Sockets.
> Jun 16 18:54:07 noaaport2 systemd[966]: Stopped target Paths.
> Jun 16 18:54:07 noaaport2 systemd[966]: Stopped target Timers.
> Jun 16 18:54:07 noaaport2 systemd[966]: Received SIGRTMIN+24 from PID
> 90148 (kill).
> Jun 16 18:54:07 noaaport2 systemd[1]: Stopped User Manager for UID 1000.
> Jun 16 18:54:07 noaaport2 systemd[1]: Removed slice user-1000.slice.
> 
> Jun 16 18:55:01 noaaport2 CRON[90168]: (ldm) CMD (/bin/bash -l -c
> '/home/ldm/bin/ldmadmin addmetrics')
> Jun 16 18:55:02 noaaport2 postfix/pickup[87999]: 81BB51D7A: uid=1000
> from=<ldm>
> Jun 16 18:55:02 noaaport2 postfix/cleanup[90249]: 81BB51D7A: message-id=<
> address@hidden>
> Jun 16 18:55:02 noaaport2 postfix/qmgr[1111]: 81BB51D7A: from=<
> address@hidden>, size=708, nrcpt=1 (queue active)
> Jun 16 18:55:02 noaaport2 postfix/local[90251]: 81BB51D7A: to=<
> address@hidden>, orig_to=<ldm>, relay=local, delay=0.12,
> delays=0.08/0/0/0.03, dsn=2.0.0, status=sent (delivered to mailbox)
> Jun 16 18:55:02 noaaport2 postfix/qmgr[1111]: 81BB51D7A: removed
> 
> So, while all of this was happening, the ldm.pq file was erased.
> I have no idea what any of this means, but hopefully you can piece it
> together. No core file was dumped.

> And, sorry...I hit "send" too quickly: This all happened on
> noaaport2.cod.edu. Noaaport1.cod.edu was just fine and
> kept on ticking. Both receive the Novra broadcast identically
> via a network switch; I can access their Novra box via
> noaaport1 or 2. Again, noaaport1.cod.edu had no issues and
> just kept humming right along.

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: URY-771320
Department: Support LDM
Priority: Normal
Status: Closed
===================
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata 
inquiry tracking system and then made publicly available through the web.  If 
you do not want to have your interactions made available in this way, you must 
let us know in each email you send to us.