Re: [ldm-users] NWS official statement on the data outage of 2/13/2017

I generally keep my comments off of this list, but the official statement is not accurate. As you know, Gilbert, WRIP (NOAA Weather Radio) was only partially up, and that was WFO-dependent: if a WFO fired up their CRS, they could manually record hourly forecasts and observations.

Likewise, checking the tombstones via local forecast on weather.gov resulted in "N/A" for most zip codes I tried during the outage.

From an IT side, I would say this particular event should help illustrate the foolhardy paradigm of "putting all your eggs in one basket." HPC, NCEP, EMC . . . all were hit with this outage. NWWS, EMWIN, and SBN were all down. A primary and secondary router paradigm in the same physical location may save lots of money, but is an incredible liability for data distribution.

I have always advocated that a live secondary should be placed geographically distant from the primary uplink and NCF . . . and KC has legacy and national networking infrastructure already in place to make it a likely candidate. Or Boulder, at ESRL.

Not that my comments will merit any action, but if pointing out what seems obvious to me gains traction, that would be great. I would even start filling out my TPS cover sheets for all my memos, and coming in over the weekend to work. ("Office Space" reference)

Stonie

On 02/14/2017 03:17 PM, Gilbert Sebenste wrote:
Here it is:

NWS Statement on Cause of Outage on Feb. 13
Feb. 14, 2017

The National Weather Service experienced a failure of its AWIPS Network
Control Facility communications network at 2:08 p.m. EST Feb 13.  The
outage, lasting two hours, 36 minutes, prevented us from fully
distributing forecasts and warnings. During the outage, the public was
able to access forecasts, watches and warnings by NOAA Weather Radio and
the social media accounts of their local forecast office.

Technicians quickly determined the cause of the problem was the
simultaneous failure of two core communications routers - primary and
backup - for the control facility due to a power problem. The routers
were replaced and the system was restored to full service. We are still
investigating what caused the power outage.

The AWIPS communications system is a very reliable configuration and
this is the first time both routers failed simultaneously.

We are implementing additional communication pathways to the backup
Network Control Facility to ensure that problems encountered in
switching operations to this backup facility will not recur.

---
*Gilbert Sebenste*
Staff Meteorologist
Environmental Health and Safety
Labs for Wellness 154 | DeKalb, Illinois 60115
815-753-5492
gilbert@xxxxxxx
http://weather.admin.niu.edu

Everyone. Home. Safely.
NIU

_______________________________________________
NOTE: All exchanges posted to Unidata maintained email lists are
recorded in the Unidata inquiry tracking system and made publicly
available through the web.  Users who post to any of the lists we
maintain are reminded to remove any personal information that they
do not want to be made public.


ldm-users mailing list
ldm-users@xxxxxxxxxxxxxxxx
For list information or to unsubscribe,  visit: 
http://www.unidata.ucar.edu/mailing_lists/



