
20020808: Fw:dcgrib2 is hanging...



Robert,

NCEP posted ADMIN messages about trouble with files being distributed
and comms problems; presumably they are choking on the same thing:

NOUS42 KWNO 072354  
ADMNFD  
  
SPECIAL NCEP DISCUSSION  
CENTRAL OPERATIONS/NCEP/NWS/WASHINGTON DC  
072340 UTC WED AUG 7 2002   
  
072340Z...PLEASE NOTE THE FOLLOWING ITEMS..  
  
1..THERE APPEARS TO BE SOME TYPE OF COMMS  
PROBLEM AT OR BETWEEN NCF/AWIPS AND TOC/GATEWAY  
THAT IS CAUSING SOME NCEP 18Z ETA AND 18Z AVN MODEL   
DATA AND RUC2 DATA FROM REACHING NCF FOR DISTRIBUTION  
TO AWIPS USERS..NCF AND TOC ARE TROUBLESHOOTING..  
SORRY FOR THE INCONVENIENCE..  

etc....

504   
NOUS72 KNCF 081646  
ADMNCF  
THE NWSTG TECH CONTROL HAS INFORMED THE NCF THAT THEY CONTINUE  
TO EXPERIENCE PROBLEMS WITH TRANSFERING MODEL DATA TO THE NCF.   
BOTH NWSTG TECH CONTROL AND THE NCF ARE CURRENTLY WORKING TO   
RESOLVE THE PROBLEM.  THE PROBLEM IS VERY COMPLEX AND AT THIS   
TIME THERE IS NO ESTIMATED TIME WHEN NORMAL SERVICE WILL BE   
RESTORED.  THE RECEIPT OF MODEL DATA WILL CONTINUE TO BE   
IMPACTED UNTIL THE PROBLEM IS SOLVED.  AS SOON AS THE PROBLEMS  
ARE CORRECTED THE NCF WILL NOTIFY ALL SITES.  THANK YOU FOR   
YOUR PATIENCE AND SUPPORT.  
.  
NCF  


My guess is that the code is stuck in an infinite loop.
I didn't observe the problem here, but if it happens in the future,
sending a "kill -6" (SIGABRT) to the process will dump a core that can
be loaded into dbx to see where the code is looping. The trouble is, the
decoder is probably doing exactly what the data is telling it to do.
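
For anyone who wants to script that step, here is a minimal sketch in
Python of what "kill -6" does. The helper names abort_process and
killed_by are made up for illustration and are not part of the LDM or
GEMPAK. It assumes SIGABRT is signal 6 on your platform (true on most
Unix systems) and that the core file size limit ("ulimit -c") is not
zero; otherwise the process still dies but no core is written.

```python
import os
import signal
import subprocess
import sys
from typing import Optional


def abort_process(pid: int) -> None:
    """Send SIGABRT to pid, the same as `kill -6 <pid>` where SIGABRT is 6.

    Hypothetical helper for illustration only.
    """
    os.kill(pid, signal.SIGABRT)


def killed_by(proc: subprocess.Popen) -> Optional[str]:
    """Wait for proc and return the name of the signal that ended it.

    Returns None if the process exited on its own.
    """
    returncode = proc.wait()
    if returncode < 0:
        # Popen reports death-by-signal as a negative return code.
        return signal.Signals(-returncode).name
    return None


if __name__ == "__main__":
    # Stand-in for a hung decoder: a child process that loops forever.
    child = subprocess.Popen([sys.executable, "-c", "while True: pass"])
    abort_process(child.pid)
    print("child terminated by:", killed_by(child))
```

For a real hung decoder you would pass the dcgrib2 process id (from ps)
to abort_process, then open the resulting core in dbx together with the
dcgrib2 executable and ask for a stack trace.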

Steve Chiswell
Unidata User Support



>From: "Robert Mullenax" <address@hidden>
>Organization: UCAR/Unidata
>Keywords: 200208081711.g78HBHK19131

>dcgrib2 brought my machines down..I had load averages
>of 50 for single CPU machines.  What happened here?
>
>I also noticed it at Universal with our SDI-LDM box.
>
>Thanks,
>Robert
>
>
>----- Original Message -----
>Date: Thu  August 08, 2002  11:39 AM
>From: Daryl Herzmann <address@hidden>
>To: address@hidden
>Subject: dcgrib2 is hanging... 
>
>Hi LDMers,
>
>With the recent model data problems, I notice that the dcgrib2 process is 
>hanging on machines ingesting model data.  You may want to check your LDM 
>ingestor and see if you have hung processes...
>
>On the four machines I have ingesting model data, all four currently have 
>hung dcgrib2 processes...
>
>Daryl
>
>-- 
>/**
> * Daryl Herzmann (address@hidden)
> * Program Assistant -- Iowa Environmental Mesonet
> * http://mesonet.agron.iastate.edu
> */
>
>