NOTE: The decoders mailing list is no longer active. The list archives are made available for historical reasons.
On Fri, 20 May 2005, Unidata Support wrote:

> ------- Forwarded Message
>
> >To: <support@xxxxxxxxxxxxxxxx>
> >From: "Hanson, Kurt" <khanson@xxxxxxx>
> >Subject: FW: Unidata decoders - syn2nc bug re Unicode
> >Organization: WSI
> >Keywords: 200505201415.j4KEFOP3013416 netCDF decoders syn2nc
>
> I haven't received a response to this. Does that mean it's still in a

I consider this very important. I've been working on a new release, but
other projects have diverted my time.

Thanks,
Robb...

> queue or that the issue is not considered important?
>
> Just curious --
>
> Kurt Hanson
> WSI Corporation
> 400 Minuteman Rd.
> Andover, MA 01810
> (978).983.6549
> www.wsi.com
>
> > -----Original Message-----
> > From: Hanson, Kurt
> > Sent: Wednesday, April 20, 2005 1:20 PM
> > To: 'support@xxxxxxxxxxxxxxxx'
> > Subject: Unidata decoders - syn2nc bug re Unicode
> >
> > I imagine this will end up in Rob Kambic's inbox... if so, hello again
> > Rob.
> >
> > We've been occasionally experiencing issues with syn2nc. The problem
> > is that once every week or two, the syn2nc log will suddenly begin
> > filling with messages about Unicode:
> >
> > Malformed UTF-8 character (unexpected continuation byte 0x8e, with no
> > preceding start byte) in index at /dicast2-papp/DICAST/tmp/syn2nc_308
> > line 618, <STDIN> chunk 1.
> > Malformed UTF-8 character (unexpected non-continuation byte 0x2a,
> > immediately after start byte 0xf6) in index at
> > /dicast2-papp/DICAST/tmp/syn2nc_308 line 618, <STDIN> chunk 1.
> >
> > The log file grows without bound until finally the disk partition
> > fills, hobbling the entire system.
> >
> > I think I understand the problem and have a fix.
> > The problem appears to be due to some garbage characters in several of
> > the synoptic messages from today for a single site -- FBSK in Botswana.
> > (I imagine that all of the problems we've ever seen are from this site.)
> >
> > The issue is that since 5.8.0, Perl has some automatic support for
> > handling Unicode characters. Once Perl sees a character outside of the
> > range [0,127], it assumes that the text data is Unicode rather than
> > ASCII. Since the garbage characters from today's FBSK data did not
> > conform to Unicode rules, Perl itself (rather than syn2nc) generated the
> > messages.
> >
> > So the magic fix I installed is to put a "no encoding;" line (pragma)
> > into the syn2nc script. This ensures that Perl doesn't try to guess what
> > sort of character set the text is in -- it just passes the data up to
> > the application level in raw form. That's what we need with syn2nc.
> >
> > Scope:
> > * We experience this problem on a Linux RedHat Enterprise 3.0 Athlon
> >   system running Perl 5.8.0.
> > * We do not experience it on a Solaris 8 system running Perl 5.8.0.
> >
> > Testing:
> > When I pipe the attached synoptic file
> > synoptic.20050420.1200.asc.FBSK_210 into the pristine syn2nc on the
> > Linux system, the log file grows without bound. When I pipe it into my
> > patched version, the log file size remains stable, and the file never
> > gets any Unicode error messages.
> >
> > Discussion:
> > Thinking beyond the low-level Perl issue, I'm not sure what syn2nc
> > should do when it encounters the garbage characters... Nor am I sure
> > what it actually does -- I'd dig into the script itself to find that out
> > but I'm running short on time. What do you think?
> >
> > Also, I'd be curious to hear whether you see the garbage characters in
> > your FBSK synoptics for today. It's possible but unlikely that the
> > garbage is not due to the FBSK sensor itself but due to some
> > communications issue that is WSI-specific.
> >
> > I'm attaching a few things:
> > * syn2nc.new -- an updated version of syn2nc from the 3.0.9 version of
> >   the decoders package.
> > * syn2nc.patch -- a diff of my version vs the pristine 3.0.9
> > * synoptic.20050420.1200.asc.FBSK_210 -- message #210 from today's
> >   synoptic feed, containing garbage characters 0x8e and others in line 6
> >   of the file.
> >
> > Relevant Perl references:
> > * Unicode intro: http://perldoc.perl.org/perluniintro.html
> > * encoding pragma: http://perldoc.perl.org/encoding.html
> >
> > Whew. I think that's about everything! Feel free to contact me.
> >
> > Kurt Hanson
> > Senior Software Engineer & Scientific Analyst
> > WSI Corporation
> > 400 Minuteman Rd.
> > Andover, MA 01810
> > my phone: 978.983.6549
> > www.wsi.com
> >
> > <<syn2nc.new>>
> > <<syn2nc.patch>>
> > <<synoptic.20050420.1200.asc.FBSK_210>>

==============================================================================
Robb Kambic                                Unidata Program Center
Software Engineer III                      Univ. Corp for Atmospheric Research
rkambic@xxxxxxxxxxxxxxxx                   WWW: http://www.unidata.ucar.edu/
==============================================================================
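
The diagnosis in the quoted message can be illustrated outside Perl. What follows is a minimal Python sketch, not part of syn2nc (which is a Perl script) and not the attached patch. It shows that byte sequences like the ones named in the two log messages are rejected by a strict UTF-8 decoder, while the same data handled as raw bytes causes no decoding error, which is the behavior the fix described in the message is aiming for. The sample byte strings and the helper name is_valid_utf8 are hypothetical; the actual FBSK report is not reproduced here.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical samples modeled on the bytes named in the log messages.
lone_continuation = b"AAXX \x8e 20124"    # 0x8e with no preceding start byte
bad_start_sequence = b"AAXX \xf6* 20124"  # 0xf6 followed by 0x2a ('*')
plain_ascii = b"AAXX 20124 68244"         # ordinary 7-bit text

for sample in (lone_continuation, bad_start_sequence, plain_ascii):
    print(sample, is_valid_utf8(sample))

# Treating the report as raw bytes never attempts a decode, so the
# stray bytes simply pass through as data for the application to judge.
fields = lone_continuation.split()
print(fields)
```

A decoder working this way would still need to decide what to do with a field such as the stray 0x8e byte, which is the open question raised under "Discussion" in the message.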