Re: 20050520: Unidata decoders - syn2nc bug re Unicode (cont.)

NOTE: The decoders mailing list is no longer active. The list archives are made available for historical reasons.

  • To: "Hanson, Kurt" <khanson@xxxxxxx>
  • Subject: Re: 20050520: Unidata decoders - syn2nc bug re Unicode (cont.)
  • From: Robb Kambic <rkambic@xxxxxxxxxxxxxxxx>
  • Date: Fri, 20 May 2005 09:12:16 -0600 (MDT)
On Fri, 20 May 2005, Unidata Support wrote:

>
> ------- Forwarded Message
>
> >To: <support@xxxxxxxxxxxxxxxx>
> >From: "Hanson, Kurt" <khanson@xxxxxxx>
> >Subject: FW: Unidata decoders - syn2nc bug re Unicode
> >Organization: WSI
> >Keywords: 200505201415.j4KEFOP3013416 netCDF decoders syn2nc
>
> I haven't received a response to this. Does that mean it's still in a

I consider this very important. I've been working on a new release, but other
projects have diverted my time.

thanks.
robb...

> queue or that the issue is not considered important?
>
> Just curious  --
>
> Kurt Hanson
> WSI Corporation
> 400 Minuteman Rd.
> Andover, MA 01810
> (978).983.6549
> www.wsi.com
>
>
> >  -----Original Message-----
> > From:       Hanson, Kurt 
> > Sent:       Wednesday, April 20, 2005 1:20 PM
> > To: 'support@xxxxxxxxxxxxxxxx'
> > Subject:    Unidata decoders - syn2nc bug re Unicode
> >
> > I imagine this will end up in Robb Kambic's inbox... if so, hello again
> Robb.
> >
> > We've been occasionally experiencing issues with syn2nc. The problem
> is that once every week or two, the syn2nc log will suddenly begin
> filling with messages about Unicode:
> >
> > Malformed UTF-8 character (unexpected continuation byte 0x8e, with no
> preceding start byte) in index at /dicast2-papp/DICAST/tmp/syn2nc_308
> line 618, <STDIN> chunk 1.
> > Malformed UTF-8 character (unexpected non-continuation byte 0x2a,
> immediately after start byte 0xf6) in index at
> /dicast2-papp/DICAST/tmp/syn2nc_308 line 618, <STDIN> chunk 1.
> >
> > The log file grows without bound until finally the disk partition
> fills, hobbling the entire system.
> >
> > I think I understand the problem and have a fix. The problem appears
> to be due to some garbage characters in several of the synoptic messages
> from today for a single site -- FBSK in Botswana. (I imagine that all of
> the problems we've ever seen are from this site.)
> >
> > The issue is that since 5.8.0, Perl has some automatic support for
> handling Unicode characters. Once Perl sees a character outside of the
> range [0,127], it assumes that the text data is UTF-8-encoded Unicode
> rather than ASCII. Since the garbage characters from today's FBSK data
> did not conform to the UTF-8 encoding rules, Perl itself (rather than
> syn2nc) generated the messages.
> >
> > So the magic fix I installed is to put a "no encoding;" line (pragma)
> into the syn2nc script. This ensures that Perl doesn't try to guess what
> sort of character set the text is in -- it just passes the data up to
> the application level in raw form. That's what we need with syn2nc.
> >
> > Scope:
> > * We experience this problem on a Linux RedHat Enterprise 3.0 Athlon
> system running Perl 5.8.0.
> > * We do not experience it on a Solaris 8 system running Perl 5.8.0.
> >
> > Testing:
> > When I pipe the attached synoptic file
> synoptic.20050420.1200.asc.FBSK_210 into the pristine syn2nc on the
> Linux system, the log file grows without bound. When I pipe it into my
> patched version, the log file size remains stable, and the file never
> gets any Unicode error messages.
> >
> > Discussion:
> > Thinking beyond the low-level Perl issue, I'm not sure what syn2nc
> should do when it encounters the garbage characters... Nor am I sure
> what it actually does -- I'd dig into the script itself to find that out
> but I'm running short on time. What do you think?
> >
> > Also, I'd be curious to hear whether you see the garbage characters in
> your FBSK synoptics for today. It's possible but unlikely that the
> garbage is not due to the FBSK sensor itself but due to some
> communications issue that is WSI-specific.
> >
> > I'm attaching a few things:
> > * syn2nc.new -- an updated version of syn2nc from the 3.0.9 version of
> the decoders package.
> > * syn2nc.patch -- a diff of my version vs the pristine 3.0.9
> > * synoptic.20050420.1200.asc.FBSK_210 -- message #210 from today's
> synoptic feed, containing garbage characters 0x8e and others in line 6
> of the file.
> >
> > Relevant Perl references:
> > * Unicode intro: http://perldoc.perl.org/perluniintro.html
> > * encoding pragma: http://perldoc.perl.org/encoding.html
> >
> > Whew. I think that's about everything! Feel free to contact me.
> >
> > Kurt Hanson
> > Senior Software Engineer & Scientific Analyst
> > WSI Corporation
> > 400 Minuteman Rd.
> > Andover, MA 01810
> > my phone: 978.983.6549
> > www.wsi.com
> >
> > <<syn2nc.new>> <<syn2nc.patch>>
> > <<synoptic.20050420.1200.asc.FBSK_210>>
> >
> >
> >
>
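
The two warning types quoted above can be reproduced with a minimal sketch. It is written in Python rather than Perl and is not taken from syn2nc. The function name `utf8_problems` and the sample byte strings are made up for illustration; they are not the contents of the FBSK message. The check is deliberately simplified: it only looks at the start-byte and continuation-byte structure and is not a full UTF-8 validator.

```python
def utf8_problems(data: bytes) -> list:
    """Describe each structurally malformed UTF-8 sequence found in data.

    Simplified check (an assumption of this sketch): only the pattern of
    start bytes and continuation bytes is examined.
    """
    problems = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # Plain ASCII byte, always fine.
            i += 1
            continue
        if 0x80 <= b <= 0xBF:
            # A continuation byte that does not follow a start byte.
            problems.append(
                "unexpected continuation byte 0x%02x at offset %d" % (b, i)
            )
            i += 1
            continue
        # Start byte: its high bits say how many continuation bytes follow.
        if b >= 0xF0:
            need = 3
        elif b >= 0xE0:
            need = 2
        else:
            need = 1
        j = i + 1
        while need and j < len(data) and 0x80 <= data[j] <= 0xBF:
            j += 1
            need -= 1
        if need:
            if j < len(data):
                problems.append(
                    "non-continuation byte 0x%02x after start byte 0x%02x "
                    "at offset %d" % (data[j], b, i)
                )
            else:
                problems.append(
                    "truncated sequence after start byte 0x%02x at offset %d"
                    % (b, i)
                )
        # Resume scanning at the first byte that was not consumed.
        i = j
    return problems


if __name__ == "__main__":
    # Hypothetical sample lines, not real synoptic data.
    for sample in (b"AAXX \x8e 12345", b"\xf6*", b"FBSK 12345"):
        print(sample, utf8_problems(sample))
```

In Python, reading the input as bytes (for example through `sys.stdin.buffer`) avoids any decoding step. That is the same idea as the fix described in the quoted message: hand the application the raw bytes and let it decide what to do with them.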

==============================================================================
Robb Kambic                                Unidata Program Center
Software Engineer III                      Univ. Corp for Atmospheric Research
rkambic@xxxxxxxxxxxxxxxx                   WWW: http://www.unidata.ucar.edu/
==============================================================================

