
20020702: ADDE compression



Hi Dave-

This is kind of a followup to the recent discussion on using
port 112 to support GZIP compression.

We've been looking at using some Java code we found that will
handle a compressed ADDE stream in addition to a GZIP'd
stream in the edu.wisc.ssec.mcidas.adde.AddeURLConnection
class (AUC).  While looking at this, I was looking at restructuring
AUC so that it would attempt to use a compressed port
and if it couldn't, then it would try to make the connection
using the uncompressed port.  I was trying to figure out
what I need to send to the server to make the connection and
noticed that part of the protocol is to send along the port
number.

Tom Y and I were looking at mcserv.cp and saw that the port
number that is sent along as part of the protocol is what is
used to determine whether to compress or not compress, not
the actual port that the connection is made on.  This means
that the ADDE protocol could use only one port for both 
compressed transfers and uncompressed transfers.  I confirmed
this by using AUC to make a connection on port 503 and send
a different port as part of the protocol.  In that case, the
server sent back uncompressed data.  Conversely, if I connect
to port 500, but tell mcserv that I'm on 503, it will send
back compressed data.  This would mean that ADDE could move
to port 112 to support any type of compression (including
none) and that the port number sent along as part of the
protocol could actually be used to specify the type of
compression (e.g. 500 = none; 503 = compress; 112 = gzip; 2222 = png,
etc).  So, the question came up: why is ADDE using 2 ports
now?
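A minimal Java sketch of that idea, assuming the port field in the request is reinterpreted as a compression selector using the codes proposed above (500, 503, 112, 2222). The enum and method names are hypothetical; they are not part of mcserv or AddeURLConnection.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical mapping from the "port" field of an ADDE request to a compression type. */
enum AddeCompression {
    NONE(500), COMPRESS(503), GZIP(112), PNG(2222);

    final int code;

    AddeCompression(int code) { this.code = code; }

    // Lookup table built once, after all constants exist.
    private static final Map<Integer, AddeCompression> BY_CODE = new HashMap<>();
    static {
        for (AddeCompression c : values()) {
            BY_CODE.put(c.code, c);
        }
    }

    /** Returns the compression requested by the port field, or NONE for unknown codes. */
    static AddeCompression fromPortField(int port) {
        return BY_CODE.getOrDefault(port, NONE);
    }
}
```

Falling back to NONE for an unknown code is only one possible choice; a server could instead reject the request outright.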

If you did move toward one port, it would be good to have
some kind of handshake as well that would allow the client
to get acknowledgement from the server that the connection
is established and so that it can verify the compression
algorithm used.  (i.e. - client sends "hello, I want to use
GZIP for this transaction", server sends back something that
is GZIP'ed so the client can try to ungzip it).  Right now,
this doesn't happen until the whole request is sent across
and the server starts streaming data back.

So, could you confirm that indeed only one port is needed and
if so, let us know what you think about the changes I suggest
above?

Thanks.

Don

re: why is ADDE using 2 ports when one would suffice

>Good question....that only John Benson could answer! Though he is
>retired, he still answers email. If you post it on the McIDAS Users list,
>he is usually the first person to respond.

re: implement handshake

>Interesting idea....so if the server didn't understand the compression
>requested, it would send a message to the client stating that it was
>sending it back uncompressed?

>Unfortunately, we're strapped for time & people to take this on right
>now. I'm only 25% on McIDAS. But, now that TomY won't be changing XCD
>anymore [wink!], maybe he would like to take a stab at it.

>dave

Date: Mon, 08 Jul 2002 15:40:02 -0600
From: Don Murray <address@hidden>
Organization: Unidata/UCAR
To: address@hidden
Subject: ADDE ports/compression

Hi All-

I've been working on updating the Java ADDE client distributed
with VisAD (http://www.ssec.wisc.edu/~billh/visad.html) to support
compressed transfers.  Unfortunately, core Java only supports
GZIP compression, not Unix compress (LZW).  However, we have
found a class that will uncompress a compressed ADDE stream
and are testing that now.

In the process, I'm having to delve into the inner workings
of mcserv.cp to figure out what I need to send, etc.  I
haven't found any documentation on the protocol of what a
client (not a McIDAS program, but a "raw" ADDE client adhering
to the protocol) needs to send across.  Since I didn't write the 
original Java client, I'm just going by what is being sent now
which is:

    /*
     Now start pumping data to the server.  The sequence is:

        -  ADDE version (1 for now)
        -  Server IP and Port number  (latter used to determine compression)
        -  Service Type (AGET, ADIR, etc)
        -  Server IP and Port number (again)
        -  Client address
        -  User, project, password
        -  Service Type (again)
        -  Actual request

     */

Is there any documentation on what needs to be sent to initiate a
transaction?  For example, do I really have to send over the Server IP
and Service Type twice?

Secondly, it looks like mcserv uses the port number that gets sent as
part of the protocol to determine whether or not to use compression.
Since mcserv is started from inetd, it doesn't really know the actual
port that it is reading from/writing to, so it has to rely on the
port number sent as part of the initial request.  So, I can make a
connection to port 500, send across 503 in the port number part of the
protocol and get compression on port 500.  Conversely, I can connect on
port 503 and tell mcserv that I'm on port 500 and get no compression on
port 503.  So, the question came up of why McIDAS needs to use 2
ports, when it appears that you can support compressed and
non-compressed transfers on one port.  Since each request is a new
connection, it also means that you could have some connections use
compression and others not if McIDAS would support a global keyword for
compression instead of an environment variable.

The other problem I've run into is determining whether what I'm getting
back is compressed or not before trying to read a stream of data.  It
would be nice if there was a way to have a protocol handshake that
would send a minimal amount of data to determine if compression was on
or not.  If the server didn't support compression, then it could tell
the client not to expect it before sending a lot of data back.  Is
there a minimal request that can be made to determine whether
compression is enabled on the server?
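Without a handshake, one client-side workaround is to peek at the first bytes of the response before choosing a decoder. A sketch, assuming the compressed response is the raw output of Unix compress (which starts with the bytes 0x1f 0x9d) or of gzip (0x1f 0x8b); the class name is hypothetical. An uncompressed response could begin with the same two bytes by chance, so this is a heuristic rather than a substitute for a handshake.

```java
import java.io.IOException;
import java.io.PushbackInputStream;

/** Hypothetical helper: guess a response's compression from its magic number. */
class CompressionSniffer {
    enum Kind { GZIP, UNIX_COMPRESS, UNKNOWN }

    /**
     * Reads up to two bytes, classifies them, and pushes them back so the
     * caller can still read the whole stream from the start. The stream must
     * be created with a pushback buffer of at least 2 bytes, e.g.
     * new PushbackInputStream(socketIn, 2).
     */
    static Kind sniff(PushbackInputStream in) throws IOException {
        byte[] head = new byte[2];
        int n = in.readNBytes(head, 0, 2);
        if (n > 0) {
            in.unread(head, 0, n);
        }
        if (n < 2) {
            return Kind.UNKNOWN;
        }
        int b0 = head[0] & 0xff;
        int b1 = head[1] & 0xff;
        if (b0 == 0x1f && b1 == 0x8b) return Kind.GZIP;          // gzip header
        if (b0 == 0x1f && b1 == 0x9d) return Kind.UNIX_COMPRESS; // compress(1), LZW
        return Kind.UNKNOWN;                                     // probably uncompressed
    }
}
```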

It would also be helpful if McIDAS supported GZIP compression instead
of compress, since GZIP can be more compact and is supported in core
Java.  If only one port is needed, the port number sent along as part
of the protocol could actually be used to specify the type of
compression (e.g. 500 = none; 503 = compress; 112 = gzip; 2222 = png,
etc).  The handshake mentioned above could be used to get
acknowledgement from the server that the connection is established and
so that it can verify the compression algorithm used.  (i.e. - client
sends "hello, I want to use GZIP for this transaction", server sends
back something that is GZIP'ed (maybe just the port number compressed)
so the client can try to ungzip it).  Right now, this doesn't happen
until the whole request is sent across and the server starts streaming
data back.  This would be a nice enhancement for a future release (ADDE
version 2?).
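The proposed handshake can be sketched with the core java.util.zip classes. This assumes, as suggested above, that the server acknowledges by sending back the requested port number GZIP-compressed; the class, the 4-byte big-endian encoding of the port, and the method names are all hypothetical and not part of ADDE version 1.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical "hello, I want GZIP" handshake, simulated with byte arrays. */
class GzipHandshake {
    /** Server side: echo the requested port number as a tiny GZIP stream. */
    static byte[] serverAck(int port) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(new GZIPOutputStream(bytes))) {
            out.writeInt(port); // 4 bytes, big-endian; close() writes the GZIP trailer
        }
        return bytes.toByteArray();
    }

    /** Client side: true only if the ack un-gzips to the port that was requested. */
    static boolean clientVerify(byte[] ack, int requestedPort) {
        try (DataInputStream in = new DataInputStream(
                new GZIPInputStream(new ByteArrayInputStream(ack)))) {
            return in.readInt() == requestedPort;
        } catch (IOException e) {
            // Not a GZIP stream, or too short: the server did not honor the request.
            return false;
        }
    }
}
```

If clientVerify returns false, the client could fall back to asking for an uncompressed (500) transfer before sending the rest of the request, instead of discovering the mismatch only after data starts streaming back.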

Any help on the format of the request protocol, why 2 ports are
used/needed now and input on supporting various compression algorithms
would be appreciated.

Don

From: Don Murray <address@hidden>
Date: Tue, 09 Jul 2002 07:59:54 -0600
To: John Benson <address@hidden>
Subject: Re: ADDE ports/compression

Hi John-

Thanks for your reply.

John Benson wrote:
> 
>      The program can too tell which port it is using.
> It queries the socket and gets the address as part of
> a structure, something to do with inet_addr.  I'd have
> to look it up.

But it doesn't seem like mcserv uses this, it uses what is sent
from the client as part of the protocol to determine compression.
The point is that the actual port used is not important, what
is important is what port (i.e. compression) the client tells the 
server it wants to use.
 
>      The data stream sent out compressed is _exactly the same_
> as the data stream sent out uncompressed, except it's piped
> through compress on the transmitting end and through uncompress
> on the receiving end, so everything which is aware of the protocol
> is unaware of the compression, and vice versa.  The additional
> forking is due to a pipe-to-socket adapter which needs to guard
> every read() or write() with select(), because on some systems
> the compress process sets its pipe to non-blocking, and the effect
> reflects to the other end of the pipe.  The particular port
> sent out in the request is not used to determine on the receiving
> end whether to uncompress:  in fact, if it hadn't already determined
> that, it couldn't read the request block at all, since the whole
> blinking thing is compressed.

Yes, I understand that the streams are the same except for compression.
On the client side, mcserv determines whether it should start up
uncompress based on the environment variable.  

>      The fact that the first few words of the transaction are repeated
> is because the first time they are eaten by mcserv and used to
> determine which transaction to start, and the second time they are
> eaten by the transaction itself, so it has the complete request
> text available to it.

Okay, thanks for that clarification.  Is there any documentation 
on what needs to be sent (other than reading the code)?
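To make the duplication concrete, here is a logical model of the field sequence from the comment block above, showing which copy each consumer reads. It is only a sketch: fields are plain strings with placeholder values, the real on-the-wire encoding lives in mcserv.cp and is not reproduced here, and the class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Logical model (not the wire format): mcserv eats the first copy of the
 * leading fields to dispatch, and the transaction it starts eats the second
 * copy so it sees the complete request.
 */
class AddeRequestModel {
    static Deque<String> buildRequest(String serverAndPort, String service, String client,
                                      String userProjPass, String request) {
        Deque<String> fields = new ArrayDeque<>();
        fields.add("1");           // ADDE version
        fields.add(serverAndPort); // first copy: read by mcserv (port selects compression)
        fields.add(service);       // first copy: read by mcserv
        fields.add(serverAndPort); // second copy: read by the transaction
        fields.add(client);
        fields.add(userProjPass);
        fields.add(service);       // second copy: read by the transaction
        fields.add(request);
        return fields;
    }

    /** mcserv: consume version, server/port and service type; return the service to start. */
    static String dispatch(Deque<String> fields) {
        fields.poll();             // version
        fields.poll();             // server IP and port
        return fields.poll();      // service type, e.g. AGET or ADIR
    }

    /** The transaction sees everything left, starting with the second copy. */
    static List<String> transactionView(Deque<String> fields) {
        return new ArrayList<>(fields);
    }
}
```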
 
>      We would have loved to use gzip rather than compress, but since
> it is not part of every UNIX distribution, we would have had to tell
> users to install it, or we would have had to package it in with McIDAS,
> both of which would have caused logistics problems.

At this time, McIDAS packages two copies of zlib (one regular and
one with the HDF-EOS stuff).  Since it's already there, then why not
use it?
 
>      In the future, I would love to see a time when we simply always
> compressed.  Modern CPU speed has outstripped comm speed so much that
> I have seen examples of compression being faster even for systems
> sitting side by side on a local LAN.  But back when this was first
> developed, cycles were more precious, hence the need to choose.

I agree.  For our Java work, we will probably always turn compression
on. 

Thanks again for the input.

Don

Date: Tue, 9 Jul 2002 09:18:58 -0500 (CDT)
From: John Benson <address@hidden>
To: Don Murray <address@hidden>
Subject: Re: ADDE ports/compression


>     Ah, sorry.  You're right: it's not the REQUEST that's
>compressed, it's the RESPONSE.  And since the request is
>in the clear, the port number encoded in it _can_ be used
>as the trigger to request a compressed response.  The request
>itself is short.

>    Duh.

>    --johnb
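John's correction can be illustrated with a small Java sketch: the request is written in the clear, so the server can inspect its port field before choosing how to encode the response, and the client wraps only the response stream, leaving the protocol-reading code unaware of compression. GZIP stands in for Unix compress because core Java has no LZW decoder, and treating 112 as the GZIP code follows the proposal earlier in this thread; the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical model: plain-text request, optionally compressed response. */
class ClearRequestCompressedResponse {
    /** Server side: the port field was read from the uncompressed request. */
    static byte[] respond(int portField, byte[] payload) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        OutputStream out = (portField == 112) ? new GZIPOutputStream(wire) : wire;
        out.write(payload);
        out.close(); // finishes the GZIP trailer when compression is used
        return wire.toByteArray();
    }

    /** Client side: only the response stream is wrapped; callers read plain bytes. */
    static byte[] readResponse(byte[] wire, boolean compressed) throws IOException {
        InputStream in = new ByteArrayInputStream(wire);
        if (compressed) {
            in = new GZIPInputStream(in);
        }
        try (InputStream body = in) {
            return body.readAllBytes();
        }
    }
}
```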

Date: Tue, 09 Jul 2002 07:53:24 -0400
From: "Brian Callicott" <address@hidden>
Organization: Northrop Grumman
To: Don Murray <address@hidden>
Subject: [Fwd: ADDE ports/compression]

Don,

I really can't answer your question, but have done some non-streaming
compression studies here in SATEPS (NOAA/GOES, Camp Springs, MD) and
have found the open source program "bzip2" to be a better compressor of
GOES Area Format GOES Imagery (all bytes for Bands 1-5 used for tests,
with VIS data taking the major portion).  More info can be found at:
http://sources.redhat.com/bzip2/


Brian