NLDM Status Report

Anne Wilson and Mike Linck

 January 27, 2004

NLDM Development

Adding in new feed types

Code that extracts products from the LDM product queue and posts them to a news server was expanded to handle multiple data feeds. There is now one instance of this program per data feed being relayed. The statistics package and the statistics display software were expanded in the same way. Adding new feed types from an LDM is now a simple process.

NLDM statistics V0.0.1 uses a great variety of scripts, programs, files, cron jobs, signals, newsgroup naming conventions, etc. Besides the future requirement to provide documentation for users, it became necessary to write comprehensive statistics documentation simply to keep track of all the pieces.

Encoding Issues

Background: In order to be relayed via NNTP, binary articles (such as data) must be encoded to meet NNTP format requirements. Clearly this is a very costly process due to the volume of data to be relayed. Some time ago a 'homegrown' encoding was designed that does minimal processing on each byte of data and enlarges the data as little as possible. Note that the overhead of this process is comparable to that of computing an MD5 checksum, which the LDM already does.

This encoding had a bug that was fixed this past fall. The problem turned out to be related to the buffering of the encoded data, not the encoding itself.

Also this past fall it was noticed that while the encoded size of CRAFT data mostly showed an increase in the range of 1% - 7%, some, but not all, encoded CONDUIT products increased in size by up to 100%. It was determined that this occurred on CONDUIT products consisting almost entirely of null characters; the null character is one of the few characters that must be encoded. Thus, various simple encoding modifications were tested in an attempt to mitigate this problem. Testing covered both the CONDUIT and the CRAFT feed types so that an improvement in one would not make the other worse. It was found that adding 207 to every byte before encoding gave the smallest number of encoded characters for both feed types: on average, 1.17% of characters required encoding for CONDUIT data and 1.66% for CRAFT data.
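
The effect of the byte offset can be illustrated with a minimal Python sketch. The actual NLDM encoding is not specified in this report, so the set of special bytes (SPECIAL), the two-byte cost of an escaped character, and the wrap-around (modulo 256) addition are all assumptions made for illustration only; the only facts taken from the text are that null must be encoded and that the offset is 207.

```python
# Hypothetical set of byte values that cannot be carried verbatim.
# The report only states that NUL is among them; the rest are assumptions.
SPECIAL = frozenset({0x00, 0x0A, 0x0D, 0x3D})  # NUL, LF, CR, '=' (assumed escape marker)
OFFSET = 207  # offset reported as best for both CONDUIT and CRAFT


def encoded_size(data: bytes, offset: int = 0) -> int:
    """Return the encoded length, assuming each special byte costs two bytes."""
    size = 0
    for b in data:
        shifted = (b + offset) % 256  # assumed wrap-around addition
        size += 2 if shifted in SPECIAL else 1
    return size


def overhead_percent(data: bytes, offset: int = 0) -> float:
    """Size increase of the encoded form relative to the raw data, in percent."""
    return 100.0 * (encoded_size(data, offset) - len(data)) / len(data)


nulls = bytes(1000)  # a product consisting only of null bytes
print(overhead_percent(nulls))          # every byte must be escaped
print(overhead_percent(nulls, OFFSET))  # 0 + 207 = 207, which is not special here
```

Under these assumptions an all-null product doubles in size without the offset and incurs no overhead with it, which matches the behavior described for the CONDUIT products. The offset only moves the cost to whichever raw byte values map onto the special set, which is why both feed types had to be measured.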

Building a Distribution Package

A distribution package has been built using automake and autoconf, and is configured by running configure. (These are GNU software development tools.) NLDM can now be built under both Linux and Solaris (and it even runs), although a few manual tasks remain to be automated.

Since NLDM is a package that sits on top of INN, it is intended that NLDM code will exist in a subdirectory of INN.  Users will get INN from its distributor* and get NLDM from Unidata.  Each is to be built and installed separately.

(* INN is distributed by ISC, recently renamed from the Internet Software Consortium to Internet Systems Consortium, Inc.)
 

Prototype network

INN and NLDM are running on three UNIX machines in Boulder and one UNIX machine in Washington, D.C.  All sites have been upgraded to run the latest version of INN (2.4.1, released in December) and NLDM (0.0.1).  Non-Unidata sites have not yet been incorporated in the network, but that is expected to happen soon.

Currently both CONDUIT and CRAFT feed types are being relayed.

The CRAFT data is being posted to newsgroups based on station ID, e.g. unidata.binaries.craft.KABC. So far there are over 100 CRAFT newsgroups, one for each reporting station.
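
The naming convention can be sketched in a few lines of Python. The function name craft_newsgroup and the assumption that station IDs are four alphanumeric characters are illustrative and not taken from the NLDM code; only the unidata.binaries.craft prefix comes from the example above.

```python
import re

CRAFT_PREFIX = "unidata.binaries.craft"  # hierarchy taken from the example above


def craft_newsgroup(station_id: str) -> str:
    """Build the newsgroup name for a CRAFT station ID such as 'KABC'."""
    station = station_id.strip().upper()
    # Assumption: station IDs are four alphanumeric characters.
    if not re.fullmatch(r"[A-Z0-9]{4}", station):
        raise ValueError(f"unexpected station ID: {station_id!r}")
    return f"{CRAFT_PREFIX}.{station}"


print(craft_newsgroup("KABC"))
```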
 

JNLDM

JNLDM testing continued on both UNIX and Windows platforms.  However, the Windows testing slowed to a halt due to problems that appeared to be in a network card.  Since this testing was deemed lower priority than other tasks, it has been postponed.

The JNLDM GUI has undergone significant conceptual and visual modifications, along with a complete rewrite of the code. The new GUI is more attractive and more intuitive. It is now broken down into four distinct categories: Console, Data Transfer, Logging, and DataFlow. There is one screen devoted to each category. Wizards exist to help new users find their way around, and extensive error checking is performed to make the GUI as safe to use as possible. Because the GUI exists only to modify the configuration file, and does not interact with the program directly - except to display warning and error messages generated by the Logger - its code has been completely separated from the code for the back end of the program, making the GUI easier to modify and maintain.

JNLDM file caching was completed. On shutdown, articles are now stored to disk, and they are read back into memory when the program restarts.

JNLDM was integrated with the IDV so that the IDV could display level II radar data received by JNLDM. The design of this initial integration was kept intentionally simple. A simpler instantiation mechanism was provided so that the IDV can instantiate JNLDM as a class, called RadarControl. The RadarControl object maintains a list of stations of interest and notifies the IDV when products arrive from any of these stations. Methods of that class include addStation and removeStation.

(Since it is not yet possible to modify a JNLDM subscription, the underlying implementation of the above has JNLDM getting all level II radar data and only notifying the IDV of the arrival of products of interest.  Yes, this is inefficient.)
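
The filtering behavior described above can be sketched as follows. The real RadarControl is a Java class inside JNLDM; this Python version is only an illustration. The method names addStation and removeStation come from the text, while the notify callback, the on_product method, and the station ID KXYZ are hypothetical.

```python
from typing import Callable, List, Set


class RadarControl:
    """Python sketch of the station filter; the actual class is written in Java."""

    def __init__(self, notify: Callable[[str, bytes], None]) -> None:
        self._stations: Set[str] = set()
        self._notify = notify  # hypothetical callback standing in for the IDV

    def addStation(self, station: str) -> None:
        self._stations.add(station)

    def removeStation(self, station: str) -> None:
        self._stations.discard(station)

    def on_product(self, station: str, product: bytes) -> bool:
        """Called for every level II product; forwards only stations of interest."""
        if station in self._stations:
            self._notify(station, product)
            return True
        return False  # product was received but is not passed on


seen: List[str] = []
control = RadarControl(lambda station, product: seen.append(station))
control.addStation("KABC")
control.on_product("KABC", b"product bytes")  # forwarded
control.on_product("KXYZ", b"product bytes")  # received but dropped
print(seen)
```

The second call shows the inefficiency conceded above: the product still arrives and is simply discarded.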

Also, a new action, IDVAction, was added to JNLDM to allow it to notify the IDV of new products. In order to provide the data in its original raw format, it was also necessary to write Java code to concatenate level II radar products into a single volume and to unbzip2 the data. These tasks are part of the IDVAction.
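
The concatenate-and-decompress step might look like the sketch below. It is written in Python rather than the Java actually used, and it assumes that every product is a complete, standalone bzip2 stream and that products are joined in arrival order; the report does not describe the product layout, so both points are assumptions. The function name build_volume is hypothetical.

```python
import bz2
from typing import Iterable


def build_volume(products: Iterable[bytes]) -> bytes:
    """Decompress each product and join the results in arrival order."""
    # Assumption: every product is a complete, standalone bzip2 stream.
    return b"".join(bz2.decompress(p) for p in products)


# Two stand-in products, compressed here only to exercise the function.
parts = [bz2.compress(b"first part "), bz2.compress(b"second part")]
print(build_volume(parts))
```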

This simple integration method worked - the IDV was able to display level II radar data at its time of arrival.

Additionally, a scour utility was written in Java to keep disk space from filling up.
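
A scour pass of this kind can be sketched as below. This is not the Java utility itself; the age-based policy, the function name scour, and its parameters are assumptions chosen to illustrate the idea of removing old files to bound disk usage.

```python
import os
import time
from typing import List, Optional


def scour(directory: str, max_age_seconds: float,
          now: Optional[float] = None) -> List[str]:
    """Delete regular files under directory that are older than max_age_seconds."""
    now = time.time() if now is None else now
    removed = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            # Age is judged by modification time (an assumed policy).
            if now - os.path.getmtime(path) > max_age_seconds:
                os.remove(path)
                removed.append(path)
    return removed
```

Such a function would typically be run periodically, for example once an hour with a retention period of a few hours.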

Significant effort has been put into JNLDM documentation in the form of textual descriptions for developers, javadocs, and flow diagrams.  A user's tutorial is currently under development.
 

MISCELLANEOUS

Profiled Memory Usage of C Code

In anticipation of creating a distribution that will run on non-Unidata sites, the NLDM C code was profiled for memory usage.    It was confirmed that there were no memory leaks in the various programs.  At the same time, very large but unused data arrays were discovered and removed.
 

Testing Use of Control Messages

Control messages are specially formatted messages used to distribute administrative requests to the network, such as changes to the newsgroup hierarchy. Requested actions include creating new groups and deleting existing ones.

Control messages are handled by an INN program called controlchan, which allows a local site to handle each control message in a variety of ways.


This feature was tested by creating messages to notify a site of both new groups and groups to be deleted.   The appropriate file of known newsgroups was indeed successfully updated.
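
For illustration, a newgroup control message of the kind used in such a test can be assembled as in the Python sketch below. The Control header syntax ("newgroup" followed by the group name) and the need for an Approved header follow the Netnews standards; the function name build_newgroup, the sender address, and the body text are hypothetical, and a real posting program would still have to supply headers such as Message-ID and Date.

```python
from email.message import EmailMessage


def build_newgroup(group: str, sender: str) -> EmailMessage:
    """Assemble a 'newgroup' control article for the given newsgroup."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["Newsgroups"] = group
    msg["Subject"] = f"cmsg newgroup {group}"
    msg["Control"] = f"newgroup {group}"  # the header that marks a control message
    msg["Approved"] = sender              # group control messages carry an Approved header
    msg.set_content(f"Please create {group}.\n")
    return msg


# The sender address is a placeholder, not a real NLDM account.
article = build_newgroup("unidata.binaries.craft.KABC", "news@example.invalid")
print(article.as_string())
```

A message requesting deletion would use "rmgroup" in place of "newgroup" in the Control header.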

This mechanism may also be useful for sending or pulling data, but this remains to be tested.
 

Testing Changes to Subscriptions at Remote Sites

GUP (Group Updating Program) is free software for updating subscriptions at remote sites. We have been testing whether GUP can be used to update a subscription list that is held at a remote site. The results have been mixed.

Consider a scenario where site A relays articles to site B.  Site B wishes to change its subscription list that resides at site A.

First it is necessary to install a new user 'gup' at site A.    Also there will be a small file structure on site A to handle the subscription changes, structured as follows:
    sites/
        siteB/
            groups  (current subscription list)
            groups.old
            exclude (groups site B is not allowed to receive)
            log (of operations performed for site B)
        siteX/
        siteY/
        siteZ/

The gup account at site A must have a .forward file that includes a command to pipe the email to the gup program. When a user from site B sends a request for a subscription change to gup@siteA, gup will first check the active file of user 'news' to ensure that news knows about the requested newsgroups. It will then update the appropriate groups file in the subdirectory for site B.
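
The update step described above can be sketched as follows. This is not GUP's code: the function names, the log line format, and the treatment of the exclude file as a list of exact group names are assumptions made for illustration. The sketch relies only on the directory layout shown above and on the fact that each line of an INN active file begins with the group name.

```python
import os
import shutil
from typing import Iterable, List, Set


def known_groups(active_path: str) -> Set[str]:
    """Read group names from an INN active file (the name is the first field)."""
    with open(active_path) as f:
        return {line.split()[0] for line in f if line.strip()}


def update_subscription(site_dir: str, active_path: str,
                        requested: Iterable[str]) -> List[str]:
    """Write the accepted groups to site_dir/groups, keeping the previous list."""
    known = known_groups(active_path)
    excluded: Set[str] = set()
    exclude_path = os.path.join(site_dir, "exclude")
    if os.path.exists(exclude_path):
        with open(exclude_path) as f:
            # Assumption: one exact group name per line, no patterns.
            excluded = {line.strip() for line in f if line.strip()}
    accepted = sorted(g for g in set(requested)
                      if g in known and g not in excluded)

    groups_path = os.path.join(site_dir, "groups")
    if os.path.exists(groups_path):
        shutil.copyfile(groups_path, groups_path + ".old")  # becomes groups.old
    with open(groups_path, "w") as f:
        f.write("".join(g + "\n" for g in accepted))
    with open(os.path.join(site_dir, "log"), "a") as f:
        f.write("groups updated: " + " ".join(accepted) + "\n")
    return accepted
```

Note that the sketch stops where the description does: it rewrites the groups file for the site and nothing else.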

Testing showed that GUP does perform the steps above. However, updating the groups file alone is not sufficient to change the appropriate configuration file, so a critical step is missing. It is not yet clear how this could be handled.
 

Running as Another User Besides User News

One installation of INN is running as user 'nldm' rather than user 'news'.  This confirms that INN can be installed to use any user name and any port number.
 

NLDM Presentation to the Community

The current state of this work was presented at AMS.