Analysis

In late 1990, Conklin Systems prepared the following analysis of technologies for transferring data over modems for ASI on the State of Michigan's Michigan Opportunity System project. It was incorporated as an appendix into the project documentation and became part of the public record.

The Michigan Opportunity System was an effort to create a statewide, multiple-service system that would help track and accelerate the process of distributing aid from the 70+ State agencies that helped people in need.

By sharing common data such as contact information through a distributed database managed by a server pool spread throughout the state, hours could be shaved off an applicant's processing time. During off hours, this meant transferring large amounts of data from thousands of sites, some with a staff of 40, some with a staff of one. Optimizing the transport became an issue.

At the time, MNP error-correcting modems were a new technology. V.42bis compression, now standard, wasn't available to the public, as it incorporated patented IBM technology which IBM later put in the public domain to facilitate the CCITT's 14.4K baud modem standard.

This report has been reproduced as accurately as HTML allows.

[Author's note: This report analyzed Procomm's implementation of Kermit. The original Kermit is a much different creature, considerably faster, and still being enhanced and supported. You can find out more about it here.]

December 26, 1990

Data transmission options and performance

One facet of the performance of the MOS will be data transmission performance. In evaluating data transmission performance, a number of transmission options were identified at various levels. Different types of data, software compression schemes, transfer protocols and modem hardware can all affect the rate at which information can be transmitted. Further, combinations of these options may have unique performance characteristics. (Figure 1)

The majority of data transmission in the MOS is in "batch" mode, which facilitates evaluation of transmission times. The typical batch transmission involves a short connect sequence, the transfer of a large amount of data, and a disconnect sequence, with the majority of the time spent in data transfer at the best possible speed. File transmission timing can accurately measure the maximum transmission rate of a specific data transfer method and thus serves as a high quality estimate of performance in the MOS.

On-line file clearance, which is a short, two-way communication between an MOS workstation and an aggregate, has much different timing characteristics. However, since it is expected to take less than 40 seconds, the total deviation for various data transfer methods is very small and it has been left out of this analysis.

This document begins with an overview of data compression and file transfer technologies. Readers familiar with these may wish to skip to Results of data transmission testing. A summary of test conditions is provided at the end of this document.

Types of data:                C source code, Word Perfect document, HIS data files, Program EXE file
Software level compression:   No compression, ARC, ZIP
Transfer protocol:            Xmodem, Ymodem, Zmodem, Kermit
Modem type:                   Normal modem, MNP Lev 5 data-compressing modem

MOS data transfer options

Data transmission issues - overview

The MOS will need to transfer a potentially large amount of data. If that amount proves to be more than a given communications method (modems, leased lines) can reasonably handle, the primary response is to switch to a higher speed communications method. However, for a communications channel of any given speed, data transmission can be optimized by one of two means:

A transmission protocol is a set of rules for data communications. Different rules are optimal for different situations. For example, some protocols are good at moving large amounts of data over reliable communication lines, while other protocols are optimal for sending data over erratic communication lines. Xmodem, Ymodem, Zmodem and Kermit are standard protocols applicable to the MOS communications environment and are evaluated here.

Data compression is the technique of encoding information so that it takes up less space. By compressing information before sending it over communication lines, less data is actually transferred and communications speed is increased. Data compression schemes can compress data as much as 80% (a 100,000 byte file is squeezed into 20,000 bytes) or as little as none at all, depending on the type of data being compressed. When the data has arrived at its destination, a decompression scheme is used to restore the original information.

Two types of data compression are applicable to the MOS. Software compression uses the host computer to process a file and produce a compressed equivalent. For the MOS, this would mean software on the aggregate that pre-compresses a file. The compressed data is then transmitted over the phone and decompressed by similar software on the end machine. An alternate technology, hardware compression, is available in some modems. These modems compress the information as they receive it from the computer, send it over the phone, and decompress it before passing it on to the end machine. Hardware compression schemes eliminate the processing step for the host computer. Two software compression schemes, ARC and ZIP, are evaluated here along with one hardware compression scheme, modems equipped with MNP Level 5 compression.

Data compression schemes can typically compress data 50%, or to half size, on average, and thus double the effective speed of any given communications approach (e.g., a 2400 baud modem effectively moves data at 4800 baud).
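To make the arithmetic concrete, the following sketch (not part of the original testing) computes the effective rate from the raw line rate and the compression figure quoted above; both numbers are simply the ones used in this discussion.

    /* Sketch: effective throughput of a fixed-rate link when data is
     * pre-compressed.  The 240 CPS base rate (a 2400 baud modem) and the
     * 50% compression figure come from the discussion above. */
    #include <stdio.h>

    int main(void)
    {
        double raw_cps     = 240.0;   /* characters/second the modem actually moves */
        double compression = 0.50;    /* fraction of the data removed by compression */

        /* If half the bytes are removed before transmission, the original
         * data arrives at twice the raw line rate. */
        double effective_cps = raw_cps / (1.0 - compression);

        printf("raw rate: %.0f cps, effective rate: %.0f cps\n",
               raw_cps, effective_cps);
        return 0;
    }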

Types of data

The type of information being transferred can greatly affect the success of a data compression scheme. Data compression works by eliminating redundant information. If the data to be compressed has little redundancy, data compression will yield little or no improvement.

Four test files were selected: a C source code file (CSOURCE.C), a Word Perfect document (DEVPROD.MEM), a collection of HIS transaction files (HISTRANS), and a DOS executable (TEST.EXE).

These test files were chosen to represent a range of data types with regard to compressibility. The C source code file is a pure ASCII text file with no embedded binary information. This type can be compressed a great deal (78% in testing). In contrast, the DOS executable file is all binary information and much harder to compress (52% in testing).

If a data file has little or no redundancy, data compression approaches can actually increase the size of the data by a small amount. This is because the scheme used to compress the information takes up a small amount of space itself. If the data cannot be compressed, this overhead is added to the file. Almost all data files can be compressed to a smaller size. The most common case in which compression results in a larger file is when data compression is applied to an already compressed file.

Types of software compression

A large number of data compression schemes have been developed in recent years, using different approaches and with varying results. Different compression techniques work best with different kinds of data.

Previously, there were many stand-alone compression programs implementing different compression techniques. A new class of program followed that combined data compression routines with another popular feature, a librarian, which combines multiple small files into a larger file. The resulting program was called ARC, short for 'archiver.'

A major feature of ARC is its ability to use many different types of compression, scanning the file beforehand to pick the best compression technique for that particular data file. ARC became quite popular, and was ported to many other computer systems, becoming a de facto standard.

After legal battles, PKWare, one of the major PC-compatible ARC software producers, was forced to develop a new compression program, called ZIP. In the process, they developed new compression techniques which are much more efficient than ARC's. The ZIP format is just now being ported to other platforms.

Types of transfer protocols

A file transfer protocol is a set of rules or methods by which to transfer a file from one system to another. File transfer protocols are designed to guarantee that the file arrives correctly, compensating for errors in the transmission and retransmitting pieces as necessary until all data arrives intact.

All protocols tested here use the same basic approach to transmitting a file. Each file is broken into fixed-size pieces, or packets. Each packet is then sent, along with header information and a checksum of the data in the packet, to the destination machine. The destination machine then checks the data it received against the checksum, and requests either the next packet, if all is well, or that the packet be resent, if the packet has been damaged in transit.
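A minimal sketch of this send/verify/acknowledge idea is given below. The structure layout and the simple additive checksum are illustrative assumptions only; they are not the actual Xmodem, Ymodem or Zmodem frame formats.

    /* Illustrative packet framing: a sequence number, a fixed-size data
     * block, and a checksum the receiver verifies before acknowledging. */
    #include <stdio.h>
    #include <stddef.h>
    #include <string.h>

    #define PACKET_DATA_SIZE 128          /* Xmodem-style fixed block size */

    struct packet {
        unsigned char seq;                       /* packet sequence number  */
        unsigned char data[PACKET_DATA_SIZE];    /* fixed-size payload      */
        unsigned char check;                     /* checksum of the payload */
    };

    /* Simple additive checksum of a buffer (a stand-in for the protocols'
     * real checksum or CRC schemes). */
    static unsigned char sum_bytes(const unsigned char *buf, size_t len)
    {
        unsigned char sum = 0;
        for (size_t i = 0; i < len; i++)
            sum = (unsigned char)(sum + buf[i]);
        return sum;
    }

    int main(void)
    {
        struct packet p;

        /* Sender side: fill a packet and stamp it with its checksum. */
        p.seq = 1;
        memset(p.data, 'A', PACKET_DATA_SIZE);
        p.check = sum_bytes(p.data, PACKET_DATA_SIZE);

        /* Receiver side: verify the checksum, then ACK or ask for a resend. */
        if (sum_bytes(p.data, PACKET_DATA_SIZE) == p.check)
            printf("packet %d intact: ACK, send next packet\n", p.seq);
        else
            printf("packet %d damaged: NAK, resend\n", p.seq);

        return 0;
    }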

The principal difference between the reviewed protocols is packet size, as specified in the following table:

 

Kermit      Xmodem       Ymodem        Zmodem
96 bytes    128 bytes    1024 bytes    8096 bytes

Packet sizes of transfer protocols

(Zmodem packet sizes actually vary as needed, but 8096 bytes is the largest packet size and the size most often used.)

As packet size increases, data transmission efficiency increases, because more time is spent actually transferring data. However, as the number of transmission errors increases, large packets become less efficient, because large amounts of information have to be retransmitted. Thus no one packet size is ideal for all conditions. Zmodem is unique amongst the protocols reviewed here because it varies the size of its packets continuously as phone line conditions degrade or improve, keeping the packet size optimal for the actual conditions.
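The trade-off can be illustrated with a rough model: charge each packet a fixed acknowledgement turnaround, and assume a whole packet must be resent whenever any byte in it is corrupted. The turnaround time and byte error rate below are assumed values chosen for illustration, not measurements from these tests.

    /* Rough model of transfer efficiency versus packet size:
     *   - each packet pays a fixed acknowledgement turnaround, so bigger
     *     packets waste less time per byte, but
     *   - a single corrupted byte forces the whole packet to be resent,
     *     so bigger packets lose more when the line is noisy.
     * The turnaround time and byte error rate are illustrative only. */
    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        double line_cps   = 240.0;     /* 2400 baud modem, 240 characters/sec     */
        double turnaround = 0.4;       /* assumed seconds lost per packet ack      */
        double byte_error = 0.0001;    /* assumed probability a byte is corrupted  */
        int    sizes[]    = { 96, 128, 1024, 8096 };   /* Kermit .. Zmodem        */

        for (int i = 0; i < 4; i++) {
            double n       = sizes[i];
            double p_ok    = pow(1.0 - byte_error, n);    /* packet survives intact */
            double per_try = n / line_cps + turnaround;   /* time per attempt       */
            double tries   = 1.0 / p_ok;                  /* expected attempts      */
            double eff_cps = n / (per_try * tries);
            printf("%5.0f-byte packets: ~%.1f cps effective\n", n, eff_cps);
        }
        return 0;
    }

With a clean line the larger packets win handily; as the assumed error rate climbs, the expected retransmissions erode and eventually reverse that advantage.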

Results of data transmission testing

Four separate factors of data transmission were evaluated in these tests. Unfortunately, the possibility of side effects caused by specific combinations required exhaustive testing of all combinations, which would require 96 separate tests. Fortunately, two cases could be eliminated.

First, standard modems are not affected by the type of data being transferred. A 2400-baud modem transfers data at a fixed rate of 240 CPS (characters per second), since each character requires ten bits on the line: eight data bits plus start and stop bits. Therefore, the 4 types of data did not need to be individually tested. Also, the 3 software compression options (none, ARC and ZIP) did not need to be tested, since over standard modems compression changes only the amount of data transferred, not the rate. Only file transfer protocols affect the rate of transmission, eliminating almost half the tests.

A second elimination was the Kermit protocol. Kermit was eliminated after it was clearly much slower than Xmodem in sample tests.

All four factors affect the data transmission performance over MNP Level 5 data-compressing modems, requiring exhaustive tests. Exhaustive file transfer timing tests involved sending all four file types through each of the three pre-compression options with each of the four file transfer protocols.

For ease of testing, the file type and pre-compression factors were combined. This resulted in 12 different output files, as listed in the table below. This table lists the test filename, file size, and percentage of compression for each test file.

             File name       File Length    Compression %
C source     CSOURCE.C           208,405
             CSOURCE.ARC          67,427          32.3%
             CSOURCE.ZIP          39,198          18.8%
WP file      DEVPROD.MEM         142,029
             DEVPROD.ARC          62,550          44.0%
             DEVPROD.ZIP          43,025          30.2%
EXE file     TEST.EXE            206,321
             TEST.ARC            142,706          69.1%
             TEST.ZIP            107,634          52.1%

(Compression % is the compressed file's size as a percentage of the original.)

Transmission test files

The first tests run were to establish the data transfer rate of standard 2400 baud modem connections. Again, this rate is fixed, with a theoretical maximum of 240 CPS. The actual transfer rate is a measure of the efficiency of the data transfer protocols.

 

Kermit       Xmodem       Ymodem       Zmodem
148.5 cps    198.9 cps    224.1 cps

Non-MNP data transfer rates by protocol
(all tests done on a 185K test file)

In all tests, phone line conditions were good and large packet protocols were dramatically more efficient at transferring files. Note the direct correlation between actual file transfer rate and packet size. (Figure 2)

Transfer rate versus packet size

Since non-MNP transfers occur at a fixed rate, the time spent transferring a file with a given protocol can be figured by dividing the file size in bytes by the listed speed, giving the transfer time in seconds.
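For example, a sketch of that calculation, using the CSOURCE.C size from the test-file table and the Xmodem rate measured above:

    /* Transfer time = file size / measured protocol rate.
     * 208,405 bytes (CSOURCE.C) at Xmodem's 198.9 cps is about 1,048
     * seconds, or roughly 17.5 minutes. */
    #include <stdio.h>

    int main(void)
    {
        double file_bytes = 208405.0;   /* CSOURCE.C, from the test-file table      */
        double rate_cps   = 198.9;      /* Xmodem over a standard 2400 baud modem   */

        double seconds = file_bytes / rate_cps;
        printf("%.0f bytes at %.1f cps: %.0f sec (%.1f min)\n",
               file_bytes, rate_cps, seconds, seconds / 60.0);
        return 0;
    }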

MNP modem tests

After Kermit protocol was eliminated, Xmodem, Ymodem and Zmodem file transfers were tested with each of the 12 test files generated by the combination of file type and software compression. Two figures are given for each transfer: the actual time spent transferring the file, and the data rate of the transfer. Each of the grids that follow gives the timings for one protocol.

Xmodem xfer         Original                ARC                     ZIP
C source            24:16 min               9:54 min                5:54 min
WP file             17:47 min  133.1 cps    9:47 min  106.5 cps     6:26 min  111.4 cps
HIS transactions     3:17 min  139.2 cps    1:41 min  102.2 cps     1:30 min   99.3 cps
EXE file            26:04 min  131.9 cps   22:03 min  107.8 cps    17:00 min  105.5 cps

Xmodem transfer test results over MNP Lev 5 modems

For comparison, Xmodem protocol over a standard modem maintains a steady 198.9 CPS transfer rate. Testing revealed that in all cases, Xmodem over MNP Lev 5 modems is slower than over normal modems. This is discussed in detail later.

Ymodem xfer         Original                ARC                     ZIP
C source            11:13 min  309.6 cps    5:38 min  199.4 cps     3:30 min  186.6 cps
WP file              9:31 min  248.7 cps    5:38 min  185.0 cps     3:43 min  192.9 cps
HIS transactions     2:04 min  221.1 cps    1:02 min  166.5 cps      :54 min  165.5 cps
EXE file            13:55 min  247.0 cps   11:27 min  207.7 cps     9:08 min  196.4 cps

Ymodem transfer test results over MNP Lev 5 modems

Ymodem protocol over standard modems maintains a steady 224.1 CPS rate. Note that in several cases, MNP Lev 5 modems coupled with Ymodem produced a distinct, but small, improvement in speed. Note also that, like ARC and ZIP, MNP's ability to compress data is closely related to the type of data being compressed.

 

Zmodem xfer         Original                ARC                     ZIP
C source
WP file
HIS transactions
EXE file

Zmodem transfer test results over MNP Lev 5 modems

(Zmodem protocol results not yet available. Expect to find much improved data rate due to Zmodem's 8K block size.)

Conclusions

ARC, ZIP and MNP compression schemes accelerate data transmission by lowering the actual amount of data sent. Testing clearly shows that ZIP is by far the most successful compression mechanism across all types of data (Figure 3) and that for all file types, using ZIP compression on data before transmission over standard modems is the fastest possible transfer method.

Compress success by file type

A review of the actual transfer times shows that ZIP file transfers can be as much as 400% faster than non-compressed transfers, and that ZIP improves on ARC compression by 35-70%.

The Zmodem file transfer protocol's dynamic packet sizing makes it the most efficient transfer protocol. It is also the only standard protocol available that adjusts to line conditions for maximum throughput, making it the best choice.

MNP compression, as shown here, is about one-third as effective as ZIP for any given file, and in one case actually increased transfer times.

Given that MNP's effectiveness is eliminated on pre-compressed information, MNP is inappropriate for the batch-oriented environment of the MOS. Further checks showed, however, that MNP Lev 5 interacts with the transfer protocol as well, drastically affecting performance.

MNP transfer performance versus packet size

From performance indicators and actually watching the modem sending and receiving data, it became clear that the MNP protocol has problems interacting with packet-oriented protocols, and this interference seriously inhibits the potential performance of this form of compression.

Because MNP modems attempt to compress data as it comes in, and actual transfer rates vary, all MNP modems internally buffer information. In order to get the best compression, the MNP modem will pause a moment after receiving data to collect more data in its buffer. The more data it can buffer, the more effective its compression.

However, with a packet-oriented protocol, this added delay cripples MNP performance. Packet protocols rely on a sequence of:

send - acknowledge - send - acknowledge - ...

With the MNP protocol, an extra delay is added before each and every transmission. This delay slows transmission enormously, to the point that, with Xmodem protocol, the delays outweigh the speed increase gained from 30-50% compression.
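A back-of-the-envelope model of this effect is sketched below. The compression figure and the per-packet pause are assumed values chosen only to illustrate the trend; they are not measured characteristics of the test modems.

    /* Rough model of MNP Lev 5 throughput when driven by a packet protocol:
     * compression raises the line rate, but a buffering pause added before
     * each packet eats into the gain.  Small packets pay the pause often,
     * large packets rarely.  All numbers here are illustrative assumptions. */
    #include <stdio.h>

    int main(void)
    {
        double base_cps    = 240.0;   /* raw 2400 baud line rate                */
        double compression = 0.40;    /* assume MNP squeezes out 40% of bytes   */
        double pause_sec   = 0.5;     /* assumed per-packet buffering pause     */
        int    sizes[]     = { 128, 1024, 8096 };   /* Xmodem, Ymodem, Zmodem  */

        double mnp_cps = base_cps / (1.0 - compression);   /* rate while sending */

        for (int i = 0; i < 3; i++) {
            double n       = sizes[i];
            double per_pkt = n / mnp_cps + pause_sec;       /* send + pause     */
            printf("%5d-byte packets: ~%.0f cps effective\n",
                   sizes[i], n / per_pkt);
        }
        return 0;
    }

With these assumed numbers, the 128-byte packets fall below the 198.9 CPS a standard modem achieves with Xmodem, while the larger packets keep most of the compression gain, which is the same pattern the measurements show.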

Since the extra delays come with each new packet, MNP protocol gets more and more effective as packet sizes increase and fewer packets are actually sent. Testing across the full spectrum from Kermit to Zmodem showed that MNP protocol could provide high transfer rates for non-compressed data.

Kermit       Xmodem       Ymodem       Zmodem      Echo'd stream    Non-echo stream
148.5 cps    143.1 cps    309.6 cps                443.4 cps        544.1 cps

MNP Lev 5 data transfer rates by protocol
(all transfers using 200K C source file)

Two extra tests were then run with no packets. Currently, no standard protocol is available that sends binary data in a single packet. In its place, an ASCII file was 'typed' from one machine to another. With remote echo of the stream turned off, MNP Lev 5 turned in a transfer rate of 544.1 CPS, the equivalent of a 56% compression rate.
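That equivalence follows directly from the raw 240 CPS line rate; the sketch below derives the implied compression from the two measured stream rates reported above.

    /* Equivalent compression implied by an MNP Lev 5 transfer rate:
     * if data arrives at 544.1 cps over a line that moves 240 cps, the
     * modem must have shrunk the data to 240/544.1 = 44% of its size,
     * i.e. roughly 56% compression. */
    #include <stdio.h>

    int main(void)
    {
        double base_cps     = 240.0;                 /* raw 2400 baud rate    */
        double measured[]   = { 443.4, 544.1 };      /* measured stream rates */
        const char *label[] = { "echoed stream", "non-echoed stream" };

        for (int i = 0; i < 2; i++) {
            double equiv = 1.0 - base_cps / measured[i];
            printf("%-18s %.1f cps -> %.0f%% equivalent compression\n",
                   label[i], measured[i], equiv * 100.0);
        }
        return 0;
    }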

The relationship of MNP Lev 5 transfer rates to packet size is shown here. Please note that the packet sizes of the 'stream' protocol tests are given as 17,000 and 18,000 respectively. Since the 'stream' protocol does no block checking, these should actually be the size of the test file, 200,000. They were truncated to fit on a normal graph.

MNP throughput versus packet size

Note that the 'stream' protocol does no error checking. In the case of an MNP connection, however, no error checking protocol is needed. MNP is a layered protocol, where Level 5 offers compression. The lower layers, 1-4, offer error correction at the byte level. MNP modems resend each byte that is destroyed due to poor line conditions, giving the appearance of a perfect connection at all times and obviating the need for higher-level error correcting protocols.

The extremely good 'stream' performance of MNP protocol makes it ideal for online, interactive systems. Testing suggests that a new file transfer protocol, designed to take advantage of the MNP environment, could extract significantly better performance than is achieved with existing standard file transfer protocols.

All MNP throughput tests were done with the CSOURCE.C file, which can be compressed to a high degree. Tests of MNP 'stream' throughput with the HIS transaction, Word Perfect and DOS EXE files were not possible because a true 'stream' protocol does not yet exist. The 'typed' test approach simulated a user typing at maximum speed. The other test files contain binary information which would have been interpreted as user commands, so they could not be tested with this approach.

Considerations

ZIP compression is currently not publicly available for Unix systems. However, a group exists for the purpose of porting ZIP to minicomputers and mainframes. A beta release of Unix ZIP is currently available for testing.

Existing HIS transactions were highly compressed in testing. This implies that there was a high level of redundancy in the HIS data format. Logically reducing data fields to the absolute minimums required to store the data (for example, storing a yes/no field in a single bit) increases the level of entropy and makes a file harder to compress. This suggests that ZIP and ARC may have applications for measuring the efficiency of our transaction file format.
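As an illustration of that field-reduction idea (a sketch added for clarity, not part of the original analysis), eight yes/no fields stored one per byte can be packed into a single byte:

    /* Sketch: packing eight yes/no fields into a single byte.  Once fields
     * are reduced to their minimum size like this, the data carries less
     * redundancy and a compressor such as ARC or ZIP has little left to
     * remove; this is why compression ratio can serve as a rough gauge of
     * how efficient a record format already is. */
    #include <stdio.h>

    int main(void)
    {
        /* Eight hypothetical yes/no answers from one transaction record. */
        int flags[8] = { 1, 0, 1, 1, 0, 0, 1, 0 };

        unsigned char packed = 0;
        for (int i = 0; i < 8; i++)
            if (flags[i])
                packed |= (unsigned char)(1u << i);   /* one bit per field */

        printf("8 fields packed into one byte: 0x%02X\n", packed);
        return 0;
    }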

The Incom modems used in these tests are only one of many MNP Lev 5 compatible modems sold. It is conceivable that the MNP implementation they use is wrong, and that the delays it introduces are not part of the actual MNP Level 5 specification. No other MNP modems were available at the time of testing.

Test conditions

The following conditions applied to these tests:

TEST.EXE       A version of the HIS
DEVPROD.MEM    The SES appendix, in original memo form, with included graphics
HISTRANS       A collection (~12) of HIS transaction files from the HIS-w-communications prototype
CSOURCE.C      A concatenated file comprised of three large Vitamin-C source code files

All transfers were initiated by hand, adding overhead which is more significant on the smaller files. This is consistent with actual MOS conditions, where time will be spent dialing the phone and initiating the connection.