The making of Christ On Disk

by Denver Reynolds

 

Introduction

This is the story behind the Christ on Disk project. The more technical details are omitted so that it is easier to follow for those without a programming background.

The initial idea

While teaching myself the programming language C++, I wanted to exercise what I was learning and put it to good use. I had the idea of creating a Bible viewing application that was small enough to fit onto a floppy disk.

This was a good idea because...

  • it is a noble project and serves a useful purpose beyond just practising syntax.
  • it would give me good experience in using C++.
  • making a Windows application was something I'd always wanted to do.

This was not such a good idea for a C++ novice since I would...

  • need to develop an event driven application with a graphical user interface (GUI).
  • be able to find much less difficult starter projects to try!

Preparation and outline

I first searched for similar applications. The Bible viewing applications I found were all about 5Mb. None of them could fit onto and run from a single floppy disk.

[Much later on into the project I did eventually find one Bible viewing application that could fit on a 1.44Mb floppy disk. It is a commercial product. The company, Eurofield Information Solutions, "is an Australian Technology Company that specialises in ELECTRONIC Publishing and Distribution Software". They use protected compression techniques.]

I downloaded a copy of the Bible text and unzipped it. Hmm, it was over 4.5Mb in size!

Each character of the Bible text occupies one space on the computer disk, that is one byte. The Bible text is made up of over 4,500,000 characters. So this takes up over 4,500,000 spaces, or 4.5 million bytes. However, a floppy disk only has about 1,700,000 bytes of space at its very maximum when it has been specially formatted. Can you see a problem? That's right, the Bible text as it is will not fit onto a floppy disk, even if you ask it to nicely. Apart from the Bible text I would have to fit the viewing application onto the floppy disk as well!

A short session with compression

It was quite clear that I would have to find a way of getting more than one character into one byte of space. In order to meet my goal I would need to use compression. Logically, useful compression involves making something smaller with the ability to reverse the process (decompression) to get back to the original form.

Basic building

Whilst I pondered what compression to use, I was also working on other obvious tasks. I designed the GUI and the interactions between the objects. Using ed (the Unix line editor) commands in vi (a Unix text editor) under the Linux operating system, I formatted the Bible text into a usable form for inclusion into the C++ program.

Some of the tasks during the project needed special programs to be written in FORTRAN or C++ to complete them. For example, some involved data conversion; others involved counting occurrences of certain characters.

After developing and testing the different functions independently, the project was assembled into a working prototype. Having completed the basic Bible viewing application without compression, I checked the program size. It was 6.3Mb!

A long session with compression

Initial compression methods

Being an engineer I opted for a practical compression solution. I downloaded and tried using different executable compressors on the program. This effort reduced the program to approximately 2Mb. It was still too large to fit onto a floppy and it became evident that the Bible text itself would need to be compressed prior to inclusion into the program.

I then searched the Net for program source code I could use with my application to do compression, i.e. a compression library. The free-to-use compression library I found wasn't compatible with the integrated development environment (IDE) that I used to build the application. So I read about compression techniques and eventually ended up devising my own, which made use of the specifics of this case.

I identified 63 unique characters in the Bible text. Since 63 distinct values fit into 6 bits (2^6 = 64), I could represent each character by ¾ of a byte instead of a full 8-bit byte. However, the maximum size reduction I could achieve using this was 25%. This would not be sufficient.
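To make the ¾-of-a-byte idea concrete, here is a minimal sketch of 6-bit packing. It is an illustration only, not the project's code: the function names pack6 and unpack6 are hypothetical, and it assumes each character has already been mapped to a code in the range 0 to 63.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Pack 6-bit codes (values 0..63) into a byte stream, 4 codes per 3 bytes.
// Hypothetical helper: the article does not show how the mapping was done.
std::vector<std::uint8_t> pack6(const std::vector<std::uint8_t>& codes) {
    std::vector<std::uint8_t> out;
    std::uint32_t buffer = 0;  // bits waiting to be written
    int bits = 0;              // number of valid low bits in buffer
    for (std::uint8_t c : codes) {
        assert(c < 64);        // each code must fit in 6 bits
        buffer = (buffer << 6) | c;
        bits += 6;
        while (bits >= 8) {
            bits -= 8;
            out.push_back(static_cast<std::uint8_t>((buffer >> bits) & 0xFF));
        }
    }
    if (bits > 0) {            // flush leftover bits, zero-padded on the right
        out.push_back(static_cast<std::uint8_t>((buffer << (8 - bits)) & 0xFF));
    }
    return out;
}

// Reverse of pack6. `count` is the number of codes originally packed,
// needed because the last byte may contain padding bits.
std::vector<std::uint8_t> unpack6(const std::vector<std::uint8_t>& bytes,
                                  std::size_t count) {
    std::vector<std::uint8_t> codes;
    std::uint32_t buffer = 0;
    int bits = 0;
    for (std::uint8_t b : bytes) {
        buffer = (buffer << 8) | b;
        bits += 8;
        while (bits >= 6 && codes.size() < count) {
            bits -= 6;
            codes.push_back(static_cast<std::uint8_t>((buffer >> bits) & 0x3F));
        }
    }
    return codes;
}
```

Four codes occupy three bytes, which is where the 25% ceiling comes from: no matter how the text is arranged, a fixed 6-bit code can never do better than 6/8 of the original size.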

Pattern replacement

Then I had an idea. I began working on pattern replacement. Prior to including the Bible text into the program I replaced common text patterns with a character that did not appear in the Bible text. Then, when the program displays the text, the process is reversed. For example, if "_the_" is replaced by "@", four fewer characters are used (the underscore "_" represents a space character). For the entire Bible there are 62,058 occurrences of this pattern, so replacement would save 248,232 bytes! When the program encounters a "@" it would display "_the_" in its place automatically, and the reader would see the original text.
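The replace-and-reverse step can be sketched as follows. This is an assumed, simplified version rather than the code used in Christ On Disk: the helper names replaceAll, compressVerse and expandVerse are hypothetical, and only the single pattern from the example above is handled.

```cpp
#include <cstddef>
#include <string>

// The saving quoted in the text: 62,058 occurrences, 4 bytes saved each.
static_assert(62058 * 4 == 248232, "saving for the \" the \" pattern");

// Replace every non-overlapping occurrence of `from` with `to`.
std::string replaceAll(const std::string& text, const std::string& from,
                       const std::string& to) {
    if (from.empty()) return text;
    std::string out;
    std::size_t pos = 0;
    while (true) {
        std::size_t hit = text.find(from, pos);
        if (hit == std::string::npos) {
            out.append(text, pos, std::string::npos);  // copy the tail
            break;
        }
        out.append(text, pos, hit - pos);  // copy text before the match
        out += to;
        pos = hit + from.size();
    }
    return out;
}

// Done once, before the text is built into the program.
// Assumes '@' never appears in the original text.
std::string compressVerse(const std::string& verse) {
    return replaceAll(verse, " the ", "@");
}

// Done at display time, so the reader sees the original wording.
std::string expandVerse(const std::string& packed) {
    return replaceAll(packed, "@", " the ");
}
```

Because the token is a character that never occurs in the source text, expanding always restores exactly what was compressed.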

If I had used a standard form of compression on the Bible text I would then find using an executable compressor on the program would not be as efficient. It is like using two different metal crushers on a car. After the first crushing the second crusher will not crush the car much more. However the beauty of the pattern replacement compression technique is that it removed matter instead of shrinking it. Imagine instead of using the first crusher you replaced car parts with toy car parts, and so end up putting a toy car into the final crusher. Not only does the toy car take up less space (as with any compression) but there is also less substance going into the crusher. It is a subtle but significant difference.

This helped to reduce the size of the Bible text going into the program by about 1Mb. Though in terms of our analogy, this was still a pretty big toy car, and needed further reduction.

Block encoding

I devised a compression method, called block encoding, that would be able to fit the 12 most common characters into half of one byte (i.e. a nibble). Other characters would take up the same amount of space as previously. There was also allowance for 13 pattern matches. Using this method the Bible verses were converted into a stream of four-bit blocks (nibbles). One block is used for a common character and two blocks for others. The blocks are then paired together to form a byte stream. This is the encoded form of the Bible verse.

A unique block sequence is required to signal the end of the verse since the normally used null character value, zero, would be ambiguous with a zero appearing in the byte stream values.
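A rough sketch of how such a nibble scheme might look is given below. The details are assumptions made for illustration: the two character tables, the use of nibble values 12 to 15 as prefixes for two-block codes, and the end marker (two nibbles of value 15) are all hypothetical, since the actual layout is not described here. Pattern tokens are left out to keep it short.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical tables; the real character choices are not given in the article.
const std::string kCommon = " etaonhisrdl";  // 12 chars -> one nibble, values 0..11
const std::string kOther =                   // remaining chars -> two nibbles
    "ucmfwygpbvkxjqz.,;:?!'ABCDEFGHIJKLMNOPQRSTUVWXYZ";
const std::size_t kEndMarker = 63;           // nibbles (15, 15): end of verse

std::vector<std::uint8_t> encodeVerse(const std::string& verse) {
    std::vector<std::uint8_t> nibbles;
    for (char ch : verse) {
        std::size_t i = kCommon.find(ch);
        if (i != std::string::npos) {        // common character: one block
            nibbles.push_back(static_cast<std::uint8_t>(i));
            continue;
        }
        std::size_t j = kOther.find(ch);
        if (j == std::string::npos) {
            throw std::invalid_argument("character not in tables");
        }
        nibbles.push_back(static_cast<std::uint8_t>(12 + j / 16));  // prefix block
        nibbles.push_back(static_cast<std::uint8_t>(j % 16));       // second block
    }
    nibbles.push_back(15);                   // end-of-verse marker
    nibbles.push_back(15);
    if (nibbles.size() % 2 != 0) nibbles.push_back(0);  // pad to a whole byte
    std::vector<std::uint8_t> bytes;
    for (std::size_t k = 0; k < nibbles.size(); k += 2) {
        bytes.push_back(static_cast<std::uint8_t>((nibbles[k] << 4) | nibbles[k + 1]));
    }
    return bytes;
}

std::string decodeVerse(const std::vector<std::uint8_t>& bytes) {
    std::vector<std::uint8_t> nibbles;
    for (std::uint8_t b : bytes) {
        nibbles.push_back(static_cast<std::uint8_t>(b >> 4));
        nibbles.push_back(static_cast<std::uint8_t>(b & 0x0F));
    }
    std::string verse;
    std::size_t k = 0;
    while (k < nibbles.size()) {
        std::uint8_t n = nibbles[k++];
        if (n < 12) {                        // one-block common character
            verse += kCommon[n];
            continue;
        }
        if (k >= nibbles.size()) break;      // truncated input
        std::size_t j = static_cast<std::size_t>(n - 12) * 16 + nibbles[k++];
        if (j == kEndMarker) break;          // end of verse reached
        verse += kOther.at(j);               // throws on an unused code
    }
    return verse;
}
```

In this sketch two consecutive spaces pack into the byte value zero, which shows why a zero byte cannot double as the end-of-verse signal and a dedicated block sequence is needed instead.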

Pulling together

I then did a more extensive investigation into pattern replacement to discover the optimum patterns for replacement. This involved looking at pattern interference. As a simple example of pattern interference: pattern replacement of "_the" would affect a replacement for "_then" because "_the" is contained in "_then". Pattern interference can significantly influence pattern replacement choices.

The bytes saved from each replacement pattern were calculated. This was done taking into account the importance common characters had due to the block encoding. The patterns with the most savings were selected for replacement.
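The kind of counting involved can be sketched as below. This is a simplified illustration with hypothetical function names: it assumes every replaced pattern shrinks to a one-byte token and ignores the extra weighting that block encoding gives to common characters.

```cpp
#include <cstddef>
#include <string>

// Count non-overlapping occurrences of `pattern` in `text`.
std::size_t countOccurrences(const std::string& text, const std::string& pattern) {
    if (pattern.empty()) return 0;
    std::size_t count = 0;
    for (std::size_t pos = text.find(pattern); pos != std::string::npos;
         pos = text.find(pattern, pos + pattern.size())) {
        ++count;
    }
    return count;
}

// Bytes saved if each occurrence is replaced by a one-byte token
// (assumption: plain bytes, no nibble weighting).
std::size_t bytesSaved(const std::string& text, const std::string& pattern) {
    if (pattern.size() < 2) return 0;
    return countOccurrences(text, pattern) * (pattern.size() - 1);
}

// Replace every occurrence of `pattern` with the single character `token`.
std::string replacePattern(const std::string& text, const std::string& pattern,
                           char token) {
    if (pattern.empty()) return text;
    std::string out;
    std::size_t pos = 0;
    std::size_t hit;
    while ((hit = text.find(pattern, pos)) != std::string::npos) {
        out.append(text, pos, hit - pos);
        out += token;
        pos = hit + pattern.size();
    }
    out.append(text, pos, std::string::npos);
    return out;
}
```

For the sample text " the man then went to the well then" preceded by "and", the pattern " then" occurs twice before any replacement, but after " the" has been replaced it no longer occurs at all. That is the interference effect: the saving credited to one pattern depends on which patterns were replaced before it.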

Together, the pattern replacement, block encoding, and executable compression meant the program was able to fit onto a floppy disk. Christ On Disk was now true to its name. It fitted onto and ran from a single floppy disk!

The project was a success!

 

Credits

God - for the Bible, without which this project would be a non-starter!

Spencer in the U.S. - for purchasing the executable compressor licence on my behalf.

Petite - the executable compressor I used in conjunction with my custom compression techniques.

Bloodshed Dev-C++ - the integrated development environment I used to build the application.

wxWindows - the C++ cross-platform framework that allowed for rapid application development.

You - for taking time to check out Christ On Disk.

 

 

 

 


Copyright 2001-2002 D. Reynolds