ePlaice / For the Best Software on the Net

Mainly Free and Open Source Software

Latest news

11 Oct 2006: A small test page has been installed to allow a PDF ebook to be opened and saved if the user so wishes. This needs further work.

01 Oct 2006: There are over 19,000 free ebooks available for download on the Project Gutenberg website.

Links:

e-Books

Project Gutenberg

For me, reading a good book is on a par with listening to good music and easily beats most other media, including TV, videos and DVDs. Hopefully, the computer and internet revolution can add a little to the reading experience.

Two areas that the PC and internet should be able to improve are searching for books and locating passages within them. In addition, this revolution should make it easier for people to obtain books (in digital format) and should improve on the facilities found in a conventional library. So let's have a quick look to see whether we are doomed to a future of 'Neighbours' and 'Big Brother', or whether e-Books have anything to offer.

Why e-Books?

Of course, most people would rather read a 'real' book than one sitting on a computer screen, and I cannot disagree with that philosophy. However, e-books do have certain advantages when you are looking for old classic titles that may be out of print or not stocked in your library. They really come into their own when you are researching, or looking for that phrase or quote. In a day and age when there is so much trivia in all the media, including TV, music, art and cinema, it is refreshing to rediscover some of the classic works of all time and see that the settings may have changed, but the basic plots are as relevant today as ever.
A further reason for e-Books is the usefulness of audio books, for when you are on the move or when you just prefer having a book read to you. Unfortunately, the current state of machine reading (despite all the earlier hype) is very poor in my opinion, so this is more a future possibility. For now, I prefer to use a CD read by a human.

Technical Developments

The new Sony Reader is set to alter the book medium in the not too distant future, although the price, the use of proprietary formats and the built-in protection may deter quite a few users. Nevertheless, the use of e-ink, which avoids the eye strain caused by conventional backlit screens, represents quite a fundamental advance over the readers currently available.

Available Formats

ASCII Text
The standard format is plain ASCII text, which is fine for portability between computer systems and for general use. For example, Project Gutenberg give as their reason for using ASCII text that it "can be read, written, copied and printed by just about every simple text editor on every computer in the world". This matters because, as they say, different formats have come and gone over the past 30 years, and they are interested in preserving texts in electronic format for centuries.
However, there is a price for this approach in that it is not at all friendly for the poor reader. For example, images cannot be included, there is no means of highlighting with bold or italics (other than block capitals), different fonts are not supported and the document cannot be indexed or bookmarked.

HTML (Hyper Text Markup Language)
This overcomes most of the problems associated with plain vanilla ASCII text and is a reasonable short-term solution. However, it gets a bit messy when a book needs many HTML and image files hanging around. So the next step was to use Compiled HTML. This is great for help manuals and web pages where many links are required, but in my opinion not so good for e-books. The main reason is that HTML describes presentation as opposed to content, and therefore it is unlikely to be a long-term method for holding the contents of e-books.

XML (eXtensible Markup Language)
This would seem a better long-term bet for most future developments, and already there is XHTML, which is a reformulation of HTML in XML for the web. I have found that XML is also an excellent format for holding the contents of e-books. Once a book has been converted to XML, it is possible to create e-books in all sorts of different flavours, which provides good future-proofing. All that is needed is another convertor. At the moment there are convertors that will take an XML document and produce a very acceptable PDF or Microsoft Word version. See the download section for examples. XML provides powerful methods for holding page-based documents, documents with outlines (e.g. the Contents), various heading styles, tables and images, as well as providing important text descriptors such as bold and italic.
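
To illustrate the "one source, many flavours" idea, here is a minimal Python sketch using only the standard library. The element names (book, chapter, p, i) and the title attribute are made up for this illustration; they are not the schema used by any particular convertor mentioned on this page. Each output format is just one more small function working on the same XML.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical book markup, invented for this sketch only.
BOOK_XML = """<book title="A Sample Book">
  <chapter title="Chapter One">
    <p>The first paragraph, with <i>italic</i> text.</p>
    <p>The second paragraph.</p>
  </chapter>
</book>"""


def to_text(book):
    """Convertor 1: plain text, one blank line between paragraphs."""
    lines = [book.get("title", "").upper(), ""]
    for chapter in book.iter("chapter"):
        lines.append(chapter.get("title", ""))
        lines.append("")
        for para in chapter.iter("p"):
            # itertext() flattens inline markup such as <i>...</i>.
            lines.append(" ".join("".join(para.itertext()).split()))
            lines.append("")
    return "\n".join(lines).rstrip() + "\n"


def to_html(book):
    """Convertor 2: a simple HTML fragment with headings."""
    parts = ["<h1>{}</h1>".format(escape(book.get("title", "")))]
    for chapter in book.iter("chapter"):
        parts.append("<h2>{}</h2>".format(escape(chapter.get("title", ""))))
        for para in chapter.iter("p"):
            # The sketch reuses the names p and i, so paragraphs copy across unchanged.
            parts.append(ET.tostring(para, encoding="unicode").strip())
    return "\n".join(parts)


if __name__ == "__main__":
    book = ET.fromstring(BOOK_XML)
    print(to_text(book))
    print(to_html(book))
```

A PDF or Word convertor would follow the same pattern: walk the chapters and paragraphs, and emit whatever the target format needs.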

Adobe PDF Format
Yes, this is proprietary, but as detailed above it is quite feasible to create PDF files using an intermediate XML language. The process starts with a text file, then an XML file is produced, and finally a PDF file is generated from it. The result is far superior to the PDF documents produced by a printer-report type of solution. For example, you can obtain a proper outline (bookmarks, in Adobe terminology), and also your own formatting. For books PDF is quite a good format, since it allows bookmarks, has excellent search facilities and is extensible by means of plug-ins. For example, there is a plug-in that will read the book aloud, but as stated earlier, speech technology still has a very, very long way to go before it becomes an accepted part of the desktop experience. I have tried this and became a bit exasperated when it read out the headers and footers of each page in a strange US-twanged voice. My bet is that XML will still be around long after PDF has disappeared, but for e-books PDF is probably the best format around today.

Sources for e-Books

The best source on the net is Project Gutenberg. Here you will find many thousands of e-book texts, free and available to download. Incidentally, there is also an Australian Project Gutenberg, which doesn't have the same number of texts but nevertheless has an interesting selection. Of course, if you are looking for the latest Dean Koontz thriller, you will not find it here, as there are copyright laws prohibiting copying of the later works. However, it seems that if the author has been dead for at least 50 years then the works are in the public domain. There are variations on this rule from author to author and country to country. There is a good website at ??? that details the copyright laws.
One of the problems I find is that an ASCII text e-book is not very easy to read, and that is what prompted me to look at the alternatives listed here.
Google and Microsoft have both decided that they want to get in on the act of e-books by making a catalogue of books searchable. Whilst this may be useful for research, they do not seem to be too keen to let you download a whole text to your PC. Instead, they would far rather you bought a new book from one of their affiliates so they can obtain a commission. My intention is to stick with Gutenberg, although it does seem restrictive that you cannot, say, download a modern e-book free of charge (for a limited usage period) in a similar way to how a library operates. Already some authors are charging by the chapter and in the process eliminating the middleman. This must be the way forward!

Interesting Website

I have become intrigued by the World of Gnod website. Enter any of the authors that you have read and it will come up with an uncanny list of authors that you may, or may not, have also read. It uses artificial intelligence and matching searches to come up with lists of authors that fit readers' tastes. Some of the matches it comes up with would not normally be predicted, so if you enter Paul Theroux it comes up with close associations to Guy de Maupassant and Martin Cruz Smith, all of whom I enjoy reading. It may also be a good way of expanding your list of authors. Try it for yourself; you can also do similar searches on music. It certainly beats Amazon's predictions of what books would suit you on your next visit!

Text to XML Editing Tips

The first and last tip is to get a good editor such as PSPad (see the Utilities section). I prefer to edit on a copy, sorting out a Chapter or Section at a time. Using the editor, the following steps are required :-

  • Create the headers and footers to appear on each page
  • Create the Outline/Bookmarks for the e-book

Then for each Chapter or Section use the repetitive editor and search / replace functions to :-

  • Get rid of all false eols, so that a line break appears only at the end of a paragraph. One method is to use PSPad to mark every empty line (i.e. replace "^$" with "££"), then use the Join Lines facility in PSPad to join all the lines into one line. The last step is to use PSPad to replace the "££" with a newline character. There must be an easier way!
  • Remove any blank lines
  • At start of each line use PSPad to insert XML code for start of paragraph
  • At end of each line use PSPad to insert XML code end of paragraph
  • Use h1 for headings, newpage and outline
  • Tidy up any formatting such as bold, italic, text alignment
  • Insert any images using the XML img statement
  • Finally run the EasyReports convertor at Jans Freeware to provide PDF format. Note that by using XML with outline commands in the chapter headers you will automatically get a chapter outline in the PDF file, just like the example books in the download section.
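
The first few clean-up steps in the list above (removing false eols, dropping blank lines and wrapping each paragraph in start and end codes) can also be scripted. What follows is a rough Python sketch rather than a replacement for the PSPad method. It assumes that paragraphs in the source text are separated by at least one blank line, as in Project Gutenberg files, and it uses "p" as a stand-in paragraph tag, because the actual tag names expected by the convertor are not given here.

```python
import re
from xml.sax.saxutils import escape


def text_to_paragraphs(text):
    """Split plain text into paragraphs, joining hard-wrapped lines."""
    # Normalise Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # One or more blank lines (possibly holding stray spaces) end a paragraph.
    blocks = re.split(r"\n(?:[ \t]*\n)+", text)
    paragraphs = []
    for block in blocks:
        # split() with no arguments removes the false eols and extra spaces.
        joined = " ".join(block.split())
        if joined:  # skip blocks that were only whitespace
            paragraphs.append(joined)
    return paragraphs


def paragraphs_to_xml(paragraphs, tag="p"):
    """Wrap each paragraph in start/end tags ("p" is an assumed name)."""
    # escape() converts &, < and > so the output stays well-formed XML.
    return "\n".join(
        "<{0}>{1}</{0}>".format(tag, escape(para)) for para in paragraphs
    )


if __name__ == "__main__":
    sample = (
        "It was the best of times,\r\n"
        "it was the worst of times.\r\n"
        "\r\n"
        "Tom & Jerry\n"
        "ran away.\n"
    )
    print(paragraphs_to_xml(text_to_paragraphs(sample)))
```

Headings, outlines, formatting and images still need attention by hand (or further rules), since plain text gives no reliable way to tell a heading from a short paragraph.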

DOC or RTF to XML

If you have an ebook in .doc (Word format) or .rtf (Rich Text Format) then it is a fairly straightforward matter to read this in OpenOffice Writer and then export it as XHTML. Once in XHTML it is but a few steps to replace the line headers with the XML variations and then to use EasyReports from JanSoft to create a PDF book.