Welcome to Project Gutenberg at Eccesignum.org

This project is new and undergoing constant, massive alteration. The list of currently available texts, with links, is to the right. Also linked are all of the notes and resources which went into the creation of this project.

An Update, April 6, 2003

I see that the world at large has taken interest in my little project. Welcome!

My time lately has not been my own, so I have accomplished little in the past two months. On reviewing the W3C's XML Schema specification, I feel that [1] it is more complicated than it needs to be, and [2] a DTD will be sufficient for now. Of course, it will be a while before I get around to writing a DTD.

If you have any questions or comments about the work I have done thus far, please feel free to contact me at john@eccesignum.org. Also, if you are a volunteer for Project Gutenberg, drop me a line. The Eldred v. Ashcroft decision has cast a shadow over all things related to the public domain, and I would like to hear others' views on the subject.

Project Gutenberg Research

I have taken it upon myself, in my free time, to do some volunteer work for Project Gutenberg. The following document is a constantly evolving abstract of the thoughts and processes going into the conversion of some of the Project Gutenberg e-texts into XML format.

In looking at the current DTDs and example XML markup created by the HTML Writers Guild, I have found that, while each DTD is useful and appropriate for its own document format, splitting the markup into so many separate DTDs causes difficulty for documents that combine more than one format, e.g. books that contain poetry.

Given the current state of technology, in both XML and information architecture, I propose switching from multiple Document Type Definitions (DTDs) to a single XML Schema that takes into account all of the possible permutations among the different documents in the Project Gutenberg repository. This would make building and manipulating XML versions of the PG documents much easier.
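
To make the idea concrete, here is a minimal sketch in Python of what a single content model buys. It is not XML Schema or a DTD: the CONTENT_MODEL table is a hypothetical stand-in for one, and every element name in it is invented for illustration rather than taken from the HWG DTDs. Because one table covers books, poems, and plays, a chapter is free to contain a poem, which is exactly the case that separate DTDs make awkward.

```python
import xml.etree.ElementTree as ET

# Hypothetical unified content model: each element name maps to the set of
# child elements it may contain. Names are illustrative, not from any real DTD.
CONTENT_MODEL = {
    "book": {"title", "chapter"},
    "chapter": {"title", "para", "poem"},  # prose chapters may embed poetry
    "poem": {"title", "stanza"},
    "stanza": {"line"},
    "play": {"title", "act"},
    "act": {"title", "scene"},
    "scene": {"title", "speech"},
    "speech": {"speaker", "line", "para"},
    "title": set(),
    "para": set(),
    "line": set(),
    "speaker": set(),
}


def violations(elem):
    """Return (parent, child) tag pairs that the content model does not allow."""
    problems = []
    allowed = CONTENT_MODEL.get(elem.tag, set())
    for child in elem:
        if child.tag not in allowed:
            problems.append((elem.tag, child.tag))
        problems.extend(violations(child))  # check the whole subtree
    return problems


# A book whose chapter mixes prose and verse.
doc = ET.fromstring(
    "<book><title>Example</title>"
    "<chapter><para>Some prose.</para>"
    "<poem><stanza><line>A line of verse</line></stanza></poem>"
    "</chapter></book>"
)
print(violations(doc))  # []
```

A real schema would also constrain order and occurrence (for instance, exactly one speaker per speech), which this set-based table ignores.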

Alternatively, three similar Schemas, "book", "play", and "poem", could be created. These would be essentially identical structures with different tag names.
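
Since the three Schemas would differ only in tag names, one generic tree could in principle be specialized into any of them by renaming alone. The Python sketch below assumes hypothetical neutral role names (block, label, unit) and a hypothetical vocabulary for each genre; none of these names come from an existing PG or HWG DTD.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical mapping from neutral structural roles to genre-specific tags.
GENERIC_TO_GENRE = {
    "book": {"block": "chapter", "label": "title", "unit": "para"},
    "poem": {"block": "stanza", "label": "title", "unit": "line"},
    "play": {"block": "speech", "label": "speaker", "unit": "line"},
}


def specialize(elem, genre):
    """Return a copy of a generic tree with its tags renamed for one genre."""
    names = GENERIC_TO_GENRE[genre]
    result = copy.deepcopy(elem)  # leave the generic original untouched
    for node in result.iter():    # iter() visits the root and all descendants
        node.tag = names.get(node.tag, node.tag)
    return result


generic = ET.fromstring(
    "<block><label>DORINE</label>"
    "<unit>Besides, 'tis downright scandalous to see</unit>"
    "<unit>This unknown upstart master of the house--</unit></block>"
)
print(ET.tostring(specialize(generic, "play"), encoding="unicode"))
```

Because only names change, a single conversion routine could serve all three Schemas; the structure, and therefore any structural processing, stays shared.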

The first difficulty is determining the appropriate level of granularity at which to dissect the documents. Individual lines are essential in poetry, but not in plays or books. A paragraph has meaning in a book, but not in a poem. A speech, which could easily be a paragraph in a book, needs a speaker identifier in a play.
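
As a rough illustration, the following Python sketch dissects plain text at three different granularities. The tag names and the plain-text conventions it relies on (blank lines between blocks, the speaker's name alone on the first line of a speech) are assumptions made for this example; real PG e-texts are far less regular.

```python
import re
import xml.etree.ElementTree as ET

# Blocks of text (paragraphs, stanzas, speeches) are separated by blank lines.
BLOCK_BREAK = re.compile(r"\n\s*\n")

# Hypothetical root element for each genre.
ROOTS = {"book": "chapter", "poem": "poem", "play": "scene"}


def dissect(text, genre):
    """Wrap plain text in XML at the granularity suited to its genre."""
    root = ET.Element(ROOTS[genre])  # KeyError for an unsupported genre
    for block in BLOCK_BREAK.split(text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if not lines:
            continue
        if genre == "book":
            # In prose the paragraph is the unit; its line breaks carry no meaning.
            ET.SubElement(root, "para").text = " ".join(lines)
        elif genre == "poem":
            # In verse every line matters, so each one becomes an element.
            stanza = ET.SubElement(root, "stanza")
            for ln in lines:
                ET.SubElement(stanza, "line").text = ln
        else:
            # In drama a speech needs a speaker; assume the name is on the first line.
            speech = ET.SubElement(root, "speech")
            ET.SubElement(speech, "speaker").text = lines[0]
            for ln in lines[1:]:
                ET.SubElement(speech, "line").text = ln
    return root


sample = "First line of verse\nSecond line of verse\n\nThird line"
for genre in ("poem", "book"):
    print(ET.tostring(dissect(sample, genre), encoding="unicode"))
```

The same input yields two stanzas of separate lines when treated as a poem, but two paragraphs with the first pair of lines merged when treated as prose.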

From a purely pragmatic point of view, a generic XML document that takes into account all of the possible text structures (paragraphs, lines, lists) would be ideal. The point of using a unique PG Schema, however, and indeed the point of marking up the structure in XML at all, is to make the document readable by both users and machines. Hence the need for specific tags for, say, book paragraphs and play speeches: structurally they are identical, but contextually they demand distinct names.

The temptation is to generalize the structure of the XML to the point that a single tag is used for a paragraph in a book, a verse in a poem, and a speech in a play. While this makes sense from a strictly structural point of view, it makes the XML less useful to a person reading the raw code. Compare the following two markups of the same speech from Tartuffe:

<speech>
<speaker>DORINE</speaker>
<line>Besides, 'tis downright scandalous to see</line>
<line>This unknown upstart master of the house--</line>
<line>This vagabond, who hadn't, when he came,</line>
<line>Shoes to his feet, or clothing worth six farthings,</line>
<line>And who so far forgets his place, as now</line>
<line>To censure everything, and rule the roost!</line>
</speech>

<para>
<title>DORINE</title>
<span>Besides, 'tis downright scandalous to see</span>
<span>This unknown upstart master of the house--</span>
<span>This vagabond, who hadn't, when he came,</span>
<span>Shoes to his feet, or clothing worth six farthings,</span>
<span>And who so far forgets his place, as now</span>
<span>To censure everything, and rule the roost!</span>
</para>

The two versions are structurally identical, but the tag names in the first convey much more about their content.
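
One way to check that the two versions really are structurally identical is to compare the trees with their tag names thrown away. In this Python sketch the hypothetical shape() function keeps only each element's text and the shapes of its children; both speeches are abbreviated to their first two lines.

```python
import xml.etree.ElementTree as ET

SEMANTIC = (
    "<speech>\n"
    "<speaker>DORINE</speaker>\n"
    "<line>Besides, 'tis downright scandalous to see</line>\n"
    "<line>This unknown upstart master of the house--</line>\n"
    "</speech>"
)
GENERIC = (
    "<para>\n"
    "<title>DORINE</title>\n"
    "<span>Besides, 'tis downright scandalous to see</span>\n"
    "<span>This unknown upstart master of the house--</span>\n"
    "</para>"
)


def shape(elem):
    """Structure of a tree with tag names discarded: text plus child shapes."""
    return ((elem.text or "").strip(), tuple(shape(child) for child in elem))


def tag_names(elem):
    """All tag names in document order, starting with the root."""
    return [node.tag for node in elem.iter()]


semantic = ET.fromstring(SEMANTIC)
generic = ET.fromstring(GENERIC)
print(shape(semantic) == shape(generic))  # True
print(tag_names(semantic))  # ['speech', 'speaker', 'line', 'line']
print(tag_names(generic))   # ['para', 'title', 'span', 'span']
```

A machine sees no difference between the two; only the tag names, which matter to the human reader, tell them apart.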