Monday, October 7, 2013

“Lost Children” Texts: Returning to the Archive in the “Google Books Era”

[Cross-posted from my personal blog.]

I've been reading Matthew L. Jockers's recent book, Macroanalysis: Digital Methods and Literary History (Urbana, IL: U of Illinois P, 2013), and I find much of it compelling. One thing that Jockers mentioned--and my reaction to it--has been on my mind for the last few days: the sheer availability of texts in our digital culture. I want to put pressure on this notion, especially as it applies to medieval texts.

My thinking about this issue began to percolate when I read this:
In this Google Books era, we can take for granted that some digital version of the text we need will be available somewhere online, but we have not yet fully articulated or explored the ways in which these massive corpora offer new avenues for research and new ways of thinking about our literary subject. (17)
Jockers's point is, of course, generally accepted, and the massive digitization of texts has certainly changed research across the humanities (he addresses copyright issues in his final chapter, "Orphans"). There is no denying the major benefits to scholarship that have come from initiatives such as Google Books, HathiTrust, and Project Gutenberg.

My knee-jerk reaction to Jockers's observation, however, was this comment:*
Though what about pre-modern, unedited texts? Jockers points to his own idealism & problems such as copyright, but there are also other problems of access: e.g., some of these texts have never (or almost never, or not properly) been freed from the archive.
Of course, medievalists benefit greatly from free, open access to many texts, particularly those printed before and even into the early twentieth century. In the USA, copyrights on works published before 1923 have now generally expired--though there are exceptions, such as when estates continue to hold rights--putting those works in the public domain. It's worth noting that many resources and editions of medieval texts that scholars still consult (such as those published by the Early English Text Society) were printed in the late nineteenth and early twentieth centuries. There's no doubt that the digitization of these texts is a game-changer.

Yet, if Jockers is concerned in his final chapter about copyright and "orphan" texts--instances in which ownership and copyright are unclear--I'm concerned about what we might think of as "lost children" texts (though the two types aren't mutually exclusive) that have yet to be brought out of the archive and made accessible to scholars.

Many examples come to mind. One instance that I've encountered and worked on is a Hiberno-Latin commentary on Colossians that survives in St. Gall, Stiftsbibliothek, Cod. Sang. 1395, which has been mentioned in various scholarship but remained unedited and, consequently, largely unexamined (see now my transcription and analysis in Sacris Erudiri 2012). This text is just one of many Hiberno-Latin texts accessible only through direct examination of the manuscripts; in fact, this problem is what motivated the Corpus Christianorum Scriptores Celtigenae series (published by Brepols). Other examples abound in J.-P. Migne's monumental nineteenth-century Patrologia Latina (PL), some volumes of which are digitally available, while the complete corpus and its searchable database remain locked behind a hefty commercial paywall maintained by Chadwyck-Healey, even though the texts themselves are in the public domain. Again, the Corpus Christianorum is relevant here, since its Series Latina seeks to supplant Migne with a new line of critical editions. Like the PL database, however, the Brepols counterpart--the Corpus Christianorum Library of Latin Texts (CCLT)--remains available only to users (or institutions) paying the steep subscription price.

So how do we confront the problem of "lost children" texts if we want to pursue macroanalysis?

Presumably, we need to turn back to the archive, to construct corpora or add unprinted materials to our existing corpora before we can even begin our macroanalysis. (On a small scale, this has been my intent with my project on "Studying Judith in Anglo-Saxon England".) In both his first and final chapters, Jockers quotes an article by Rosanne Potter from 1988: "Until everything has been encoded... the everyday critic will probably not consider computer treatment of texts" (qtd. 4 and 127).** Jockers follows this up: "It is a great shame that today, twenty-four years later, everything has been digitized, and still the everyday critic cannot consider computer treatments of texts" because of copyright (175). He does offer a caveat in a footnote, remarking, "No, not everything, but compared to 1988, yes, everything imaginable." In our age of big data, it's easy to accept this assumption; but whatever the gains since 1988, much from the medieval period remains in the archive, not yet transcribed, edited, or printed, let alone digitized. While some of us are working on computer treatment of texts, there is still much archival work to do before some of that treatment can even begin.

* None of my discussion is meant to denigrate Jockers's work: he makes an excellent case for literary "macroanalysis" and offers a series of superb analyses throughout his book. Furthermore, his comments about copyright in the final chapter are salient and tangentially related to my own concerns. The fact that he does not address the issues I raise here should not be seen as a mark against his work; rather, it perhaps indicates one difference between digital humanities work on 19th-century texts (Jockers's own field) and on medieval texts (my own field).

** See Rosanne Potter, "Literary Criticism and Literary Computing: The Difficulties of a Synthesis," Computers and the Humanities 22 (1988), 91-7.
