The Limits of Digitization

In the last three decades, digital sources have exploded online. Trendsetting projects from the 1990s, such as the Valley of the Shadow and the Internet Medieval Sourcebook, have been joined by an ever-growing set of sites that also aim to make historical sources available online. Universities have increasingly made their students’ theses available online, while Google Books and HathiTrust have made older books no longer under copyright widely available. Photogrammar has made historic photos of the United States more easily discoverable, and the Prelinger Archives have done the same for film and video recordings. Together, sites like these have made historical discovery and research much easier for academics as well as laypeople.

However, these digital archives do not and cannot perfectly recreate the sources they have made available. For instance, choices about how to digitize and frame photos affect the user’s experience of them.[1]

Even when we consider only textual sources, which should be easy to represent on a two-dimensional screen, choices about how and what to digitize affect users’ experience of the materials and their ability to use them effectively. When texts are represented as pictures, the lighting at the time of photography can alter what can be read. With a physical text, moving the light source or even just your head as you look at it can reveal more; in a photo, the information is fixed. Similarly, which copy is photographed can have an outsized impact on how researchers interpret the source. A “clean” copy without marginalia gives a sense of the text as fixed in time, while marginalia can illustrate points of contention with the text or give the researcher a sense of how people used it. If a library has both an annotated and a clean copy of a text, the choice of which one to digitize greatly affects the meaning researchers can draw from it.

These issues have affected my own historical research. While I am grateful for the availability of digital sources, sometimes their quality makes them nearly impossible to use. In one example, I was unable to read crucial parts of a digitized text because the photograph was washed out. What appear to be image compression artifacts make this digitized text even more difficult to read.

Washed out digitized source

In a different case, I was lucky to see a physical copy of a book that had marginalia related to my research, while the most readily available online copy had none. Here you can see the “clean” copy of William Douglass’s A summary, historical and political, of the first planting, progressive improvements, and present state of the British settlements in North America… (1749) from HathiTrust. In comparison, see my photograph of the same book (though a slightly different printing) held by the Weiner “Spirit of America” collection at Florida Atlantic University. The Weiner copy includes the annotation “Georgia too near the Spaniards at St. Augustine – negroes wd. be dangerous,” which happened to be the topic of my research.

“Clean” HathiTrust digitized text without annotation
“Spirit of America” collection text with annotation

Ultimately, digital sources are wonderful starting points, but we can’t assume that they are the final or most important version of what they represent. Originals must be preserved, and historians need access to multiple copies and editions to be successful in their research.

[1] Paul Conway, “Building Meaning in Digitized Photographs,” JDHCS 1, no. 1 (2009): 18.