Digitizing Doctors

Medical directories are rich but understudied sources of historical data. They are predominantly lists of physicians, with a variety of information about each one. Looking up a single doctor is easy, but understanding the medical profession as a group requires a great deal of counting and comparing. This sort of counting and comparing is exactly what computers are good at, but because these directories exist only in print, computers cannot use their data. Digitizing the directories in a format that computers can read will therefore open up a multitude of “big data” possibilities for studying the history of the medical profession.

To solve this problem, I have explored digitizing American Medical Directories (AMDs) and prototyped a process to turn them into a database. AMDs are particularly useful because they claim to be a complete listing of U.S. doctors (and, in some years, of Canadian doctors as well). The first AMD was published in 1908, and subsequent editions followed every few years. The information included changed from edition to edition, but entries often list where physicians lived and had offices, their licensing and education, whether they had a specialty and what it was, and sometimes even their office hours.

I chose the 1918 AMD for this experiment because it was the first edition I found that contained information about specialization, which emerged in the United States in the early twentieth century.

To extract digital information from the printed AMD, I worked with librarians at Rice University to obtain high-quality scans of the 1918 directory and run OCR (optical character recognition) on them. I then wrote two programs to split the OCR text into entries for individual physicians and to break each entry into fields. You can read more about that process in an article I published in Medical History.
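To give a rough idea of what this kind of parsing involves, here is a minimal sketch in Python. It is not the code used in this project, and it assumes a hypothetical, simplified entry format (semicolon-separated name, address, school with graduation year, and an optional specialty) rather than the real 1918 AMD layout. The names `Physician`, `split_entries`, `parse_entry`, and the sample entries are all illustrative inventions.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Heuristic (assumed): a new entry starts with "Surname, Given..." at line start.
ENTRY_START = re.compile(r"^[A-Z][A-Za-z'\-]+, [A-Z]")
# Graduation years assumed to fall in the 1800s or 1900s.
YEAR = re.compile(r"\b(1[89]\d{2})\b")


@dataclass
class Physician:
    name: str
    address: str
    school: str
    year: Optional[int]
    specialty: Optional[str]


def split_entries(ocr_text: str) -> list[str]:
    """Group wrapped OCR lines into one string per physician entry."""
    entries: list[str] = []
    for raw in ocr_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if ENTRY_START.match(line) or not entries:
            entries.append(line)
        elif entries[-1].endswith("-"):
            # Rejoin a word hyphenated across a line break.
            entries[-1] = entries[-1][:-1] + line
        else:
            entries[-1] += " " + line
    return entries


def parse_entry(entry: str) -> Physician:
    """Split one entry into fields in the hypothetical semicolon format."""
    parts = [p.strip() for p in entry.rstrip(".").split(";")]
    parts += [""] * (4 - len(parts))  # pad fields the OCR may have lost
    name, address, school_part, specialty = parts[:4]
    match = YEAR.search(school_part)
    year = int(match.group(1)) if match else None
    school = YEAR.sub("", school_part).strip(" ,")
    return Physician(name, address, school, year, specialty or None)


# Invented sample text, not taken from the directory.
SAMPLE = """Smith, John A.; 123 Main St.; Rush Med. Coll., 1899; Ophthal-
mology.
Jones, Mary B.; 45 Oak Ave.; Northwestern Univ., 1905.
"""

for entry in split_entries(SAMPLE):
    print(parse_entry(entry))
```

Splitting into entries first and fields second keeps the two problems separate: OCR line-wrapping and hyphenation are handled once, so the field parser only ever sees a complete entry.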

The title page of the 1918 AMD.
An example page from the 1918 AMD showing the beginning of the list of Illinois physicians.

I have explored the data extracted from the 1918 AMD and plan to digitize more medical directories in the future. So far, I have analyzed the geographical distribution of specialists, as well as of physicians who listed office hours in the directory (a totally new kind of research!). You can see a preview of that investigation in the poster presentation to the right.
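Once entries are in a structured form, the counting and comparing behind a geographical analysis becomes short. The sketch below computes the share of physicians with a listed specialty in each state. The records are invented toy data, not results from the 1918 AMD, and `specialist_share_by_state` is a hypothetical helper rather than part of this project's tools.

```python
from collections import Counter
from typing import Optional

# Toy records: (state, specialty or None). Invented for illustration only.
RECORDS: list[tuple[str, Optional[str]]] = [
    ("Illinois", "Ophthalmology"),
    ("Illinois", None),
    ("Illinois", "Surgery"),
    ("Texas", None),
    ("Texas", "Surgery"),
]


def specialist_share_by_state(records):
    """Return the fraction of physicians in each state listing a specialty."""
    totals = Counter(state for state, _ in records)
    specialists = Counter(state for state, spec in records if spec)
    return {state: specialists[state] / totals[state] for state in totals}


print(specialist_share_by_state(RECORDS))
```

The same pattern works for any yes/no field, such as whether an entry lists office hours, by changing the condition in the second `Counter`.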

In the future, I hope to include students in this research. Students could use the tools I developed to digitize and examine new directories, then incorporate their results into a research paper and perhaps even a publication.

Stay tuned for future findings!

Poster presentation with initial results from digitized 1918 AMD.