By Pernille Røge, Associate Professor – Department of History and Paula Orozco-Espinel, PhD Candidate – Department of History
While historical research once required an almost craft-like approach to primary sources, the digital turn has revolutionized archival work. It began, perhaps, with the handheld digital camera. Suddenly, primary research no longer required months of archival visits and the painstakingly slow process of sifting through box after box of archival material. Instead, it became possible to spend only a few days in an archive, order boxes, quickly take thousands of pictures of archival records, and study them later in the comfort of home. Then came digitization, online access, keyword searches, and Google translations. However, these digital tools have their limitations, particularly for historians of the Middle Ages and the Early Modern Period. Most archival records from those periods are handwritten manuscripts and correspondence, penned in various scripts that are no longer taught in schools. Paleography has therefore always been an essential skill for many historians, and early-career scholars often find themselves utterly lost in transcription. Well, no longer.
Nowadays, the online AI-powered text recognition platform Transkribus can help with reading and transcribing historical documents, including those thousands of pictures of archival sources taken with a handheld digital camera. The platform thus opens research avenues previously unavailable to researchers without training in paleography, and it eases the work of experts as well.
Getting started with Transkribus is straightforward. You can add as many documents as you want to the web platform’s desk. Transkribus allows you to create “collections,” the equivalent of folders on your computer, to organize your documents. Unlike a file in a computer folder, however, a document in Transkribus can be linked to multiple collections, so you can reach it from several places without creating multiple copies. Transkribus accepts various formats, including JPGs and PDFs, which means it is possible to work with documents that contain multiple pages.
Once you have uploaded your documents, it is time for the handwritten text recognition (HTR) part. Transkribus offers language models for various languages and time periods, which you can use to transcribe text automatically. In our experience, the models currently available work better with 18th-century French and English texts than with 15th-century Spanish or 18th-century Danish texts, simply because the models for the former languages and periods have been trained more extensively.
If the available language models do not suit your needs, Transkribus allows you to train your own. This requires more effort but can quickly pay off when working with large sets of specialized text. To train your own model, you first need to check some transcriptions manually and mark the revised document pages as “Ground Truth,” meaning you deem them correct. Human errors are still possible, but scholarly expertise goes a long way toward training a satisfactory model.
Once you have selected a language model or trained your own, the Transkribus document editor allows for intuitive editing by synchronizing image and text. On the left side of your screen, you can see the document page that is being transcribed, and the automatic transcription shows up on the right side, so you can easily correct transcription mistakes.
In Dr. Røge’s experience, mistakes are most common when transcribing proper names and abbreviations. Overall, however, transcriptions are accurate enough that, even though they require editing, Transkribus is still a huge time saver, especially if you are only starting to learn to read the source’s handwriting. The Transkribus document editor also allows you to assign tags to specific words or sentences, which is useful for building databases and doing some kinds of text analysis.
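For readers who want to put a number on "accurate enough," one commonly used measure of text recognition quality is the character error rate: the number of single-character corrections needed to turn the automatic transcription into the corrected one, divided by the length of the corrected text. The short Python sketch below is our own illustration and not part of Transkribus. The function names and the sample strings are invented for the example, and it assumes you have copied both versions of a passage out as plain text.

```python
def levenshtein(a: str, b: str) -> int:
    """Smallest number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def character_error_rate(ground_truth: str, automatic: str) -> float:
    """Edit distance divided by the length of the corrected text."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return levenshtein(ground_truth, automatic) / len(ground_truth)


# Hypothetical example: a manually corrected line and its automatic
# transcription, which misreads one letter in a title.
ground_truth = "Monsieur le Comte"
automatic = "Monsieur le Conte"
print(f"{character_error_rate(ground_truth, automatic):.3f}")
```

A lower value means fewer corrections per character. Comparing this figure across a handful of pages is one rough way to judge whether an existing model is good enough for a given set of sources or whether training your own is worth the effort.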
Transkribus’ capacity to save researchers time is undoubtedly an appealing feature, but the potential of this tool goes beyond efficiency. It can also be a great resource in the training of future historians. Dr. Røge recognizes that even though many graduate students are eager to work with early modern sources and may already know the necessary languages, they are often unfamiliar with the handwriting. With the help of Transkribus, this barrier is overcome much more easily because the AI tool suggests what the proper word might be. Similarly, undergraduate students who are interested in gaining first-hand experience with early modern primary sources but lack the ability to read them can use Transkribus to decipher anything from handwritten accounting books to personal correspondence or official documents.
Transkribus was developed during two EU-funded research projects, and since July 2019 it has been maintained and further developed by READ-COOP SCE, a purpose-driven cooperative. READ-COOP SCE states that its mission is to “provide a comprehensive range of tools and services that empower researchers, institutions, and individuals to collaboratively discover and explore the rich tapestry of history.” Our own experience taps into Transkribus’ potential for enhancing historians’ work, and we look forward to learning more about how other colleagues take advantage of this continuously improving AI tool.