Kathryn Tomasek and Encoding Digital Humanities

Kathryn Tomasek has been exploring the use of digital tools to enhance student learning since 1992. She began using XML compatible with the Text Encoding Initiative guidelines in assignments requiring the transcription and markup of primary sources in 2004. As part of the Wheaton College Digital History Project, students in her courses do original research with documents from the founding period of the college. Tomasek’s research project, Encoding Financial Records, received a Start-Up Grant from the Office of Digital Humanities at the National Endowment for the Humanities in 2011. In this episode, we spoke about TEI, teaching, and complexities linked to Digital Humanities.


Wheaton College, Kathryn Tomasek, Text Encoding Initiative (TEI), Alliance for Digital Humanities

The Conversation

Tomasek: I’m Kathryn Tomasek, I’m a professor of history at Wheaton College.

Chambliss: Well, I want to thank you for joining me. The first question I always like to ask people is: what is Digital Humanities for you?

Tomasek: Okay, this is where I say defining Digital Humanities is really hard and people have spent a lot of time and ink working on that. So, I really like to defer to other people who’ve done this stuff and I really like Kathleen Fitzpatrick’s definition that incorporates both using digital tools to examine traditional humanities materials and questions, and then asking traditional humanities questions about the things that make up our digital culture.

Chambliss: Okay. I’ve spoken with her and she said exactly that [and that it] also is digging into the fact that the Digital Humanities community is somewhat small.

Tomasek: That’s very much the case. But there are plenty of people who would be willing to talk to you for a full hour about their definition. I just happen to think that it’s really cool to be a historian now because there are these neat tools that can help us ask new things about old stuff.

Chambliss: Right. And I think one of the reasons that I wanna talk to you is because of your really groundbreaking work. Like you were an early participant [in] TEI and TEI initiatives. And for a lot of people, that terminology isn’t gonna mean a lot, but for a small segment of people, it’s going to mean a great deal. Could you define what TEI is for people?

Tomasek: Sure, absolutely. I have to say, I appreciate your saying I was an early participant, but it’s been around for 30 years and I’ve only been a historian for 30 years, [so] I’m gonna contradict you just a tiny bit. In fact, the 30th anniversary of the Text Encoding Initiative, that’s what TEI stands for, was marked with the TEI receiving a big prize from the Alliance of Digital Humanities Organizations in 2017. It’s the Antonio Zampolli Prize. Professor Zampolli was a major figure in the development of literary and linguistic computing from the 1960s. He was a really enthusiastic supporter of the things that came together to make Digital Humanities, the joint international conferences of the… I can’t even remember what this stands for. ALLC. It’s [the Association] for Literary and Linguistic Computing and the Association for Computers [and] the Humanities. Our national conference started in 1989 and it’s what we now call DH, and Professor Zampolli was a prime mover in the TEI. There was an 11-year project and then they established the Text Encoding Initiative Consortium. And I sort of happen to be the chair of the board of directors of the consortium right now.

Chambliss: Can you tell us what TEI stands for and what [it does]?

Tomasek: Okay. The main project of the TEI is maintaining this thing called the TEI Guidelines for Electronic Text Encoding and Interchange. These are guidelines that define and document a markup language for representing structural, renditional, and conceptual features of texts. The guidelines focus, not exclusively but mostly, on the encoding of documents in the humanities and social sciences and, in particular, on the representation of primary source materials for research and analysis. So that sounds kind of to me like what historians do, right? I should also say that the guidelines are expressed as a modular, extensible XML schema. People who don’t know about markup languages may not know what that means, but XML is like what’s behind every Microsoft Office file. Extensible Markup Language. The guidelines have detailed documentation and they’re published under an open source license.
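
To make the idea of a markup language for textual features a little more concrete, here is a minimal sketch in Python. It assumes a tiny TEI-flavored fragment that uses the TEI namespace and the `persName` and `date` elements; the sentence, the name, and the date are invented for illustration, and the fragment is not a complete, valid TEI document (a real one would also carry a header, among other things).

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Invented sample text: the person and the date are made up for illustration.
SAMPLE = f"""<TEI xmlns="{TEI_NS}">
  <text>
    <body>
      <p>Paid <persName>Jane Doe</persName> on
         <date when="1830-05-01">May 1st</date>.</p>
    </body>
  </text>
</TEI>"""


def list_features(xml_text):
    """Return (tag, text, attributes) for every persName and date element."""
    root = ET.fromstring(xml_text)
    found = []
    for elem in root.iter():
        # ElementTree stores tags as "{namespace}name"; keep only the name.
        local = elem.tag.split("}")[-1]
        if local in ("persName", "date"):
            found.append((local, elem.text, dict(elem.attrib)))
    return found


features = list_features(SAMPLE)
for tag, text, attrs in features:
    print(tag, text, attrs)
```

The point of the sketch is only that the markup names what a stretch of text is (a person, a date), so that software can find it later without guessing.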

Chambliss: For the layperson, basically what TEI allows you to do is create electronic editions of primary source documents, as a simplification, right? I like that. That’s a super simplification and I recognize that, but someone’s going to be like, what does that mean?

Tomasek: But somebody has to simplify it and I’m really bad at that, so thank you.

Chambliss: Right. This is really important because it’s not just simply that TEI allows you to make [an] electronic version. It’s also a stable approach to this. I’ve taught an intro class to Digital Humanities; I’m teaching one right now, a grad course, and I taught undergraduate too. And students always think electronic things are very stable and we have to go through this whole thing where like, no, they’re not stable. Right? It seems stable because, when you look at it, it’s there. But, it might not be there.

Tomasek: And it doesn’t work anymore.

Chambliss: Yeah, exactly.

Tomasek: I have stuff that I can’t get information off of. It’s just there. It’s that nice coaster.

Chambliss: Right? Yeah, it’s an object that I could hurt someone if they break into my house, but I can’t really use it.

Tomasek: Like an old iPhone. It’s a great glass brick.

Chambliss: Right! An electronic form can become obsolete or can become unusable and I suggest people who are listening to this and thinking about this [to] go to this thing called the Wayback Machine and type in the address of any website you use today and look at the earliest version of that website. Recognize, you know, the internet has changed so much. And because it’s changed so much, this idea of some sort of stable form for presenting something on the web is not a small exercise. It’s a really important exercise. And TEI is, I would argue, probably a form at the heart of that.

Tomasek: Yeah. No, absolutely. And I would say when you’re working in the TEI, you’re working in XML, and XML is not something that renders beautifully on the web, right? If I have an XML file and you click on it, what you’ll see is a tree with a lot of angle brackets, so you will see the marked up stuff. One sort of cautionary thing about XML is that it has to be transformed to be pretty on the web, because the web is expressed in HTML, which is called Hypertext Markup Language. But the XML documents are (and this is what you’re getting at) archival forms of the scholarly edition [of], I would say, whatever text or manuscript you’re working with. And the thing about the archival format of them is that the transformation can be changed as the browsers change.

Or as, you know, we moved from HTML and all that kind of stuff. And I think what many of us learned back in the ’80s [in regard to programming the web] when we were using word processors that don’t exist anymore. I can’t access my dissertation electronically anymore. I always point people to this great book by Neal Stephenson from the ’90s, it’s called In the Beginning... Was the Command Line and it’s about being an author and having written things you [can’t] access anymore. And it’s just funny because Neal Stephenson wrote it, but one of the great things about the archival XML in the TEI guidelines is that, if you’ve got the archival edition, then anybody can come along at any point and make it work in HTML 37, you know, when we get to that version.
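
The separation described here, an archival XML file plus a replaceable transformation into whatever the web currently needs, can be sketched roughly as follows. Projects commonly do this step with XSLT; the Python function below is only a stand-in for that idea. The element-to-HTML mapping (`HTML_TAGS`) and the sample text are assumptions made up for illustration, not anything prescribed by the TEI guidelines.

```python
import xml.etree.ElementTree as ET
from html import escape

# Invented archival fragment (no namespace, to keep the sketch short).
ARCHIVAL = """<div>
  <head>Daybook</head>
  <p>Sold to <persName>Jane Doe</persName> one bushel of corn.</p>
</div>"""

# Hypothetical mapping from archival element names to HTML tags.
# Swapping this table is the part that changes as the web changes;
# the archival XML itself stays untouched.
HTML_TAGS = {"div": "section", "head": "h1", "p": "p", "persName": "span"}


def to_html(elem):
    """Recursively render an element, keeping its original name as a class."""
    tag = HTML_TAGS.get(elem.tag, "span")
    parts = [f'<{tag} class="{elem.tag}">', escape(elem.text or "")]
    for child in elem:
        parts.append(to_html(child))
        # Text that follows a child element lives in its .tail attribute.
        parts.append(escape(child.tail or ""))
    parts.append(f"</{tag}>")
    return "".join(parts)


html_out = to_html(ET.fromstring(ARCHIVAL))
print(html_out)
```

If a future version of HTML called for different tags, only the mapping and the rendering function would need to be rewritten, which is the future-proofing argument in miniature.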

Chambliss: Right. And the TEI initiative is the governing body at some level to these formats [and] forms.

Tomasek: Right, exactly. You need to think about how the web works. There’s the W3C, the World Wide Web Consortium. They make standards and the standards that are agreed [on] by the W3C are why the internet works. So, I just need to say one thing, that the guidelines are not explicitly a standard in that kind of way, but they are widely accepted internationally as defining best practices for markup. This represents texts and manuscripts and vertical scholarly editions and keeps them in an archival format so they can be accessed in the future. That’s a future-proofing thing that we don’t talk about enough.

Chambliss: Right. This is one of the things that’s really important, another part of the bigger debate about the humanities. It is humanities, but it is not. One of the things about the TEI initiative and other initiatives too [is] they require a bunch of people and many of these are institutions across borders. We need to agree on this, otherwise this won’t work, right? So, TEI is, I think, one of the prime examples of a big academic agreement maintained by people across the world. It is quite an international group.

Tomasek: It is, it is.

Chambliss: And part of that cooperation [is] it just was future-proofing, right? Like, we want this moment to maintain itself in the future. So, on some level, [it’s] one of those invisible cooperative agreements that’s really integral to academic digital projects. But you wouldn’t know unless you really were paying close attention. I’m willing to bet that almost no student that walks into an undergraduate classroom [knows] TEI. If I say the word “TEI,” they will think I misspoke. They will think I was trying to say TI. We would need to have a whole conversation about [how] I’m not talking about the rapper, [I’m] talking about the digital form and that would be a great teaching moment, but indicative of the fact that it’s one of these hidden infrastructures of the digital landscape.

Tomasek: Yeah, it’s funny, we’re a small part of the small community of the Digital Humanities and so we might seem a little bit obscure. I was at a conference once and presenting, and the chair of the session said, “And she’s published in the Journal of the TEI,” and I didn’t even know that existed. Absolutely. You’re right. Kind of hidden.

Chambliss: The hidden nature is another reason I wanted to talk to you because you’ve spent a lot of time with this project. You spent a lot of time with what is essentially a tool. And I really admire the wording because you’ve done stuff with students and you were working at a small college, and I used to work at a similar institution. You were working explicitly with the financial records of Wheaton College and the reality was, when people talk about financial records in TEI, they are talking about your work. What was that like? Because it seems like it would be a really complicated exercise.

Tomasek: Well, 15 years ago, I was working with the archivist at the college where I work, and she had a chance to buy some journals, some pocket journals. So, this is before we got to the financial records, but this is the background. This is how we got to TEI. She had a chance to buy these pocket diaries that had been kept by the woman who was married to a member of the family that formed the Wheaton Female Seminary in the 1830s, which is the organ that became the institution that became Wheaton College. And we had also just had an opportunity, through some grants, ultimately from the Mellon Foundation, to learn TEI and to figure out whether it would be useful to bring TEI into the classroom. And so, what we figured out [is that] the archivist was in fact asking students to do transcription and markup, which is what you do with TEI.

You transcribe the document and then you apply the guidelines to mark it up. That gave students an opportunity to practice close reading in the kind of way that history majors really need to learn how to do, but it’s hard to teach them how to do that kind of close reading so that they can do their own research. Right? I always joke that the pocket diaries were the easy stuff and we ran through them really quick because there weren’t very many of them. Then, these people just kept (the people who belong to the Wheaton family who founded this institution) a lot of their account books. In fact, Mrs. Wheaton kept every canceled check for the entire period she was financially independent. She was a widow for 40 years.

We have 40 years’ worth of canceled checks, which is more than I want to deal with and we’re not gonna go there. But she also kept really meticulous cash books that I hope we will get to eventually. What seemed like the simplest of the documents was this daybook, which is sort of part of the system of double-entry accounting that Laban Morey Wheaton, who was the son of this family, ran in Norton, Massachusetts between 1828 and 1859. A daybook has dates and the names of people who came into the store, and a list of the stuff they bought and how much it cost. And if he extended them credit, then on the left-hand side of the page, on that line where the transaction is, he made a note of the page in his ledger where he kept track of credit, where you can find more information about that person’s credit relationship with him.

I didn’t know enough. This is always the best part, right? This is how you get to learn things. I knew enough to know that it was going to be weird and hard to use a tree structure, which is what the TEI is, to express this information that is actually in a tabular structure in these account books. So, I messed with it a little while myself. I sent out a bunch of messages to the TEI list and I got messages back, saying things like, “Yeah, I was thinking about that for a while,” and “Yeah, I could, it could work.”
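
The mismatch between a tabular account book and a tree-shaped XML document can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the two daybook rows are invented, and the element names (`daybook`, `entry`, `customer`, `item`, `amount`, `ledgerRef`) are hypothetical placeholders, not the encoding the project actually settled on and not elements taken from the TEI guidelines.

```python
import xml.etree.ElementTree as ET

# Invented rows: (date, customer, item, amount, ledger page or None).
# The ledger page is only present when credit was extended.
ROWS = [
    ("1830-05-01", "Jane Doe", "1 bushel corn", "0.75", "12"),
    ("1830-05-01", "John Roe", "2 yards cloth", "0.50", None),
]


def rows_to_tree(rows):
    """Turn flat daybook rows into a nested element tree."""
    day_book = ET.Element("daybook")
    for when, customer, item, amount, ledger_page in rows:
        entry = ET.SubElement(day_book, "entry", {"when": when})
        ET.SubElement(entry, "customer").text = customer
        ET.SubElement(entry, "item").text = item
        ET.SubElement(entry, "amount").text = amount
        if ledger_page is not None:
            # Cross-reference to the ledger page that tracks this credit.
            ET.SubElement(entry, "ledgerRef", {"page": ledger_page})
    return day_book


tree = rows_to_tree(ROWS)
xml_text = ET.tostring(tree, encoding="unicode")
print(xml_text)
```

Even in this toy version, the columns of the table turn into nested children, and the optional ledger reference has to be modeled as an element that may or may not be present. Those are the kinds of modeling decisions the conversation goes on to describe.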

At one point, I was talking to, what do we call them, a program director at the Office of Digital Humanities. And we were talking about an idea that I had had for a possible startup grant and she didn’t like that idea. And so, then I said, “Well, okay…” So, then, I’ve been working [and] thinking about this stuff. And she said, “Oh, that is so cool.” [Sometimes you never know] what the cool stuff is until you talk to somebody. We started getting grants to figure out how to do this stuff because, even though there were places in the guidelines that could hold the kind of information we wanted to be able to express from the accounting records, it turns out that there’s more to figuring out what your data model is [and] what it actually is you’re trying to express or that’s not right.

What the document expresses [is] what people were doing so that then you can express it in a digital way. And I’m really fortunate to have a colleague in Austria who was working on the same problem at the same time that I was, and we didn’t know each other existed until we started seeing each other’s work at the international digital humanities conferences. And this is someone who works on a different kind of thing that you need on the web to help the machines make the kinds of connections that can come up on the web; the things that come up in the box on the right-hand side of your browser when you Google something like “Walt Whitman,” right. This colleague of mine was working on what’s called an ontology, which is a kind of graph representation of how that information looks, the information that’s in an account book. . . . So, part of my story is that the reason I might seem boss to other people is because I get to talk to really cool people who are really smart. I’ve worked with a lot of smart people, sure.
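
The graph representation mentioned above can be pictured, very loosely, as a set of subject-predicate-object statements. The sketch below uses plain Python tuples rather than any RDF library, and every identifier in it (`transaction1`, `hasBuyer`, `transfers`, and so on) is a hypothetical name invented for illustration, not a term from the bookkeeping ontology discussed in the interview.

```python
# Each fact is a (subject, predicate, object) triple.
# All identifiers here are made up for illustration.
TRIPLES = [
    ("transaction1", "hasBuyer", "JaneDoe"),
    ("transaction1", "hasSeller", "Storekeeper"),
    ("transaction1", "transfers", "corn"),
    ("transaction2", "hasBuyer", "JaneDoe"),
    ("transaction2", "transfers", "cloth"),
]


def objects_of(triples, subject, predicate):
    """What does this subject point to through this predicate?"""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects_with(triples, predicate, obj):
    """Which subjects point to this object through this predicate?"""
    return [s for s, p, o in triples if p == predicate and o == obj]


janes_transactions = subjects_with(TRIPLES, "hasBuyer", "JaneDoe")
print(janes_transactions)
print(objects_of(TRIPLES, "transaction1", "transfers"))
```

Because every statement has the same shape, statements drawn from different account books could in principle be pooled and queried together, which is the appeal of a shared ontology.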

Chambliss: Most people wouldn’t necessarily have thought of putting students in that space. One of the things is that, once you work with students around DH—even though we often talk about them being digital born—they’re actually quite, well, they have a lot of anxiety about digital tools. They are users, not makers.

Tomasek: Consumers, not producers.

Chambliss:  It seems like anything involved in TEI would be a huge challenge for a lot of students. But you worked with them semester after semester, right? Like, it wasn’t one semester and you were done. It was actually multiple years.

Tomasek: Yeah, and the best work we’ve ever been able to get done. So, what I’m able usually to do in the classroom is do some introducing and, hopefully, if I’m lucky, reduce the anxiety level about the angle bracket. I imagine trying to teach people something like Perl or Python. I can’t do that myself. I’m lucky if I can do this, this thing that’s sort of pseudo programming called XQuery. But in the classroom, what’s worked really well in this introductory way is to do workshops where I am not the only instructor; it’s always a collaboration between me and the people in the library, who have expertise in technical tools and digital pedagogies and those kinds of things. The thing that some colleagues might get worried about is we spend classroom time on these workshops, having the students do the work of the transcription because, guess what, paleography [and] being able to read handwriting, cursive and sloppy cursive handwriting from the 19th century, is not an easy thing.

You have to have many eyes looking at this and saying is it short for [a] bushel. You know, [that] kind of a thing. But then students have also expressed how important it was for them, for the out-of-classroom work that they were doing, to know where the librarian’s office was so they could go get help outside of class. And I think that’s been really cool because even better than the workshop model has been the possibility of taking students who have had that little bit of an introduction during a semester-long course to work together in groups over the summer, to do the transcription, to do the markup, and to go through the process.

They learn that collaboration is a really good way to work, that it’s okay not to be able to be perfect at every piece of a task. They get an opportunity to spend a long time with the source. We’ve had a couple of really great experiences. For instance, when the students were working on the pocket diaries, they got really involved. This was a really long time ago now, but they got really excited about the people they were finding names of and they started going off into the graveyards in Norton, Massachusetts, and finding birth and death dates and all that kind of stuff, which was very cool. In the summer of 2016, I had four students and they were working so well together that they completed a full transcription of this 200+ page daybook in seven weeks, which was amazing.

And there’s something else I was going to say about their work: how they developed. They see the text editor we use allows you different views. There’s a view that shows you all the angle brackets, but there’s also a view that’s much more clean-screen. And what I saw them doing—and I teach them to do this—but they developed their own sort of style of working their own workflow, I guess, is the way to say it. They would switch back and forth between the angle bracket view and the more clean-screen view. And what I thought was really great about it was that this was like, their math and they were very comfortable with all of this stuff by the time their seven weeks was over. And you know, when you think about the project you’re still in, you’re still sort of involved in it in different phases in different iterations. But one of the things that’s really interesting is, because you are at a small liberal arts college and you’re doing digital humanities work or you’ve been able to—

Chambliss: I think [you] touch on that [in] your previous answer, some of the ways your work has changed; the sort of flow of being a teacher, having students who are involved in the landscape. But then there’s also this question of like, what does it matter that you’re doing this right? I mean like, in some ways, you’re doing this project that’s about your school [but], when you’re writing it up, technically you’re writing up about TEI. You’re not necessarily writing history articles. This is an important distinction here.

For people who are listening, Digital Humanities means lots of different things to many people, but it’s heavily influenced by literary studies. So, that’s one really defined narrative within digital humanities. Ed Ayers talks about generative scholarship associated with doing Digital History. And this is a really complicated question because your work is very technical. It’s not as technical as, perhaps, R, but if you know anything about R, then you might understand. It’s not MALLET, it’s not statistics, but it is a clear contribution as you outlined. When you write about it, you’re writing in a TEI journal, you’re not writing in the Journal of American History. You are not writing in the American Historical Review (AHR), which really gets at this question that I know that you really have been very involved with, the question of the rise of digital in history and how do we recognize that? How do we note that? And again, your career gets at this question. You have spent a lot of time thinking about this. I’m sure there are people who would say she’s not doing history. She is not publishing in history journals. She’s not doing history and I’m not okay with this.

But you’ve also been very much involved with a conversation within the profession about how we think about Digital Humanities. How do we think about the digital process and digital scholarship? I want you to talk a little bit about that because I think that’s a really important part of the landscape around digital. Can you talk a little bit about that?

Tomasek: Where do I start? Okay. One of the things you’re referring to is the fact that Ed Ayers invited me to be on the committee, this ad hoc committee that the American Historical Association put together. We worked in 2014 and 2015 to develop a set of guidelines. Gosh, I just seem to do guidelines all over the place, don’t I? Guidelines for the Professional Evaluation of Digital Scholarship by Historians, such a 19th-century kind of title. That was a huge thing because it was the first time we had ever had that kind of thing in the context of the historical profession. I have always felt like the Modern Language Association was 10 years ahead of us because they had developed [those] kinds of guidelines for literary scholars about a decade earlier. And one of the things we talked about in the committee was how our guidelines had to be slightly different.

They’re mostly aimed at helping people on tenure and promotion committees, including senior colleagues in history departments and administrators, deans, provosts, those kinds of people. Helping them understand where digital projects in history fit in the context of traditional ideas of what history is. For a long time, one of the recommendations about digital humanities was: don’t do it before you have tenure. And I made a slight mistake. I thought having tenure would make it okay for me to go ahead and do a digital project and that slowed me down for promotion. On your own campus, one of the big challenges of doing digital work has to do with educating your senior colleagues and administrators. And that’s not nothing, I guess is what I would say about that.

And being able to point to the guidelines [now] is really useful. The American Historical Association also has a working group that continues to be a reference point for people who want to be able to say to their chair, for example, “This is how a digital project fits and should count.” We talked a lot about counting, right? So, I think I’m very proud of having had a chance to work on those guidelines and I’m really happy that they exist. I don’t think it matters where you are, but having figured out how to talk to funders [and] to granting agencies helps with that kind of education and with being noticed and with helping senior colleagues and administrators understand that kind of thing.

I feel really weird talking about senior colleagues cause I’m old at this point. Now I’m senior so, you know, do with that what you will.

Chambliss: Well, having done all the work, what do you see as the future for TEI? Where it’s going?

Tomasek: It’s been around for 30 years. It’s going to be around for another 30. One of my colleagues on the board really wants us to figure out how TEI fits with the graph model of the web, which means integrating TEI with this thing called RDF, which stands for Resource [Description] Framework. We don’t have time to talk about that today, but I think the thing about the TEI is that it’s evolved. When the TEI started, there was no XML. We had the precursor of both HTML and XML, which was this thing called SGML, which I can’t remember what that stands for. So, there were people from the TEI who were on the working group that came up with XML. So, I always like it when my colleagues say, “Oh yeah, and the TEI contributed to the technology we use today for storing the information in our word processing and Excel files,” and all that kind of stuff, right? So, I think the TEI is gonna be around and I think it’s gonna continue to evolve as the technology evolves, and I think that’s really exciting. I think one thing that’s going on at this point is that we’re well into the third generation of technical experts and scholars who are involved with the TEI. And I think that’s really very exciting. I think that’s going to continue to be there. I and my friends make a lot of jokes about [me] being one of the very few historians who use the TEI because literature and linguistics are places where the TEI came from.

But documentary editing is [a] place within the field of history that is actually very close to the kind of scholarly editing that is the place where the TEI came from. And that includes things like the papers of George Washington, Thomas Jefferson, and names [of] all the presidents. But it also includes the papers of Martin Luther King Jr., and documentary editing is actually much, much bigger… And I think that…as we continue to observe what digital technologies can help us do as we think historically, as we continue to think about what it means that, with a bookkeeping ontology, I can do a transcription. I mean, I can do a digital edition of this rural New England businessman’s books and it can talk to, it can be in relation to, the books from a store on a North Carolina plantation. It can be in conversation with the accounts kept by George Washington. It can be in conversation with accounts that were kept in 15th-century Florence. I think there are ways that digital tools and the digital environment can continue to change the kinds of questions we ask and the things we do when we do history.

Chambliss: Well, I think that’s a great place to end it because that really points the way to great things to come. I really appreciate you taking the time to talk to me about your work. And if people want to find you on the web, where would they go?

Tomasek: So, mostly I’m really active on Twitter, which is probably not a good thing to say. I’m @KathrynTomasek. I also actually respond to email.

Chambliss: Well, thank you, and thanks for taking the time to talk to me for Reframing History.




