Stirring Shakespeare’s Tragedies: A Text Analysis Project

Introduction

Figure: A ten-word topic model represented in its full chaos in the Mandala Browser

I wanted to try a new way of looking at texts that I already knew and the Mandala Browser looked like it was an interesting way to “stir the archive” so that these texts would become “weird” and perhaps show me a new way to read them. Once I learned that the browser came with Shakespeare’s tragedies built into it, I began to think of things that I could look for, connections that I already knew existed but which I might be able to prove were a bigger deal or a more wide-ranging phenomenon rather than a thing that English professors just tell their students about so that they can write papers with tenuous connections to the text. Specifically, I was looking for a correlation between the way nature acted and the state’s dysfunction which appears throughout Shakespeare’s tragedies. This involved setting up magnets with groups of words like “storm gale wind tempest” and “anger ire insanity insane angry” to see what overlap there was, but that didn’t prove as useful as I wanted it to be.

I then looked to a different way of finding related words and remembered that topic modeling was an interesting option. It would give me a list of words that were related which I could then input into the Mandala browser to see what those connections would be. This proved to be a fruitful endeavor which separated out my bias and allowed the texts to show for themselves what they were about. Some of the groups of words that I used were more obvious than others, but all provided at least a few interesting speeches that I would not have connected without a lot of time spent trying to match things in my head. This was an effective way to stir the archive and see texts in a new, colorfully connected way.

Methods and Materials

The two products used for this project were Mandala Browser and TopicModelingTool, both of which are free and open source. The Mandala Browser came with a built-in document that has all the speeches from Shakespeare’s tragedies separated and indexed for ease of use. When you create a magnet in the Mandala Browser, every speech which contains that word is pulled from the edge of the screen to orbit the magnet. When you create another magnet with a different word (or words), the same happens, and a mini-magnet appears between the two, around which orbit the speeches that contain both words (or sets of words). This allows you to see how the two words are used together in the texts. You can create as many magnets as you want and the program will show you how they are all connected with mini-magnets, but anything over four magnets quickly became unruly to work with.
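
The magnet behavior described above boils down to set membership and intersection. The following Python sketch is only an illustration of that idea, not the Mandala Browser's actual code: the function names, the toy speeches, and the rule that a multi-word magnet attracts any speech containing at least one of its words are all assumptions made for the example.

```python
import re
from itertools import combinations

def tokenize(text):
    """Lowercase a speech and return the set of alphabetic words in it."""
    return set(re.findall(r"[a-z]+", text.lower()))

def magnet_hits(speeches, magnet_words):
    """Ids of speeches that contain at least one of the magnet's words (assumed rule)."""
    wanted = {w.lower() for w in magnet_words}
    return {sid for sid, text in speeches.items() if tokenize(text) & wanted}

def mini_magnets(speeches, magnets):
    """For every pair of magnets, the ids of speeches attracted to both."""
    hits = {name: magnet_hits(speeches, words) for name, words in magnets.items()}
    return {(a, b): hits[a] & hits[b] for a, b in combinations(hits, 2)}

# Invented placeholder speeches, not quotations from the plays.
speeches = {
    "s1": "the storm breaks over the heath",
    "s2": "my anger is a tempest",
    "s3": "good night sweet friends",
}
magnets = {
    "weather": ["storm", "gale", "wind", "tempest"],
    "anger": ["anger", "ire", "angry"],
}
overlap = mini_magnets(speeches, magnets)
print(overlap)
```

Matching here is on exact word forms, so “winds” would not be caught by a magnet for “wind”; whether the Mandala Browser behaves the same way is not something this sketch claims to know.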

Once I realized that my initial method wasn’t really getting me anywhere, I looked for a quick and easy topic modeling tool and lo, the creatively named TopicModelingTool, found on Alan Liu’s DH Toychest, was exactly what I was looking for. I had to create a .txt document of all the tragedies and strip out excess information from Project Gutenberg and other sources. Once I had a file, I put it through the TopicModelingTool on the default settings (200 passes through the text, 10 topics with 10 words per topic) and got some interesting results. I tried putting each word of the first topic into Mandala with a different magnet for every word. Ten magnets, though, are too many, and the Mandala window became a mess of lines and circles and colors.
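
For anyone repeating the cleanup step, here is a rough Python sketch of stripping the Project Gutenberg header and footer from a text. It is not what I actually ran. It assumes the file uses the usual “*** START OF THE (or THIS) PROJECT GUTENBERG EBOOK” and “*** END OF ...” marker lines, and the function name is made up for illustration. Files from other sources would need their own rules.

```python
import re

# Marker lines assumed to bracket the actual text in a Project Gutenberg file.
START = re.compile(r"^\*\*\*\s*START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$",
                   re.MULTILINE | re.IGNORECASE)
END = re.compile(r"^\*\*\*\s*END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$",
                 re.MULTILINE | re.IGNORECASE)

def strip_gutenberg(raw):
    """Return only the text between the START and END marker lines.

    If a marker is missing, fall back to the beginning or end of the file.
    """
    start = START.search(raw)
    end = END.search(raw)
    body_start = start.end() if start else 0
    body_end = end.start() if end else len(raw)
    return raw[body_start:body_end].strip()

# Tiny stand-in for a downloaded file; the middle line is placeholder text.
sample = (
    "License and header information\n"
    "*** START OF THE PROJECT GUTENBERG EBOOK EXAMPLE ***\n"
    "The text of the play goes here.\n"
    "*** END OF THE PROJECT GUTENBERG EBOOK EXAMPLE ***\n"
    "More license information\n"
)
cleaned = strip_gutenberg(sample)
print(cleaned)
```

Running each play through a function like this and joining the results would give a single .txt file to hand to the TopicModelingTool.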

So I went back to TopicModelingTool and gave it different parameters (1000 passes through the text, 20 topics, 3 words per topic). This produced much more manageable results, and when I put each topic into Mandala in the same way I got a nice, easy to read and work with result. Each one that I tried produced connections from various plays and expanded beyond what I had previously thought about Shakespeare’s plays when I conceived of them as individual works rather than parts of a body of work. What this project provided me was not a deeper understanding of Shakespeare’s methods or writing style but rather an alternative way of reading his plays. The Mandala Browser makes each speech a separate “work” which it then mixes and matches based on the user’s input. What it shows is not a groundbreaking new way to understand a text; it is a way to deform and distort the texts so that the user can read them with new eyes.

Results: Some Case Studies

Good Night Friends

When I first saw this topic appear in the TopicModelingTool it seemed like such an obvious trio, especially in that order. It is no wonder that the words “good” and “night” and “friends” would appear near each other in Shakespeare’s texts because they appear so frequently in my own life. But when I entered them into the Mandala Browser, I found some surprising connections between them, or lack thereof.

It turns out that while there are a good number of speeches where both “good” and “friends” appear (38 total) and even more where “good” and “night” share a space (89), only two speeches in the entirety of Shakespeare’s tragedies share all three words. The first is Hamlet’s in Scene 2.2:

“Very well. Follow that lord; and look you mock him not. My good friends, I’ll leave you till night: you are welcome to Elsinore.”

Here Hamlet dismisses his buddies Rosencrantz and Guildenstern after setting up the plan to have a play expose his evil new dad’s murderous ways. It is a somewhat standard farewell, and only the “good” modifier of “friends” gives any indication that Rosencrantz and Guildenstern are more than just hangers-on. The other speech comes from The Tragedy of Antony and Cleopatra as Antony asks his servants to tend to him one last time:

 Tend me to-night; May be it is the period of your duty: Haply you shall not see me more; or if, A mangled shadow: perchance to-morrow You’ll serve another master. I look on you As one that takes his leave. Mine honest friends, I turn you not away; but, like a master Married to your good service, stay till death: Tend me to-night two hours, I ask no more, And the gods yield you for’t! (4.2.24-33)

The “good” in this speech is related not to the quality of the friendship but to the standard of service that Antony’s reliable house servants have provided. And “friends” is modified by “honest,” an entirely different although no less heartfelt descriptor of what a friend might be. Finally, “night” seems to appear thanks to the hyphenated version of “tonight,” but I do not see that as a mistake; rather, it is an evocation of a time and a melancholy that haunts the entire scene. It is soon Antony’s end, and he has few to spend his short remaining time with other than those whose job it is to serve him. He still has genuine affection for them or he would not call them “honest friends,” but they are no Rosencrantz and Guildenstern.

This is one of the interesting outcomes in a topic model. Even with a relatively small sample size there are still patterns to see. A brief glance at the speeches which held both “good” and “night” in their length showed a roughly equal number of examples which paired the two together in their standard farewell meanings and those which scattered them among many more words, though they were often used more than once in a given speech if they were not connected directly. It is this kind of nebulous connection made more concrete that topic modeling visualized through the Mandala browser can provide. A topic need not be entirely connected by each element equally and wholly, but strong connections between each element individually will make for a stronger whole. With this topic we can see Shakespeare construct night-time gatherings of friends or people brought together by a common cause across plays.

Life Nature Death

This is the most interesting topic produced by the TopicModelingTool because it shows more of a strong core connection between all three words (7 instances of all three words appearing in one speech), each of which is a huge topic in its own right in Shakespeare’s tragedies, and which also demonstrates a glitch in the system which may yet prove meaningful.

Since this is not a giant research paper, I’ll only examine two of the speeches which contain all three topic words. The first comes from Act 2 Scene 2 of Macbeth as the title character tells his wife of his completed assassination:

“Methought I heard a voice cry ‘Sleep no more! Macbeth does murder sleep’, the innocent sleep, Sleep that knits up the ravell’d sleeve of care, The death of each day’s life, sore labour’s bath, Balm of hurt minds, great nature’s second course, Chief nourisher in life’s feast,–” (2.2.46-51).

Although the scene is entirely about life and death, and the nature of murder, this particular speech is actually about the quality of sleep which Macbeth imagines he has murdered along with his friend and king in his quest for the throne. And yet, all three words appear in the last three lines of the speech, the point where he extols the virtues of sleep and laments the way he has killed it for the foreseeable future. The topic words combine in a way both expected, all together and in the aftermath of an assassination, and unexpected, in reference to sleep.

I mentioned above that this topic did encounter a glitch, and it is due to the name of one of Shakespeare’s lesser-known plays, The Life of Timon of Athens. I have not read this play and I have no real context for it, but the title’s use of the word “life” means that of the 7 speeches where all three words overlap, 3 come from this play. In all three cases, the only occurrence of “life” is in the title of the play, which counts for Mandala but not the TopicModelingTool. This is an instructive glitch, because it highlights the issues that may occur when going from one tool to another. There is no way of telling Mandala to ignore the title of a play when it searches for speeches containing a word, even though the other tool does not “see” the title of the play. Perhaps this is also a nudge towards the real use of topic models, which is as a loosely defined and even more loosely connected set of words which may have some deeper meaning to them. As I said earlier, it is not necessary to examine only the speeches where all three topic words appear; in fact, the speeches containing two of the topic words (78 in total for this topic) are probably the more fruitful area of interest for a more in-depth research project.
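
To make the glitch concrete, here is a small Python sketch of how a search that includes a play's title can return a speech that a text-only search would skip. This is an assumption about the mechanism, not a description of how Mandala stores or indexes its data: the record layout, the field names, and the speech wording are all invented for illustration.

```python
import re

def words(text):
    """Lowercase text and return the set of alphabetic words in it."""
    return set(re.findall(r"[a-z]+", text.lower()))

def contains_topic(record, topic, fields):
    """True if every topic word occurs somewhere in the chosen fields."""
    seen = set()
    for field in fields:
        seen |= words(record[field])
    return set(topic) <= seen

# Invented records: the speech text is placeholder wording, not a quotation.
records = [
    {"title": "The Life of Timon of Athens", "text": "nature owes a death to all"},
    {"title": "Macbeth", "text": "the death of life is nature sleeping"},
]
topic = ["life", "nature", "death"]

# Searching title and text together finds "life" in the first record's title.
with_title = [r["title"] for r in records
              if contains_topic(r, topic, ("title", "text"))]
# Searching the speech text alone drops that record.
text_only = [r["title"] for r in records
             if contains_topic(r, topic, ("text",))]
print(with_title, text_only)
```

A tool that exposed the choice of fields would let the two counts be compared directly; without that option, the title matches have to be weeded out by hand.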

Discussion

There are two large takeaways from this project. The first is the efficacy and even necessity of using multiple tools in conjunction with each other. How one tool informs another is a relationship that cannot be fully understood until you just play around with them for a bit. Experimentation and serious playfulness will lead a researcher such as myself to connections that I might not have guessed at on my own and with only a rudimentary understanding of how the tools work. It takes fiddling to fully grasp the potential of a tool. It takes breaking it by asking it to do something it cannot do, and it takes asking it to do something strange that it ends up being great at, to really discover the multitude of possibilities. And then it takes even more fiddling with the tools in relation to each other to discover how they might work together. Each tool is good for some things and not good for others. In this case, the TopicModelingTool is good at creating these topics but it is terrible at actually letting you read the texts or see how the topics are formed by their signifying words. That is where the Mandala Browser enters the picture, as it both visualizes those connections and brings the researcher back to the original text. Each tool might serve its own small purpose in a research project, but it is only when they are used together that they become as powerful as they can be.

The other lesson learned is that it is ok and sometimes even necessary to throw out a research question if it is not working with the tools you are using in a data analysis project. I had this initial idea to look at the way nature interacts with the state of a character’s inner mind at the outset of this project. But that yielded no fruit. Instead, I found that the tools led the way, at least in this preliminary, exploratory setting. If I wanted to revisit that initial research question, I might try to find topics using the TopicModelingTool which coalesce around nature, and perhaps see what speeches contain those words and then investigate whether those speeches are in response to a change in a character’s being. I would have never known to do that, though, without this prework of discovering what the tools do separately and together, and how I might use their disparate abilities to answer that initial question. The scope of this project does not align with the scope of that question, but I am glad to have gotten the preliminary discovery work out of the way so that I might use these two tools in future projects, and so that I have a path to follow if I want to find out how other tools work.

A Computer and a Data Set of One’s Own: My DH Future

When I wrote the first paper for my Digital Humanities class, I cheekily didn’t include the last bit of what my professors asked for because I didn’t really know what to write. They were looking for how I thought I might use DH ideas or methods in my own work now that I had learned what those were in a general sense. But I still didn’t really know what the field was or how it all came together, despite my four pages saying that I did. Now, at the end of the semester, I feel like I have a tighter grasp on the kind of work that digital humanists do and I can finally answer that question, so here’s the answer.


As for my own entry into the Digital Humanities, I’m not sure exactly what it will look like. Part of this comes from the fact that I have not yet settled on a specialty or field of my own, and so I cannot say with certainty what kinds of projects I would be interested in doing, nor can I theorize what they might look like specifically, because I do not know what the data set would be. But the great thing about Digital Humanities is that it is a flexible field. One needs only a computer and a data set of one’s own to do the majority of DH work. One of my projects in this class involved combining two data tools to find and illuminate thematic connections in Shakespeare’s tragedies that might not be readily apparent. Both programs ran in Java and were relatively simple to understand, even if intuiting what purpose they served was not quite as obvious as their explicit functionality. That project taught me that tools do not a DH project make, at least not on their own. It took my own synthesis to bring out the best in either tool and to truly mine the data set for what it was worth. But it was also not beyond my means, nor did it demand an unreasonable amount of time to figure out what I was doing. The tinker’s mentality must be strong in a Digital Humanist, and I feel like I have that, so it will not be a barrier to entry for me.

Another project, Harlem Echoes, showcased an even more valuable mentality to have if you want to be a DH scholar: teamwork. There the entire class worked together to first create a properly formatted and error-proof version of Claude McKay’s Harlem Shadows, a poetry collection from the Harlem Renaissance. After we had that framework, which was not easy, we came up with ideas for essays which would illuminate different aspects of the poetry and the poet’s life as well as develop some simple tools like a word cloud pulled from the thematic tags we each assigned to the poems we corrected. This was a complicated and drawn out process, but because we were working together towards a common goal it felt like less work and it really fostered a sense of community between us. We had some outside help for especially difficult WordPress things, so we even got to have some appreciation for the way that DH scholars often rely upon the expertise of others when it comes to more technical parts of the work. English students, especially grad students, often feel like lone wolves, out only for themselves and in search of singular achievements, but it was the collaboration that really formed the core of that experience and it is certainly a mindset that I would like to maintain throughout my work.

The final projects we did were individual works of e-lit. I made mine using the Twine tool, and through a great deal of trial and error, I finally made something that I could be proud of. During the semester we had Harvard scholar and really awesome guy Vincent Brown visit our class to talk with us about his DH project, an interactive timeline and map of the slave revolt in Jamaica between 1760 and 1761. It is an amazing project and Brown spoke about how he wanted to use DH tools to tell the story of this revolt, a story mostly hidden in diaries and letters. He talked of the decisions he made in order to tell that story, and how they differed from his more traditional telling of the story in his upcoming book. This storytelling mentality really lit a spark in my brain and, when I do create DH projects in the future, it is that perspective that I will likely take. It meshes nicely with the Twine project that I worked on because doing something like that focuses the creator’s attention on the decision-making process and the effect each decision will have on the reader’s experience in a way that traditional story writing had not done for me in the past. I quickly realized, for example, that I could not just throw my old work into this new medium and expect to get the same results. I instead had to rewrite the entire thing and change what I was doing from the ground up in order to craft the experience I wanted the reader to have. The user experience is paramount in the way one presents DH work, and the creator must take a long look at everything they do in order to make sure that what they are saying is what they want to say. Johanna Drucker reminds us always that every choice means something, that people are not just dots on a map and that the world does not abide by our lines and separations. The conscious decision-making process is one central to DH, especially when it comes to visualizing any data connections that I might find.

The last element of Digital Humanities work is the part that is both the most promising and the most likely to keep me from fully embracing it. The unfortunate truth is that nobody quite knows what to make of DH projects yet. Heck, even Vincent Brown asked us what DH is and if his project really fit into its parameters. Because DH is more rightly seen as a set of guiding principles rather than a set-in-stone philosophy or methodology, it is more open to experimentation and new ideas. That is the positive side, the thing that gets Digital Humanists excited to forge new paths and discover new ways of seeing. But the negative side is that the resulting projects then enter a nebulous world where they are often seen as holding less value than their more traditional scholarly counterparts when it comes to, say, evaluating whether or not a professor deserves tenure, or even at the hiring stage before that. At Lehigh we are still working out how a DH project might count towards a dissertation or as part of a Master’s thesis, and what our school ends up deciding will be different from what every other school decides, so it may be that none of my work, if I were to include a DH section in either, would mean much of anything to anybody else. That makes it very difficult to dedicate a lot of time and effort to a project that might ultimately be of little value. I suppose that the personal growth from doing such a project would be a benefit, but in the current academic world anything that does not improve your chances at achieving the next step might as well knock you back a few.

The reasons to engage in the Digital Humanities are numerous and varied. It asks you to think in ways that you are not trained to think, and that is always a good thing. It encourages you to work with people outside your small departmental bubble, which can only expand your field of vision beyond what it would normally be. And it encourages you to be thoughtful about your decisions, about the way that you present your findings, and about the effect your words or pictures or whatever have on your audience. These are never bad things to consider. But the perilous reception might cause me to keep away, at least at first, until I build some kind of reputation for myself. I will never rule out DH projects because they can be more exciting than traditional scholarship, but I will be sure to weigh the pros and cons in each particular situation so that I know what I might be giving up and what I might be adding to my work.