Stirring Shakespeare’s Tragedies: A Text Analysis Project
I wanted to try a new way of looking at texts that I already knew and the Mandala Browser looked like it was an interesting way to “stir the archive” so that these texts would become “weird” and perhaps show me a new way to read them. Once I learned that the browser came with Shakespeare’s tragedies built into it, I began to think of things that I could look for, connections that I already knew existed but which I might be able to prove were a bigger deal or a more wide-ranging phenomenon rather than a thing that English professors just tell their students about so that they can write papers with tenuous connections to the text. Specifically, I was looking for a correlation between the way nature acted and the state’s dysfunction which appears throughout Shakespeare’s tragedies. This involved setting up magnets with groups of words like “storm gale wind tempest” and “anger ire insanity insane angry” to see what overlap there was, but that didn’t prove as useful as I wanted it to be.
I then looked to a different way of finding related words and remembered that topic modeling was an interesting option. It would give me a list of words that were related which I could then input into the Mandala browser to see what those connections would be. This proved to be a fruitful endeavor which separated out my bias and allowed the texts to show for themselves what they were about. Some of the groups of words that I used were more obvious than others, but all provided at least a few interesting speeches that I would not have connected without a lot of time spent trying to match things in my head. This was an effective way to stir the archive and see texts in a new, colorfully connected way.
Methods and Materials
The two products used for this project were Mandala Browser and TopicModelingTool, both of which are free and open source. The Mandala Browser came with a built-in document that has all the speeches from Shakespeare’s tragedies separated and indexed for ease of use. When you create a magnet in the Mandala Browser, every speech which contains that word is pulled from the edge of the screen to orbit the magnet. When you create another magnet with a different word (or words), the same happens, and a mini-magnet appears in between them, around which orbit the speeches that contain both words (or sets of words). This allows you to see how the two words are used together in the texts. You can create as many magnets as you want and the program will show you how they are all connected with mini-magnets, but anything over 4 magnets quickly becomes unruly to work with.
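The magnet behavior described above can be thought of as set intersection. The following is a minimal sketch of that idea, not Mandala's actual code: the function name `speeches_with_any`, the toy speeches, and the choice of whole-word matching are all assumptions made for illustration.

```python
import re

def speeches_with_any(speeches, words):
    """Return ids of speeches containing at least one of the magnet's words.

    Assumption: whole-word, case-insensitive matching. How the Mandala
    Browser actually matches words is not documented in this post.
    """
    wanted = {w.lower() for w in words}
    hits = set()
    for speech_id, text in speeches.items():
        tokens = set(re.findall(r"[a-z]+", text.lower()))
        if tokens & wanted:
            hits.add(speech_id)
    return hits

# Hypothetical toy data standing in for the indexed speeches.
speeches = {
    "s1": "Blow, winds, and crack your cheeks!",
    "s2": "The storm is up, and all is on the hazard.",
    "s3": "I am angry at the storm.",
    "s4": "Good night, sweet prince.",
}

# Each magnet is the set of speeches that orbit it.
weather = speeches_with_any(speeches, ["storm", "gale", "wind", "tempest"])
temper = speeches_with_any(speeches, ["anger", "ire", "insanity", "insane", "angry"])

# The mini-magnet between two magnets is the intersection of their sets.
overlap = weather & temper
print(sorted(weather), sorted(temper), sorted(overlap))
```

Note that under whole-word matching "winds" does not count as "wind", which is one reason a hand-picked word list can return fewer overlaps than expected.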
Once I realized that I wasn’t really getting anywhere with my initial method, I looked for a quick and easy topic modeling tool and lo, the creatively named TopicModelingTool, found on Alan Liu’s DH Toychest, was exactly what I was looking for. I had to create a .txt document of all the tragedies and strip out excess information from Project Gutenberg and other sources. Once I had a file, I put it through the TopicModelingTool on the default settings (200 passes through the text, 10 topics with 10 words per topic) and got some interesting results. I tried putting each word of the first topic into Mandala with a different magnet for every word. Ten magnets, though, are too many, and the Mandala window became a mess of lines and circles and colors.
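The cleanup step can be sketched roughly as follows. This assumes the source files use the usual Project Gutenberg "*** START OF ..." and "*** END OF ..." marker lines; the function name `strip_gutenberg` and the sample text are hypothetical, and files from other sources would need their own handling.

```python
import re

# Assumed marker lines; older files say "THIS" instead of "THE".
START = re.compile(r"^\*\*\* ?START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.M)
END = re.compile(r"^\*\*\* ?END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.M)

def strip_gutenberg(raw):
    """Keep only the text between the START and END marker lines.

    If a marker is missing, fall back to the start or end of the file.
    """
    start = START.search(raw)
    end = END.search(raw)
    begin = start.end() if start else 0
    finish = end.start() if end else len(raw)
    return raw[begin:finish].strip()

# Hypothetical miniature file for illustration.
sample = (
    "Legal header text\n"
    "*** START OF THE PROJECT GUTENBERG EBOOK MACBETH ***\n"
    "ACT I. SCENE I.\n"
    "When shall we three meet again?\n"
    "*** END OF THE PROJECT GUTENBERG EBOOK MACBETH ***\n"
    "License footer text\n"
)
cleaned = strip_gutenberg(sample)
print(cleaned)
```

Running each play through a function like this and joining the results would give one combined .txt file to feed into the topic modeling tool.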
So I went back to TopicModelingTool and gave it different parameters (1000 passes through the text, 20 topics, 3 words per topic). This produced much more manageable results, and when I put each topic into Mandala in the same way I got a nice result that was easy to read and work with. Each one that I tried produced connections from various plays and expanded beyond what I had previously thought about Shakespeare’s plays when I conceived of them as individual works rather than parts of a body of work. What this project provided me was not a deeper understanding of Shakespeare’s methods or writing style but rather an alternative way of reading his plays. The Mandala Browser makes each speech a separate “work” which it then mixes and matches based on the user’s input. What it shows is not a groundbreaking new way to understand a text; it is a way to deform and distort the texts so that the user can read them with new eyes.
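To make the three settings (passes, topics, words per topic) more concrete, here is a small, generic topic model written from scratch. It is a sketch of latent Dirichlet allocation with a simple Gibbs sampler, not the TopicModelingTool's own code; the function name `lda_topics`, the `alpha` and `beta` smoothing values, and the toy documents are all assumptions.

```python
import random
from collections import Counter

def lda_topics(docs, n_topics, n_iterations, n_words, alpha=0.1, beta=0.01, seed=0):
    """Return the top n_words words for each of n_topics topics.

    docs is a list of token lists. n_iterations plays the role of the
    "passes through the text" setting.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]          # topic counts per document
    topic_word = [Counter() for _ in range(n_topics)]   # word counts per topic
    topic_total = [0] * n_topics                        # total words per topic
    assignments = []

    # Start by assigning every word occurrence to a random topic.
    for d, doc in enumerate(docs):
        zs = [rng.randrange(n_topics) for _ in doc]
        for w, z in zip(doc, zs):
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(zs)

    # Each pass reassigns every word based on the current counts.
    for _ in range(n_iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    return [
        [w for w, count in topic_word[k].most_common(n_words) if count > 0]
        for k in range(n_topics)
    ]

# Hypothetical toy "speeches"; a real run would use the cleaned tragedies.
docs = [
    "good night good friends".split(),
    "night good night sweet friends".split(),
    "life nature death life".split(),
    "death nature life nature".split(),
]
topics = lda_topics(docs, n_topics=2, n_iterations=200, n_words=3)
print(topics)
```

Asking for fewer words per topic, as in the second run described above, only changes how many of the top words are reported; the number of topics and passes are what change the model itself.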
Results: Some Case Studies
Good Night Friends
When I first saw this topic appear in the TopicModelingTool it seemed like such an obvious trio, especially in that order. It is no wonder that the words “good” and “night” and “friends” would appear near each other in Shakespeare’s texts because they appear so frequently in my own life. But when I entered them into the Mandala Browser, I found some surprising connections between them, or lack thereof.
It turns out that while there are a good number of speeches where both “good” and “friends” appear (38 total) and even more where “good” and “night” share a space (89), only two speeches in the entirety of Shakespeare’s tragedies share all three words. The first is Hamlet’s in Scene 2.2:
“Very well. Follow that lord; and look you mock him not. My good friends, I’ll leave you till night: you are welcome to Elsinore.”
Here Hamlet dismisses his buddies Rosencrantz and Guildenstern after setting in motion his plan to have a play expose his evil new dad’s murderous ways. It is a somewhat standard farewell, and only the “good” modifier of friends gives any indication that Rosencrantz and Guildenstern are more than just hangers-on. The other speech comes from The Tragedy of Antony and Cleopatra as Antony asks his servants to tend to him one last time:
Tend me to-night; May be it is the period of your duty: Haply you shall not see me more; or if, A mangled shadow: perchance to-morrow You’ll serve another master. I look on you As one that takes his leave. Mine honest friends, I turn you not away; but, like a master Married to your good service, stay till death: Tend me to-night two hours, I ask no more, And the gods yield you for’t! (4.2.24-33)
The “good” in this speech is related not to the quality of the friendship but to the standard of service that Antony’s reliable house servants have provided. And “friends” is modified by “honest,” an entirely different although no less heartfelt descriptor of what a friend might be. Finally, “night” seems to appear thanks to the hyphenated version of “tonight,” but I do not see that as a mistake; rather, it is an evocation of a time and a melancholy that haunts the entire scene. It is soon Antony’s end, and he has few to spend his short remaining time with other than those whose job it is to serve him. He still has genuine affection for them or he would not call them “honest friends,” but they are no Rosencrantz and Guildenstern.
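Whether “to-night” counts as a hit for “night” comes down to how a tool splits text into words. The sketch below contrasts two plausible tokenizers; both function names are hypothetical, and which behavior the Mandala Browser actually uses is an assumption inferred only from the result above.

```python
import re

def words_hyphen_split(text):
    """Treat a hyphen as a word boundary, so 'to-night' yields 'to' and 'night'."""
    return re.findall(r"[a-z]+", text.lower())

def words_hyphen_kept(text):
    """Keep hyphenated compounds whole, so 'to-night' stays a single word."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())

line = "Tend me to-night two hours, I ask no more"

print("night" in words_hyphen_split(line))
print("night" in words_hyphen_kept(line))
```

Under the first tokenizer Antony's speech matches a “night” magnet; under the second it would not, and only one speech in the tragedies would contain all three topic words.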
This is one of the interesting outcomes in a topic model. Even with a relatively small sample size there are still patterns to see. A brief glance at the speeches which held both “good” and “night” in their length showed a roughly equal number of examples which paired the two together in their standard farewell meanings and those which scattered them among many more words, though they were often used more than once in a given speech if they were not connected directly. It is this kind of nebulous connection made more concrete that topic modeling visualized through the Mandala browser can provide. A topic need not be entirely connected by each element equally and wholly, but strong connections between each element individually will make for a stronger whole. With this topic we can see Shakespeare construct night-time gatherings of friends or people brought together by a common cause across plays.
Life Nature Death
This is the most interesting topic produced by the TopicModelingTool because it shows a stronger core connection between all three words (7 instances of all three words appearing in one speech), each of which is a huge topic in its own right in Shakespeare’s tragedies. It also demonstrates a glitch in the system which may yet prove meaningful.
Since this is not a giant research paper, I’ll only examine two of the speeches which contain all three topic words. The first comes from Act 2 Scene 2 of Macbeth as the title character tells his wife of his completed assassination:
“Methought I heard a voice cry ‘Sleep no more! Macbeth does murder sleep’, the innocent sleep, Sleep that knits up the ravell’d sleeve of care, The death of each day’s life, sore labour’s bath, Balm of hurt minds, great nature’s second course, Chief nourisher in life’s feast,–” (2.2.46-51).
Although the scene is entirely about life and death, and the nature of murder, this particular speech is actually about the quality of sleep which Macbeth imagines he has murdered along with his friend and king in his quest for the throne. And yet, all three words appear in the last lines of the speech, the point where he extols the virtues of sleep and laments the way he has killed it for the foreseeable future. The topic words combine in a way both expected, all together and in the aftermath of an assassination, and unexpected, in reference to sleep.
I mentioned above that this topic did encounter a glitch, and it is due to the name of one of Shakespeare’s lesser-known plays, The Life of Timon of Athens. I have not read this play and I have no real context for it, but the title’s use of the word “life” means that of the 7 speeches where all three words overlap, 3 come from this play. In all three cases, the only occurrence of “life” is in the title of the play, which counts for Mandala but not the TopicModelingTool. This is an instructive glitch, because it highlights the issues that may occur when going from one tool to another. There is no way of telling Mandala to ignore the title of a play when it searches for speeches containing a word, even though the other tool does not “see” the title of the play. Perhaps this is also a nudge towards the real use of topic models, which is as a loosely defined and even more loosely connected set of words which may have some deeper meaning to them. As I said earlier, it is not necessary to only examine the speeches where all three topic words appear, and in fact the speeches containing two of the topic words (78 in total for this topic) are probably the more fruitful area of interest for a more in-depth research project.
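The title glitch can be reproduced in miniature. The sketch below assumes each speech is stored as a record with a play title and a speech text, and that a search may or may not look at the title; the `Speech` class, the `matches` function, and the placeholder Timon text are all hypothetical and are not taken from Mandala or from the play.

```python
import re
from dataclasses import dataclass

@dataclass
class Speech:
    play_title: str
    text: str

def matches(speech, words, include_title):
    """True if every word appears in the searched fields of the speech."""
    haystack = speech.text
    if include_title:
        haystack = speech.play_title + " " + haystack
    # Splitting on non-letters lets "nature's" count as "nature".
    tokens = set(re.findall(r"[a-z]+", haystack.lower()))
    return all(w in tokens for w in words)

speeches = [
    Speech(
        "Macbeth",
        "The death of each day's life, sore labour's bath, "
        "Balm of hurt minds, great nature's second course",
    ),
    # Placeholder wording, not a real quotation: it contains "nature" and
    # "death" but not "life", which appears only in the play's title.
    Speech("The Life of Timon of Athens", "placeholder words about nature and death"),
]

topic = ["life", "nature", "death"]
with_titles = [s.play_title for s in speeches if matches(s, topic, include_title=True)]
text_only = [s.play_title for s in speeches if matches(s, topic, include_title=False)]
print(with_titles)
print(text_only)
```

A filter like the text-only search is what Mandala does not appear to offer, so the extra Timon hits have to be set aside by hand when reading the results.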
There are two large takeaways from this project. The first is the efficacy and even necessity of using multiple tools in conjunction with each other. How one tool informs another is a relationship that cannot be fully understood until you just play around with them for a bit. Experimentation and serious playfulness will lead a researcher such as myself to connections that I might not have guessed at on my own with only a rudimentary understanding of how the tools work. It takes fiddling to fully grasp the potential of a tool: it takes breaking it by asking it to do something it cannot do, and it takes asking it to do something strange that it ends up being great at, to really discover the multitude of possibilities. And then it takes even more fiddling with the tools in relation to each other to discover how they might work together. Each tool is good for some things and not good for others. In this case, the TopicModelingTool is good at creating these topics but it is terrible at actually letting you read the texts or see how the topics are formed by their signifying words. That is where the Mandala Browser enters the picture, as it both visualizes those connections and brings the researcher back to the original text. Each tool might serve its own small purpose in a research project, but it is only when they are used together that they become as powerful as they can be.
The other lesson learned is that it is okay and sometimes even necessary to throw out a research question if it is not working with the tools you are using in a data analysis project. I had this initial idea to look at the way nature interacts with the state of a character’s inner mind at the outset of this project. But that yielded no fruit. Instead, I found that the tools led the way, at least in this preliminary, exploratory setting. If I wanted to revisit that initial research question, I might try to find topics using the TopicModelingTool which coalesce around nature, and perhaps see what speeches contain those words and then investigate whether those speeches are in response to a change in a character’s being. I would never have known to do that, though, without this prework of discovering what the tools do separately and together, and how I might use their disparate abilities to answer that initial question. The scope of this project does not align with the scope of that question, but I am glad to have gotten the preliminary discovery work out of the way so that I might use these two tools in future projects, and so that I have a path to follow if I want to find out how other tools work.