Text Analysis in the Wild
This article has been making its way through the text-analysis episteme over the last few days. The article itself is unremarkable (it's just a rundown of President Bush's remarks on energy policy during Tuesday's State of the Union Address). But if you scroll down a bit, you'll see a clever set of balloons showing how frequently various keywords were mentioned. Enter a term in the search box, and you get a more elaborate version.
It's a lovely interface, and it's actually quite interesting to play with. I wonder, though, why this particular technique is so much more interesting with ordinary expository prose (and, in particular, political prose) than it is with, say, literary texts. I've done similar things with various novels and poems, and the results always seem to me far less enlightening than one might expect. I will say, though, that the technique does become interesting when you begin to do this with all of, say, Jane Austen's novels. Then things begin to shake out.
Of course, we all know that the authors of political speeches (particularly speeches with as much moment as the State of the Union) pay very close attention to which keywords are used, how often, and when. They are almost performing a kind of reverse text analysis, which is in turn immediately analyzed with applause meters and so forth. This also seems to me to be one of the few contexts in which the colorized indication of "where" a keyword occurred becomes useful. Past attempts to do this with literary texts have produced nothing other than mildly interesting eye candy.
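For anyone who wants to try this at home, here is a minimal Python sketch of the two things the graphic seems to track: how often a keyword appears, and where in the text it falls. I have no idea how the Times actually built theirs, so treat this as a guess at the general idea. The function name `keyword_profile` and the sample sentence are my own inventions, and the tokenizer is deliberately crude.

```python
import re
from collections import Counter

def keyword_profile(text, keywords):
    """Count each keyword and record where it occurs (0.0 = start, 1.0 = end).

    Hypothetical helper for illustration; not the method behind the graphic.
    """
    # Crude tokenizer: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    wanted = {k.lower() for k in keywords}
    counts = Counter()
    positions = {k: [] for k in wanted}
    last = max(len(tokens) - 1, 1)  # avoid dividing by zero on tiny texts
    for i, tok in enumerate(tokens):
        if tok in wanted:
            counts[tok] += 1
            positions[tok].append(round(i / last, 2))
    return counts, positions

# Made-up sample text, not a quotation from the speech.
sample = "Energy policy matters. We need clean energy, and we need oil less."
counts, positions = keyword_profile(sample, ["energy", "oil"])
print(counts["energy"], positions["energy"])
print(counts["oil"], positions["oil"])
```

The counts would drive the size of each balloon, and the positions are what a colorized "where" strip would plot.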
I hope the New York Times continues doing this sort of thing. In fact, I hope they eventually adopt nora and start making it a normal part of the online reading experience. We shall see.
The problem with text analysis on literary works is that you have to be a lot more precise in your searching. Brian's use of Token X on the Cather archive will be interesting because it lets you search for sets of words, and a Cather scholar will know how to craft sets that produce meaningful results.
Political speeches, on the other hand, are so much more direct (for the reasons you stated) that simple one-word searches do turn up meaningful results, or at least, as you say, good eye candy.
Is Nora meant to help craft meaningful searches on literary works? I have to admit, I'm still a little fuzzy on what Nora does…
The funny (or tragic, depending on how you look at it) possible result of this kind of political analysis will be people using it to "prove" something.
For more lite text analysis fun:
http://tagcrowd.com/